Binary Expressions

2008-5-30

How To - Installing Swish-e v2.4.5

Filed under: — The Warden @ 6:58 am

Background Knowledge


Swish-e v2.4.5 is being installed on a system running OpenBSD v3.7 i386 and Perl v5.8.6, using BASH as the command interpreter.

What is Swish-e?


As quoted from Swish-e.org…

“Swish-e is a fast, flexible, and free open source system for indexing collections of Web pages or other files. Swish-e is ideally suited for collections of a million documents or smaller. Using the GNOME™ libxml2 parser and a collection of filters, Swish-e can index plain text, e-mail, PDF, HTML, XML, Microsoft® Word/PowerPoint/Excel and just about any file that can be converted to XML or HTML text. Swish-e is also often used to supplement databases like the MySQL® DBMS for very fast full-text searching. Check out the full list of features.”

Installation Prerequisites


I already have the packages below installed, as most systems do. If yours does not, refer to your operating system's package system for the easiest and fastest installation.

Perl Modules - Required

If you receive an “Out of Memory!” error message when installing the Perl modules, refer to my blog post Perl - Out of Memory! for the solution.

I was able to install all of the modules by following the example below. I answered “yes” to each question I was prompted with. To list which Perl modules you already have installed, type “instmodsh” at the prompt (without the double quotes).

Note: The prompt character is shown as “$” when the action is run as an unprivileged user and as “#” when the action is run as the superuser (root) using the sudo command.

  • # sudo cpan -i LWP
  • # sudo cpan -i MIME::Types
  • Optional
  • $ tar -zxvf SWISH-Stemmer-0.05.tar.gz
  • $ cd SWISH-Stemmer-0.05
  • $ perl Makefile.PL
  • $ make
  • $ make test
  • # sudo make install

Indexing Feature Requirements (optional)


To index specific content you may need to install additional packages. Refer to the list below to see what you may require. All packages listed may be installed after the installation of Swish-e. I have not ventured into installing and setting up the packages below. There may be better ways to index such content that I am not aware of at this time.

Installing Swish-e


Note: The prompt character is shown as “$” when the action is run as an unprivileged user and as “#” when the action is run as the superuser (root) using the sudo command.

  1. $ wget http://swish-e.org/Download/swish-e-2.4.5.tar.gz
  2. $ tar -zxvf swish-e-2.4.5.tar.gz
  3. $ cd swish-e-2.4.5
  4. $ ./configure
  5. $ make
  6. $ make check
  7. # sudo make install
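Because the version number appears in the download URL, the tarball name and the source directory, it is easy for the three to drift apart. One way to avoid that, sketched here as an illustration only, is to keep the version in a single variable and derive everything else from it. The variable and function names are my own, and the URL pattern is assumed to match the one used in step 1. The function prints the steps instead of running them so they can be reviewed first.

```shell
# Version to install; change this one value to retarget every step.
SWISH_VERSION="2.4.5"
TARBALL="swish-e-${SWISH_VERSION}.tar.gz"
SRC_DIR="swish-e-${SWISH_VERSION}"
# Assumption: the download path follows the pattern used in step 1.
URL="http://swish-e.org/Download/${TARBALL}"

# Print the installation steps (a dry run; nothing is downloaded or built).
print_build_steps() {
    echo "wget ${URL}"
    echo "tar -zxvf ${TARBALL}"
    echo "cd ${SRC_DIR}"
    echo "./configure"
    echo "make"
    echo "make check"
    echo "sudo make install"
}

print_build_steps
```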

Setting Up Swish-e


Configuration and use of Swish-e can be done in many different ways. In this particular situation I am spidering, using “prog”, a web site that has a public and a private side to it. Therefore I have to keep separate configurations and indexes, and have two search engine interfaces. Here’s a breakdown of my configuration files and template for the public side.

.public.swishcgi.conf
Settings for swish.cgi (search engine web interface).
swish.cgi
CGI Perl script used for searching with the Swish-e search engine. The only line one may need to modify in this file is the $DEFAULT_CONFIG_FILE value; the rest should be left alone.
swish-e.public.conf
Swish-e program configuration settings.
SwishSpiderConfig.public.pl
Spider.pl settings.
DefaultTemplate.pm
The output template for swish.cgi. For further details refer to swish.cgi documentation.

Examples of the configuration files are located at /usr/local/share/doc/swish-e/examples/ and an example of swish.cgi is located at /usr/local/lib/swish-e/swish.cgi. The swish.cgi Perl script can be stored at any location that is preferred.
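To illustrate the public/private split, here is a rough sketch that writes one minimal Swish-e configuration per side. It is not my actual configuration: the function name, the index file names and the output directory are hypothetical, and only the three directives shown (IndexDir, SwishProgParameters and IndexFile) are used. The spider path is the one used later in this post.

```shell
# Write a minimal Swish-e config for one side of the site ("public" or "private").
# The file and index names are hypothetical, chosen to mirror the naming in this post.
write_swish_conf() {
    side="$1"
    dir="$2"
    cat > "${dir}/swish-e.${side}.conf" <<EOF
# Program that swish-e runs when started with -S prog
IndexDir /usr/local/lib/swish-e/spider.pl
# Argument handed to spider.pl: the spider config for this side
SwishProgParameters SwishSpiderConfig.${side}.pl
# Keep each side in its own index file
IndexFile ${dir}/index.${side}.swish-e
EOF
}

# Example output location (an assumption; use whatever directory suits you).
conf_dir="${TMPDIR:-/tmp}/swish-conf-example"
mkdir -p "$conf_dir"
for side in public private; do
    write_swish_conf "$side" "$conf_dir"
done
```

Keeping the side name as the only thing that varies means the public and private indexes cannot accidentally overwrite each other.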

Apache HTTPD - Setting Up CGI


To ensure your web server is configured properly, make sure you have the following directives set in your configuration file.

  • ScriptAlias /search/ "/var/www/cgi-bin/"
  • <Directory "/var/www/cgi-bin">
    AllowOverride None
    Options +ExecCGI
    Order allow,deny
    Allow from all
    </Directory>
  • AddHandler cgi-script .cgi
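Before putting swish.cgi in place, it can help to confirm that CGI execution works at all with a trivial script. The sketch below is an assumption-laden example: the file name test.cgi is hypothetical, it would need to be saved in /var/www/cgi-bin/ and made executable, and it assumes /bin/sh is reachable by the web server (if the server runs chrooted, the interpreter has to be available inside the chroot).

```shell
#!/bin/sh
# Minimal CGI response: a header line, a blank line, then the body.
cgi_response() {
    echo "Content-Type: text/plain"
    echo ""
    echo "CGI is working"
}

cgi_response
```

With the ScriptAlias above, requesting /search/test.cgi should return the single line of text. If the script source is shown instead, or an error is returned, the CGI directives are the first place to look.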

Indexing


I indexed the public web site with the following command.
swish-e -S prog -c swishe.venmarces.public.conf

Or, with a specific spider config file, running the spider directly and piping its output to swish-e on standard input.
/usr/local/lib/swish-e/spider.pl SwishSpiderConfig.venmarces.public.pl | swish-e -S prog -c swishe.venmarces.public.conf -i stdin

Copyright © 2004 - 2005 by Adam Douglas