How To - Installing Swish-e v2.4.5
Background Knowledge
Swish-e v2.4.5 is installed here on an OpenBSD v3.7 i386 system with Perl v5.8.6, using Bash as the command shell.
What is Swish-e?
As quoted by Swish-e.org:
“Swish-e is a fast, flexible, and free open source system for indexing collections of Web pages or other files. Swish-e is ideally suited for collections of a million documents or smaller. Using the GNOME™ libxml2 parser and a collection of filters, Swish-e can index plain text, e-mail, PDF, HTML, XML, Microsoft® Word/PowerPoint/Excel and just about any file that can be converted to XML or HTML text. Swish-e is also often used to supplement databases like the MySQL® DBMS for very fast full-text searching. Check out the full list of features.”
Installation Prerequisites
I already had the packages below installed, as most systems do. If yours does not, use your operating system's package system for the easiest and fastest installation.
Perl Modules - Required
- LWP
- URI
- HTML::Parser
- HTML::Tagset
- MIME::Types (optional)
- SWISH::Stemmer (optional)
If you receive an “Out of Memory!” error message when installing the Perl modules, refer to my blog post Perl - Out of Memory! for the solution.
I was able to install all the modules as shown below, answering “yes” to every prompt. Installing LWP also pulls in URI, HTML::Parser and HTML::Tagset as dependencies. To list the Perl modules you already have installed, run instmodsh at the prompt.
Note: The prompt character is shown as “$” when a command is run as an unprivileged user and as “#” when it is run as the superuser (root) via sudo.
- # sudo cpan -i LWP
- # sudo cpan -i MIME::Types
- Optional: SWISH::Stemmer, built from its source tarball
- $ tar -zxvf SWISH-Stemmer-0.05.tar.gz
- $ cd SWISH-Stemmer-0.05
- $ perl Makefile.PL
- $ make
- $ make test
- # sudo make install
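Before moving on, it can help to confirm that perl can actually load each module. This is a minimal sketch assuming a POSIX shell and that the perl you will use with Swish-e is the first one on $PATH; the function name check_perl_modules is my own and not part of Swish-e.

```shell
#!/bin/sh
# Hypothetical helper (not part of Swish-e): try to load each Perl module
# with the first perl on $PATH and report whether it succeeded.
check_perl_modules() {
    missing=0
    for mod in "$@"; do
        # "perl -MModule -e 1" loads the module and exits; a non-zero
        # status means the module (or perl itself) could not be loaded.
        if perl -M"$mod" -e 1 >/dev/null 2>&1; then
            echo "ok       $mod"
        else
            echo "MISSING  $mod"
            missing=$((missing + 1))
        fi
    done
    # The return status is the number of modules that failed to load.
    return "$missing"
}

# Required modules first, then the two optional ones.
check_perl_modules LWP URI HTML::Parser HTML::Tagset MIME::Types SWISH::Stemmer \
    || echo "Some modules are missing; install the ones you need before continuing."
```

Any line starting with MISSING points at a module to install with cpan (or, for SWISH::Stemmer, from its tarball).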
Indexing Feature Requirements (optional)
To index certain types of content you may need additional packages; the list below shows which ones. All of them may be installed after Swish-e itself. I have not yet installed or configured any of these, and there may be better ways to index such content that I am not aware of.
- PDF documents - Xpdf or PDF2HTML
- MS Word documents - Catdoc
- MP3 ID3 Tags - Perl module MP3::Tag
- MS Excel documents - Perl module Spreadsheet::ParseExcel and HTML::Entities
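To see which of these helpers are already present, a quick check of $PATH and of the Perl modules is enough. This is a sketch under assumptions: the program names pdftotext and pdfinfo (from Xpdf) and catdoc (from Catdoc) are what I expect those packages to install, and find_helpers is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: report which external indexing helpers are on $PATH.
# Assumed program names: pdftotext and pdfinfo (Xpdf), catdoc (Catdoc).
find_helpers() {
    for prog in "$@"; do
        if loc=$(command -v "$prog" 2>/dev/null); then
            echo "found    $prog ($loc)"
        else
            echo "missing  $prog"
        fi
    done
}

find_helpers pdftotext pdfinfo catdoc

# The Perl-based filters only need their modules to be loadable.
for mod in MP3::Tag Spreadsheet::ParseExcel HTML::Entities; do
    if perl -M"$mod" -e 1 >/dev/null 2>&1; then
        echo "found    $mod"
    else
        echo "missing  $mod"
    fi
done
```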
Installing Swish-e
- $ wget http://swish-e.org/Download/swish-e-2.4.5.tar.gz
- $ tar -zxvf swish-e-2.4.5.tar.gz
- $ cd swish-e-2.4.5
- $ ./configure
- $ make
- $ make check
- # sudo make install
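Because the download, unpack and cd steps each repeat the version number, it is easy for them to drift apart. A minimal sketch that keeps the version in one variable; it assumes the tarball lives under the Download/ path used in the wget step, and the script name build-swish.sh is hypothetical.

```shell
#!/bin/sh
# Hypothetical build script (build-swish.sh): the version appears once, so
# the download, unpack and cd steps always agree.
set -e

SWISH_VERSION=2.4.5
SRC_DIR="swish-e-$SWISH_VERSION"
TARBALL="$SRC_DIR.tar.gz"
# Assumed download location, taken from the wget step above.
URL="http://swish-e.org/Download/$TARBALL"

build_swish() {
    wget "$URL"
    tar -zxvf "$TARBALL"
    cd "$SRC_DIR"
    ./configure
    make
    make check
    sudo make install
    # Print the installed version as a final sanity check.
    swish-e -V
}

# Only build when asked explicitly:  sh build-swish.sh build
if [ "${1:-}" = build ]; then
    build_swish
fi
```

With set -e, the script stops at the first failing step, so a failed make check never reaches make install.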
Setting Up Swish-e
Swish-e can be configured and used in many different ways. In this setup I use the “prog” input method to spider a web site that has both a public and a private side, so I need separate configuration files, separate indexes and two search interfaces. Here is a breakdown of my configuration files and template for the public side.
- .public.swishcgi.conf
- Settings for swish.cgi (search engine web interface).
- swish.cgi
- CGI Perl script used for searching with the Swish-e search engine. The only line you should modify in this file is the $DEFAULT_CONFIG_FILE value; leave the rest alone.
- swish-e.public.conf
- Swish-e program configuration settings.
- SwishSpiderConfig.public.pl
- spider.pl settings.
- DefaultTemplate.pm
- The output template for swish.cgi. For further details refer to swish.cgi documentation.
Example configuration files are installed in /usr/local/share/doc/swish-e/examples/, and swish.cgi itself is installed as /usr/local/lib/swish-e/swish.cgi. You can copy swish.cgi to whichever location you prefer.
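As a starting point, the sketch below writes minimal versions of the three public-side configuration files. It reflects my reading of the swish-e, spider.pl and swish.cgi documentation rather than my actual files: the host name, e-mail address and index path are placeholders, and the examples directory above has fuller versions to compare against.

```shell
#!/bin/sh
# Write minimal starting versions of the public-side config files.
# www.example.com, webmaster@example.com and /var/swish-e are placeholders.
set -e

CONF_DIR=${CONF_DIR:-.}

# swish-e.public.conf: with -S prog, IndexDir names the program to run and
# SwishProgParameters is passed to it (here, the spider config file).
cat > "$CONF_DIR/swish-e.public.conf" <<'EOF'
IndexDir /usr/local/lib/swish-e/spider.pl
SwishProgParameters SwishSpiderConfig.public.pl
IndexFile /var/swish-e/index.public.swish-e
EOF

# SwishSpiderConfig.public.pl: spider.pl reads the @servers list from here.
cat > "$CONF_DIR/SwishSpiderConfig.public.pl" <<'EOF'
@servers = (
    {
        base_url  => 'http://www.example.com/',
        email     => 'webmaster@example.com',
        delay_sec => 2,
    },
);
1;
EOF

# .public.swishcgi.conf: swish.cgi loads this file and uses the returned hash.
cat > "$CONF_DIR/.public.swishcgi.conf" <<'EOF'
return {
    title        => 'Search www.example.com',
    swish_binary => '/usr/local/bin/swish-e',
    swish_index  => '/var/swish-e/index.public.swish-e',
};
EOF
```

After writing the files, set $DEFAULT_CONFIG_FILE in swish.cgi to .public.swishcgi.conf so the CGI picks up the public settings.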
Apache HTTPD - Setting Up CGI
To ensure your web server is configured properly, make sure the following directives are set in your configuration file.
- ScriptAlias /search/ "/var/www/cgi-bin/"
- <Directory "/var/www/cgi-bin">
      AllowOverride None
      Options +ExecCGI
      Order allow,deny
      Allow from all
  </Directory>
- AddHandler cgi-script .cgi
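Once the directives are in place, swish.cgi still has to be copied into the aliased directory and made executable. A small sketch of that step; the helper names run and deploy_swish_cgi and the DRY_RUN switch are hypothetical, and the paths are the ones used above.

```shell
#!/bin/sh
# Hypothetical deployment helper: copy swish.cgi into the ScriptAlias
# directory, make it executable and ask Apache to check its configuration.
# Set DRY_RUN=1 to print the commands instead of running them.
CGI_DIR=${CGI_DIR:-/var/www/cgi-bin}
SWISH_CGI_SRC=${SWISH_CGI_SRC:-/usr/local/lib/swish-e/swish.cgi}

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

deploy_swish_cgi() {
    run sudo cp "$SWISH_CGI_SRC" "$CGI_DIR/swish.cgi"
    run sudo chmod 755 "$CGI_DIR/swish.cgi"
    run sudo apachectl configtest
}

# Only deploy when asked explicitly:  sh deploy-swish-cgi.sh deploy
if [ "${1:-}" = deploy ]; then
    deploy_swish_cgi
fi
```

Because /search/ is aliased to /var/www/cgi-bin/, the search page should then answer at http://your-host/search/swish.cgi.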
Indexing
I indexed the public web site with the following command:
swish-e -S prog -c swishe.venmarces.public.conf
To use a specific spider configuration file, run spider.pl yourself and pipe its output into swish-e:
/usr/local/lib/swish-e/spider.pl SwishSpiderConfig.venmarces.public.pl | swish-e -S prog -c swishe.venmarces.public.conf -i stdin
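For re-indexing on a schedule it can be handy to wrap the pipeline above in a script, together with a command-line search to confirm the index answers queries. A sketch only: reindex and search_index are hypothetical names, and INDEX_FILE defaults to index.swish-e (swish-e's default index name), which must match the IndexFile directive in your config.

```shell
#!/bin/sh
# Hypothetical wrappers around the spider/index pipeline above, plus a
# quick command-line search against the finished index.
SPIDER=${SPIDER:-/usr/local/lib/swish-e/spider.pl}
SWISH=${SWISH:-swish-e}
SPIDER_CONF=${SPIDER_CONF:-SwishSpiderConfig.venmarces.public.pl}
SWISH_CONF=${SWISH_CONF:-swishe.venmarces.public.conf}
# Assumed: must match the IndexFile directive in $SWISH_CONF.
INDEX_FILE=${INDEX_FILE:-index.swish-e}

# Spider the site and feed the documents to swish-e on standard input.
reindex() {
    "$SPIDER" "$SPIDER_CONF" | "$SWISH" -S prog -c "$SWISH_CONF" -i stdin
}

# Search the index: -f names the index file, -w carries the query words.
search_index() {
    "$SWISH" -f "$INDEX_FILE" -w "$*"
}

case "${1:-}" in
    reindex) reindex ;;
    search)  shift; search_index "$@" ;;
esac
```

For example, sh reindex.sh reindex rebuilds the index and sh reindex.sh search apache cgi runs a test query.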
- Source: Swish-e
- Source: Swish-e Downloads
- Source: Swish-e Installation Instructions

