<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Binary Expressions &#187; Perl</title>
	<atom:link href="http://www.adamsdesk.com/be/category/development/perl-development/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.adamsdesk.com/be</link>
	<description>The life experiences of Adam Douglas. Technical help too!</description>
	<lastBuildDate>Tue, 27 Jul 2010 19:45:20 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>How To &#8211; Installing Swish-e v2.4.5</title>
		<link>http://www.adamsdesk.com/be/archives/2008/05/30/how-to-installing-swish-e-v245/</link>
		<comments>http://www.adamsdesk.com/be/archives/2008/05/30/how-to-installing-swish-e-v245/#comments</comments>
		<pubDate>Fri, 30 May 2008 12:58:28 +0000</pubDate>
		<dc:creator>Adam</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[engine]]></category>
		<category><![CDATA[indexing]]></category>
		<category><![CDATA[perl]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[search engine]]></category>
		<category><![CDATA[spider]]></category>
		<category><![CDATA[swish]]></category>
		<category><![CDATA[swish-e]]></category>
		<category><![CDATA[swishe]]></category>

		<guid isPermaLink="false">http://www.adamsdesk.com/be/archives/2008/05/30/how-to-installing-swish-e-v245/</guid>
		<description><![CDATA[Background Knowledge

Swish-e v2.4.5 is being installed on a system running OpenBSD v3.7 i386 and Perl v5.8.6, using Bash as the command-line interpreter.
What is Swish-e?

As quoted from Swish-e.org&#8230;
&#8220;Swish-e is a fast, flexible, and free open source system for indexing collections of Web pages or other files. Swish-e is ideally suited for [...]]]></description>
			<content:encoded><![CDATA[<h4>Background Knowledge</h4>
<hr />
<p>Swish-e v2.4.5 is being installed on a system running OpenBSD v3.7 i386 and Perl v5.8.6, using Bash as the command-line interpreter.</p>
<h4>What is Swish-e?</h4>
<hr />
<p>As quoted from Swish-e.org&#8230;</p>
<p>&#8220;Swish-e is a fast, flexible, and free open source system for indexing collections of Web pages or other files. Swish-e is ideally suited for collections of a million documents or smaller. Using the GNOME&trade; libxml2 parser and a collection of filters, Swish-e can index plain text, e-mail, PDF, HTML, XML, Microsoft&reg; Word/PowerPoint/Excel and just about any file that can be converted to XML or HTML text. Swish-e is also often used to supplement databases like the MySQL&reg; DBMS for very fast full-text searching. Check out the <a href="http://swish-e.org/docs/readme.html#key_features">full list of features</a>.&#8221;</p>
<p><span id="more-377"></span></p>
<h4>Installation Prerequisites</h4>
<hr />
<p>I already have the packages below installed, as most systems do. If yours does not, use your operating system&#8217;s package manager for the easiest and fastest installation.</p>
<ul>
<li><a href="http://xmlsoft.org/index.html">Libxml2</a></li>
<li><a href="http://zlib.net/">Zlib Compression</a></li>
</ul>
<p>Perl Modules &#8211; Required unless marked optional</p>
<ul>
<li><a href="http://search.cpan.org/~gaas/libwww-perl-5.808/lib/LWP.pm">LWP</a></li>
<li><a href="http://search.cpan.org/~gaas/URI-1.35/URI.pm">URI</a></li>
<li><a href="http://search.cpan.org/~gaas/HTML-Parser-3.56/Parser.pm">HTML::Parser</a></li>
<li><a href="http://search.cpan.org/~petdance/HTML-Tagset-3.20/Tagset.pm">HTML::Tagset</a></li>
<li><a href="http://search.cpan.org/~markov/MIME-Types-1.23/lib/MIME/Types.pod">MIME::Types</a> (optional)</li>
<li><a href="http://search.cpan.org/~hank/SWISH-Stemmer-0.03/Stemmer.pm">SWISH::Stemmer</a> (optional)</li>
</ul>
<p>If you receive an &#8220;Out of Memory!&#8221; error message when installing the Perl modules, refer to my blog post <a href="http://www.adamsdesk.com/be/archives/2007/12/19/perl-out-of-memory/">Perl &#8211; Out of Memory!</a> for the solution.</p>
<p>I was able to install all the modules using the commands below. I answered &#8220;yes&#8221; to each question I was prompted for. To list which Perl modules you already have installed, type &#8220;instmodsh&#8221; (without the double quotes) at the prompt.</p>
<p><strong>Note:</strong> The prompt character is shown as &#8220;$&#8221;, which indicates the action is run as an unprivileged user, while &#8220;#&#8221; indicates the action is run as the superuser (root) using the <a href="http://en.wikipedia.org/wiki/Sudo">sudo</a> command.</p>
<ul>
<li># sudo cpan -i LWP</li>
<li># sudo cpan -i MIME::Types</li>
<li>Optional: SWISH::Stemmer, built from its source tarball</li>
<li>$ tar -zxvf SWISH-Stemmer-0.05.tar.gz</li>
<li>$ cd SWISH-Stemmer-0.05</li>
<li>$ perl Makefile.PL</li>
<li>$ make</li>
<li>$ make test</li>
<li># sudo make install</li>
</ul>
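<p>As a quick check after installing, the sketch below reports whether Perl can load each of the modules listed above. The helper name check_module is hypothetical and used only for illustration (it is not part of Swish-e or CPAN), and the script assumes perl is on your PATH.</p>

```shell
#!/bin/sh
# check_module is a hypothetical helper, not part of Swish-e or CPAN.
# It prints whether the named Perl module can be loaded.
check_module() {
    # "perl -M<Module> -e 1" exits non-zero when the module cannot be loaded
    if perl -M"$1" -e 1 2>/dev/null; then
        echo "$1: installed"
    else
        echo "$1: missing"
    fi
}

# Modules taken from the prerequisites list in this post
for module in LWP URI HTML::Parser HTML::Tagset MIME::Types; do
    check_module "$module"
done
```

<p>Any module reported as missing can then be installed with cpan as shown in the list above.</p>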
<h4>Indexing Feature Requirements (optional)</h4>
<hr />
<p>To index specific content you may need to install additional packages; refer to the list below to see what you may require. All packages listed may be installed after the installation of Swish-e. I have not ventured into installing and setting up the packages below. There may be better solutions for indexing such content that I am not aware of at this time.</p>
<ul>
<li>PDF documents &#8211; <a href="http://www.foolabs.com/xpdf/">Xpdf</a> or <a href="http://pdftohtml.sourceforge.net/">PDF2HTML</a></li>
<li>MS Word documents &#8211; <a href="http://site.n.ml.org/info/catdoc/">Catdoc</a></li>
<li>MP3 ID3 Tags &#8211; Perl module <a href="http://search.cpan.org/~ilyaz/MP3-Tag-0.9710/Tag.pm">MP3::Tag</a></li>
<li>MS Excel documents &#8211; Perl module <a href="http://search.cpan.org/~szabgab/Spreadsheet-ParseExcel-0.32/lib/Spreadsheet/ParseExcel.pm">Spreadsheet::ParseExcel</a> and <a href="http://search.cpan.org/~gaas/HTML-Parser-3.56/lib/HTML/Entities.pm">HTML::Entities</a></li>
</ul>
<h4>Installing Swish-e</h4>
<hr />
<p><strong>Note:</strong> The prompt character is shown as &#8220;$&#8221;, which indicates the action is run as an unprivileged user, while &#8220;#&#8221; indicates the action is run as the superuser (root) using the <a href="http://en.wikipedia.org/wiki/Sudo">sudo</a> command.</p>
<ol>
<li>$ wget http://swish-e.org/Download/swish-e-2.4.5.tar.gz</li>
<li>$ tar -zxvf swish-e-2.4.5.tar.gz</li>
<li>$ cd swish-e-2.4.5</li>
<li>$ ./configure</li>
<li>$ make</li>
<li>$ make check</li>
<li># sudo make install</li>
</ol>
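<p>Once the steps above finish, a small check like the following sketch can confirm the binary is reachable. The function name verify_install is hypothetical. The sketch assumes the default ./configure prefix placed the binary in /usr/local/bin, which may differ on your system, and it relies on the swish-e -V switch to print the version.</p>

```shell
#!/bin/sh
# verify_install is a hypothetical helper name used for illustration.
# It looks for the given binary (swish-e by default) in PATH and, when
# found, asks it for its version with -V.
verify_install() {
    bin="${1:-swish-e}"
    if command -v "$bin" >/dev/null 2>&1; then
        "$bin" -V
    else
        echo "$bin: not found in PATH"
        return 1
    fi
}

# Assumption: the default install prefix puts the binary in /usr/local/bin
verify_install || echo "Check that /usr/local/bin is in your PATH"
```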
<h4>Setting Up Swish-e</h4>
<hr />
<p>Swish-e can be configured and used in many different ways. In this particular situation I am spidering with the &#8220;prog&#8221; method on a web site that has a public and a private side. Therefore I keep separate configurations and indexes, and have two search engine interfaces. Here&#8217;s a breakdown of my configuration files and template for the public side.</p>
<dl>
<dt>.public.swishcgi.conf</dt>
<dd>Settings for swish.cgi (search engine web interface).</dd>
<dt>swish.cgi</dt>
<dd>CGI Perl script used for searching with the Swish-e search engine. The only line you should need to modify in this file is the $DEFAULT_CONFIG_FILE value; the rest should be left alone.</dd>
<dt>swish-e.public.conf</dt>
<dd>Swish-e program configuration settings.</dd>
<dt>SwishSpiderConfig.public.pl</dt>
<dd>Spider.pl settings.</dd>
<dt>DefaultTemplate.pm</dt>
<dd>The output template for swish.cgi. For further details refer to <a href="http://swish-e.org/docs/swish.cgi.html">swish.cgi documentation</a>.</dd>
</dl>
<p>Examples of the configuration files are located in /usr/local/share/doc/swish-e/examples/ and an example of swish.cgi is located at /usr/local/lib/swish-e/swish.cgi. The swish.cgi Perl script can be stored in any location you prefer.</p>
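<p>To give a feel for what the Swish-e program configuration might contain, here is a minimal sketch that writes a swish-e.public.conf for the &#8220;prog&#8221; method. It is not my actual configuration: the index file name index.public.swish-e and the helper name write_public_conf are hypothetical, and the choice of directives is an assumption based on the Swish-e configuration documentation, so compare it against the shipped examples before relying on it.</p>

```shell
#!/bin/sh
# write_public_conf is a hypothetical helper: it writes a minimal
# swish-e.public.conf into the directory given as $1 and prints its path.
write_public_conf() {
    conf="$1/swish-e.public.conf"
    cat > "$conf" <<'EOF'
# Program that feeds documents to Swish-e when run with -S prog
IndexDir /usr/local/lib/swish-e/spider.pl
# Argument passed to spider.pl: its own configuration file
SwishProgParameters SwishSpiderConfig.public.pl
# Hypothetical index file name for the public side
IndexFile index.public.swish-e
# Treat documents as HTML unless told otherwise
DefaultContents HTML*
EOF
    echo "$conf"
}

# Demonstration: write the file into a scratch directory
workdir=$(mktemp -d)
write_public_conf "$workdir"
```

<p>Keeping a second file of the same shape for the private side, with its own spider configuration and index file, is one way to keep the two indexes separate.</p>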
<h4>Apache HTTPD &#8211; Setting Up CGI</h4>
<hr />
<p>To ensure your web server is configured properly, make sure the following directives are set in your configuration file.</p>
<ul>
<li>ScriptAlias /search/ "/var/www/cgi-bin/"</li>
<li>&lt;Directory "/var/www/cgi-bin"&gt;<br />
    AllowOverride None<br />
    Options +ExecCGI<br />
    Order allow,deny<br />
    Allow from all<br />
&lt;/Directory&gt;</li>
<li>AddHandler cgi-script .cgi</li>
</ul>
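<p>A rough way to confirm these directives are present is to search the configuration file for them. The sketch below does only a text search, ignoring commented lines, and does not validate the Apache configuration. The function name check_cgi_directives is hypothetical, and the sample file exists only to demonstrate the output.</p>

```shell
#!/bin/sh
# check_cgi_directives is a hypothetical helper: it reports which of the
# CGI-related directives from this post appear in the given config file.
check_cgi_directives() {
    conf="$1"
    status=0
    for pattern in 'ScriptAlias' 'ExecCGI' 'AddHandler cgi-script'; do
        # Skip commented lines, then match case-insensitively
        if grep -v '^[[:space:]]*#' "$conf" | grep -qi "$pattern"; then
            echo "found: $pattern"
        else
            echo "missing: $pattern"
            status=1
        fi
    done
    return $status
}

# Demonstration on a throwaway sample that lacks the AddHandler line
sample=$(mktemp)
printf '%s\n' 'ScriptAlias /search/ "/var/www/cgi-bin/"' 'Options +ExecCGI' > "$sample"
check_cgi_directives "$sample" || echo "some directives are missing"
```

<p>To check a real server, pass the path to your own httpd.conf instead of the sample file.</p>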
<h4>Indexing</h4>
<hr />
<p>I indexed the public web site with the following command.<br />
swish-e -S prog -c swish-e.public.conf</p>
<p>Alternatively, run the spider with a specific spider config file and pipe its output to Swish-e.<br />
/usr/local/lib/swish-e/spider.pl SwishSpiderConfig.public.pl | swish-e -S prog -c swish-e.public.conf -i stdin</p>
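<p>If you re-index regularly, the piped command can be wrapped in a small function that first checks both configuration files are readable. This is only a sketch: the function name index_public is hypothetical, and the spider.pl path is the one from my installation, so yours may differ.</p>

```shell
#!/bin/sh
# index_public is a hypothetical wrapper around the piped indexing command.
# $1 = spider configuration file, $2 = Swish-e configuration file
index_public() {
    spider_conf="$1"
    swish_conf="$2"
    # Refuse to start if either configuration file cannot be read
    for f in "$spider_conf" "$swish_conf"; do
        if [ ! -r "$f" ]; then
            echo "cannot read $f"
            return 1
        fi
    done
    # Spider the site and feed the documents to Swish-e on standard input
    /usr/local/lib/swish-e/spider.pl "$spider_conf" | swish-e -S prog -c "$swish_conf" -i stdin
}
```

<p>For the public side it would be called as: index_public SwishSpiderConfig.public.pl swish-e.public.conf</p>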
<ul style="list-style-type: none; padding: 0;">
<li>Source: <a href="http://swish-e.org/">Swish-e</a></li>
<li>Source: <a href="http://swish-e.org/download/">Swish-e Downloads</a></li>
<li>Source: <a href="http://swish-e.org/docs/install.html">Swish-e Installation Instructions</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.adamsdesk.com/be/archives/2008/05/30/how-to-installing-swish-e-v245/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
