Isearch is a text search system developed by CNIDR. CNIDR is the Center for Networked Information Discovery and Retrieval. They are part of a non-profit corporation known as MCNC. MCNC used to stand for "Microelectronics Center of North Carolina", but they dropped the long form and now they're legally just MCNC. Anyway, the lawyers thought you'd like to know that.
Isearch pretty much does one thing: it lets the user search through a bunch of text by hunting for words. It doesn't "understand" your text collection, it doesn't try to parse it, it just uses slightly-less-than-brute-force statistical methods to hunt for words in documents. It does this by building indexes. An Isearch index is a just like the index in a book: if you look for a word in the index of a book, it tells you a page number to look at. Likewise, if you search for a word with Isearch, it looks in the index for a filename associated with that word and then shows that file to you. It can do more complicated things, but that's the basic game plan.
Installing Isearch requires five steps:
Really, installation usually isn't tricky at all. Start to finish should take around thirty minutes the first time, and about ten when you download new versions.
Okay, we lied: There's an extra step. You really should have gcc and g++ on your machine, and you must have gzip (and its companion gunzip) so you can unpack the archive that Isearch comes in. You can get gzip and gcc via anonymous ftp from prep.ai.mit.edu in the directory /pub/gnu. Gzip installation is fairly simple. Unpack the tar file and follow the instructions in the file "README". Installing gcc and g++ is a lot more complicated. In principle, you could use a vendor-supplied C++ compiler. The Solaris SparcWorks C++ compiler is known to work, and the CenterLine C++ compiler has also compiled Isearch successfully. The bad news is that there are several subtly different versions of the C++ specification and each compiler views it a little differently. Getting Isearch to work with a different compiler is about like porting Isearch to another machine. Isearch is also fairly dependent on compiler versions, especially with gcc. Gcc users should make sure they are using at least 2.7.X. Note that the old, pre-compiled gcc for Solaris is not meant to be a general-purpose compiler: it's just enough of a compiler to compile a newer version of gcc.
Incidentally: You're also going to have to install libg++ if you haven't already. SAme ftp site, same directory. Install it after you install g++.
You should always work from the latest version of Isearch available. Bugs are being corrected and new features are being added daily (literally). You can always get the newest version from ftp.cnidr.org. Here's a sample session where we will download today's version of Isearch from ftp.cnidr.org by logging in as "anonymous", sending our email address as our password, changing the current directory to "/pub/NIDR.tools/Isearch", making sure we're in binary mode for ftp, and getting the file:
sti-gw% ftp ftp.cnidr.org Connected to kudzu.cnidr.org. 220 kudzu FTP server (Version wu-2.4(1) Sun Jan 1 17:43:49 EST 1995) ready. Name (ftp.cnidr.org:escott): anonymous 331 Guest login ok, send your complete e-mail address as password. Password: 230- Welcome to kudzu.cnidr.org 230- 230- ftp logged in from sti-gw.sti-ext.com at Mon Mar 25 11:14:59 1996 230- 230- There are currently 7 users of the maximum 25 logged on. 230- 230- If you have trouble using this experimental FTP client, try including 230- a dash in front of your username (as your login password). This will 230- disable informational messages such as these. 230- 230- Comments??? Send mail to `bmv@k12.cnidr.org` 230- 230- 230 Guest login ok, access restrictions apply. ftp> cd /pub/NIDR.tools/Isearch 250-Please read the file README 250- it was last modified on Fri Mar 22 15:41:06 1996 - 3 days ago 250 CWD command successful. ftp> dir 200 PORT command successful. 150 Opening ASCII mode data connection for /bin/ls. total 4995 -rw-rw-r-- 1 138 30 1026816 Mar 22 20:40 Isearch-1.13-aix.tar.gz -rw-rw-r-- 1 138 30 480568 Mar 22 20:40 Isearch-1.13-linux.tar.gz -rw-rw-r-- 1 138 30 1176479 Mar 22 20:40 Isearch-1.13-osf1.tar.gz -rw-rw-r-- 1 138 30 667405 Mar 22 20:40 Isearch-1.13-solaris.tar.gz -rw-rw-r-- 1 138 30 741496 Mar 22 20:40 Isearch-1.13-sunos.tar.gz -rw-rw-r-- 1 138 30 808411 Mar 22 20:41 Isearch-1.13-ultrix.tar.gz -rw-rw-r-- 1 138 30 125536 Mar 22 20:41 Isearch-1.13.tar.gz -rw-rw-r-- 1 138 30 970 Mar 22 20:41 README drwxrwxr-x 11 138 30 512 Mar 22 20:16 archive drwxrwxr-x 2 138 30 1024 Mar 22 20:43 untested 226 Transfer complete. 772 bytes received in 0.031 seconds (24 Kbytes/s) ftp> binary 200 Type set to I. ftp> get Isearch-1.13.tar.gz 200 PORT command successful. 150 Opening BINARY mode data connection for Isearch-1.13.tar.gz (125536 bytes). 226 Transfer complete. local: Isearch-1.13.tar.gz remote: Isearch-1.13.tar.gz 125536 bytes received in 0.12 seconds (9.9e+02 Kbytes/s) ftp> quit sti-gw%
Notice a few things:
You should now have a copy of Isearch as a gzipped tar tar. The ".gz" suffix indicates a gzipped file, and the ".tar" suffix indicates it's tarred. The first thing to do is to uncompress the file:
sti-gw% gunzip Isearch-1.13.tar.gz
You now have a (much larger) file called "Isearch-1.13.tar".
The next step is to un-tar the file we just uncompressed. In our example, we're going to want to install Isearch so it has the path name "/local/project/Isearch-1.13". To do this, copy or move the file to "/local/project":
sti-gw% mv Isearch-1.13 /local/project
Finally, we'll un-tar the distribution:
sti-gw$ cd /local/project
sti-gw% tar xf Isearch-1.13.tar
There should now be a directory named /local/project/Isearch-1.13, and it should contain the Isearch distribution:
sti-gw% ls /local/project/Isearch-1.13 CHANGES Makefile TUTORIAL doctype COPYRIGHT README bin src
If you see all of that, then you probably got it right.
This is the part that makes people nervous, but it really isn't that bad at all. You need to edit the Makefile with your favorite editor, and essentially just follow the instructions. Edit /local/project/Isearch-1.13/Makefile in this example (you won't have to edit the makefiles in the subdirectories, they automatically inherit what they need).
The first choice is for compiler. Probably 99% of the world should leave the default "g++". Isearch is developed by people who use g++, so that's your best bet. It's also hard to beat the price.
The second choice is for "CFLAGS", which are the options to pass to the compiler. There are canned selections for you to choose from. If you don't know what to select, then "CFLAGS=-g -DUNIX -Wall" is a good starting guess.
The third choice is for the location to install the finished programs. The default "/usr/local/bin" is a good guess. Note that if you don't elect to actually "make install" in step 4 then you'll never have to set this.
The rest of the choices probably should be left alone.
While on the subject of editing files, it's worth noting that the file /local/project/Isearch-1.13/doctype/dtconf.inf describe how many "doctypes" your copy of Isearch will be aware of. A doctype is essentially a file type handler; there are doctypes for simple ASCII files, files of separate paragraphs, and so forth. You can almost always just leave this file alone and take the default, which is "Give me all of 'em!".
At this point, you should be able to
sti-gw% cd /local/project/Isearch-1.13 sti-gw% make
And all the right things will happen. Note that many compilers will print a few warning lines along the way, but they're warning about pretty harmless stuff. If there are any errors, then the compilation will grind to a halt. Otherwise, when you're done there will be new files in the "bin" subdirectory:
sti-gw% ls bin Iindex Isearch Iutil libIsearch.a
If you got that far, then you can (optionally) :
sti-gw% make install
and the compiled executables will be copied to /usr/local/bin.
Now let's index a little text and see if Isearch really works. First, let's pick a couple of files to index. In this example, we'll index the files "CHANGES" and "COPYRIGHT" since we know everyone has them.
sti-gw% cd /local/project/Isearch-1.13 sti-gw% mkdir testIndex sti-gw% cd testIndex sti-gw% /local/project/Isearch-1.13/bin/Iindex -d tester ../CHANGES ../COPYRIGHT Iindex 1.13 Building document list ... Building database tester: Parsing files ... Parsing /local/project/Isearch-1.13/CHANGES ... Parsing /local/project/Isearch-1.13/COPYRIGHT ... Indexing 1870 words ... Merging index ... Database files saved to disk. sti-gw% ls tester.dbi tester.inx tester.mdg tester.mdk tester.mdt
That created five index files to describe the two files we indexed. Don't worry, we could have indexed every file on your system and still only had five index files. Now let's do a little searching and see how well we did:
sti-gw% /local/project/Isearch-1.13/bin/Isearch -d tester RSET table ../bin/Isearch -d tester RSET table Isearch 1.13 Searching database tester: 1 document(s) matched your query, 1 document(s) displayed. Score File 1. 100 /local/project/Isearch-1.13/CHANGES Select file #:
At this point, if you enter "1" and press return, it will display the contents of "CHANGES". If you press return without a number, Isearch will exit. This is sort of a trivial example, but if you index thousands of files then the above search will list the ones that contain either "RSET" or "table". If you run Isearch without any arguments then it will give a list of all of its options, what they mean, and some examples of their use. For more information, see the Isearch Tutorial included with the distribution.