[ic] Search content of a PDF file

Jon Jensen jon at endpoint.com
Sun May 28 20:26:04 EDT 2006


On Sun, 28 May 2006, maillists wrote:

> I have a client who has about 150 pdf files, and each file contains
> about 20 - 50 pages of text!
>
> They want to have a search engine that can search the content of these
> files, and display at least a link to the pdf files for download.
>
> Is there a good way for IC to do this kind of search, or should I
> integrate some other system for this into the IC site?
>
> My first thought was to copy/paste all the content of each file into a
> database. But I was wondering if there was a way to search inside the
> content of the pdf file itself. Speed is not much of an issue.
>
> Looking for guidance...

Rick,

Swish-e is a good package for doing searches like this. It can index text 
inside PDF files:

http://swish-e.org/

Interchange has a Swish-e search interface. Search for "swish" on 
icdevgroup.org and you'll find mailing list posts that describe how to 
set it up.
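
To make the idea concrete, here is a rough Python sketch of what a full-text 
indexer does in principle. It is not Swish-e's code or API. It assumes the 
text has already been extracted from each PDF (for example with a tool such 
as pdftotext), and the file names, sample text, and /downloads/ path are 
made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of PDF names whose text contains it.

    `documents` maps a PDF file name to its already-extracted text.
    """
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the sorted PDF names that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)


# Hypothetical sample data standing in for text extracted from PDFs.
documents = {
    "catalog.pdf": "Spring catalog with garden tools and seeds",
    "manual.pdf": "Owner's manual for the garden tiller",
}
index = build_index(documents)
results = search(index, "garden tools")

# Print a download link for each match (the path is an assumption).
for name in results:
    print("/downloads/" + name)
```

The index is built once, so each search is only a few set lookups. A real 
indexer such as Swish-e also handles text extraction, ranking, and storing 
the index on disk, none of which this sketch attempts.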

Jon

-- 
Jon Jensen
End Point Corporation
http://www.endpoint.com/

