[ic] Search Term Mapping

Mike Heins mikeh@minivend.com
Wed, 2 May 2001 14:40:36 -0400


Quoting Christopher VanOosterhout (chris@vanoosterhout.com):
> 
> Today I was looking for information about search term mapping in the 
> archives and found that the software used to search the archives does just 
> what I am hoping I can do in my Interchange stores.
> 
> When I search for     search   term   mapping ... it actually searches for: 
> Search results for '(search or searched or searching or searchings or 
> searcher or searches or searchers) and (term or termed or terming or termer 
> or terms or termly) and (mapping or mappings)'
> 
> To me that is a good thing.   I am not sure how this software does it (if 
> it is manual mapping or if it just automatically enters potential endings 
> to words to increase the search).
> 
> After searching I have not found the answer about how this type of thing 
> may be implemented in Interchange.  If I have just missed it, please let me 
> know and I will go back and review again.
> 
> Has anyone here successfully implemented search mapping within an 
> Interchange store?  How did you do it?
> 
> What I am trying to do is say if someone searches for WORDA, by default, 
> the search would also return positive results for WORDB.  So if someone 
> searches for the word book as in PARTS BOOK it will also search for the 
> words catalog and manual and any other word I map to the word BOOK.
> 
> This means that if someone searches for PARTS BOOK, they will also get 
> matches to PARTS CATALOG, PARTS MANUAL, etc.
> 
> Or if someone searches for the word oil filters they will also get oil filter.
> 
> Does that make sense?  Any hints on how to accomplish it?
> 

This is called stemming. Interchange doesn't support it -- the archive
search software you saw doing it is htdig.
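
As a rough illustration of what stemming means, here is a deliberately
crude suffix stripper in plain Perl. It is a sketch only: the suffix list
and the crude_stem name are made up for this example, it makes no claim
about how htdig works internally, and a real stemmer applies far more
careful rules.

```perl
use strict;
use warnings;

# Crude suffix stripper, for illustration only. The suffix list is an
# assumption chosen for this example, not taken from any real stemmer.
sub crude_stem {
    my ($word) = @_;
    $word = lc $word;
    for my $suffix (qw(ings ing ed es s)) {
        # Strip the first suffix that matches, keeping at least 3 letters.
        return $1 if $word =~ /^(\w{3,}?)\Q$suffix\E$/;
    }
    return $word;
}

# Variants of one word collapse to a shared stem.
for my $word (qw(search searched searching searches filter filters)) {
    printf "%-10s -> %s\n", $word, crude_stem($word);
}
```

Words that reduce to the same stem can then be treated as equivalent
search terms.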

On the other hand, I have done something similar with Interchange. What
you do is:

	1. Inside the [no-match] [/no-match] area, you pull apart the
	mv_searchspec value.

	2. Examine your stemming list and look for pluralization or
	other stems. The Lingua::Stem Perl module provides some
	methods, e.g.:

	   use Lingua::Stem qw(stem);
	   my @words = ('filters');
	   my $stems = stem(@words);   # returns an array reference
	   print join("\n", @$stems), "\n";

	This could easily be put into an Interchange global UserTag:

	    UserTag stem Order words
	    UserTag stem addAttr
	    UserTag stem Routine <<EOR
	    sub {
		my ($words) = @_;
		use Lingua::Stem qw/stem/;
		my @words = grep /\S/, split /\s+/, $words;
		my @stems;
		for(@words) {
		    # stem() returns an array reference, one stem per word
		    my $stem = stem($_)->[0];
		    next if ! defined $stem or lc $stem eq lc $_;
		    push @stems, $stem;
		}
		return join " ", @stems;
	    }
	    EOR
	
	At that point, you can call it in embedded Perl with:

	[no-match]
	[perl]
	    return unless ref $Values->{mv_searchspec};
	    my @alt;
	    for(@{$Values->{mv_searchspec}}) {
		my $stem = $Tag->stem($_);
		push @alt, $stem if $stem;
	    }
	    ## Rest of code which builds search links left as exercise
	    ## for the user....
	[/perl]
	[/no-match]

	3a. Provide some clickable links that allow people to search
	for the alternate stems....
	
		--or--

	3b. Do the alternate stem searches automatically via a [bounce ....].
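
For the WORDA/WORDB mapping in the original question (BOOK also matching
CATALOG and MANUAL), stemming alone is not enough; a hand-maintained map
is one way to cover it. The sketch below, in plain Perl, pulls a search
string apart, substitutes mapped words, and prints clickable links as in
step 3a. The %synonyms table, the alternate_queries and url_escape helpers,
and the $search_url layout are all assumptions for illustration; adjust
the link format to however your catalog invokes searches.

```perl
use strict;
use warnings;

# Hypothetical hand-maintained map: search word => extra words to try.
my %synonyms = (
    book    => [qw(catalog manual)],
    filters => [qw(filter)],
);

# Return one alternate query per substitution, other words left as typed.
sub alternate_queries {
    my ($query) = @_;
    my @words = grep { length } split /\s+/, lc $query;
    my @alternates;
    for my $i (0 .. $#words) {
        for my $replacement (@{ $synonyms{ $words[$i] } || [] }) {
            my @copy = @words;
            $copy[$i] = $replacement;
            push @alternates, join ' ', @copy;
        }
    }
    return @alternates;
}

# Percent-encode a query for use in a link (ASCII input assumed).
sub url_escape {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9_.~-])/sprintf '%%%02X', ord $1/ge;
    return $text;
}

# Placeholder URL layout, not a documented Interchange path.
my $search_url = '/cgi-bin/store/search';

for my $alt (alternate_queries('PARTS BOOK')) {
    printf qq{<a href="%s?mv_searchspec=%s">%s</a>\n},
        $search_url, url_escape($alt), $alt;
}
```

The same list of alternates could feed the automatic approach in 3b
instead of being shown as links.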


-- 
Red Hat, Inc., 131 Willow Lane, Floor 2, Oxford, OH  45056
phone +1.513.523.7621 fax 7501 <mheins@redhat.com>