[ic] Search Term Mapping
Mike Heins
mikeh@minivend.com
Wed, 2 May 2001 14:40:36 -0400
Quoting Christopher VanOosterhout (chris@vanoosterhout.com):
>
> Today I was looking for information about search term mapping in the
> archives and found that the software used to search the archives does just
> what I am hoping I can do in my Interchange stores.
>
> When I search for search term mapping ... it actually searches for:
> Search results for '(search or searched or searching or searchings or
> searcher or searches or searchers) and (term or termed or terming or termer
> or terms or termly) and (mapping or mappings)'
>
> To me that is a good thing. I am not sure how this software does it (if
> it is manual mapping or if it just automatically enters potential endings
> to words to increase the search).
>
> After searching I have not found the answer about how this type of thing
> may be implemented in Interchange. If I have just missed it, please let me
> know and I will go back and review again.
>
> Has anyone here successfully implemented search mapping within an
> Interchange store? How did you do it?
>
> What I am trying to do is say if someone searches for WORDA, by default,
> the search would also return positive results for WORDB. So if someone
> searches for the word book as in PARTS BOOK it will also search for the
> words catalog and manual and any other word I map to the word BOOK.
>
> This means that if someone searches for PARTS BOOK, they will also get
> matches to PARTS CATALOG, PARTS MANUAL, etc.
>
> Or if someone searches for the word oil filters they will also get oil filter.
>
> Does that make sense? Any hints on how to accomplish it?
>
This is called stemming. Interchange doesn't support it -- the archive
search software you saw doing it is htdig.
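One note on the WORDA/WORDB part of the question: mapping "book" to "catalog" and "manual" is synonym mapping rather than stemming, since those words share no root, so a stemmer alone will not cover it. A minimal sketch of a hand-maintained map, in plain Perl with no Interchange dependency. The %synonyms table and the expand_terms name are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical, hand-maintained map: search word => extra words to try.
my %synonyms = (
    book => [qw(catalog manual)],
);

# Return the original words followed by any mapped alternates,
# lowercased and without duplicates.
sub expand_terms {
    my ($spec) = @_;
    my (%seen, @out);
    for my $word (map { lc $_ } grep { /\S/ } split /\s+/, $spec) {
        for my $term ($word, @{ $synonyms{$word} || [] }) {
            push @out, $term unless $seen{$term}++;
        }
    }
    return @out;
}

print join(' ', expand_terms('PARTS BOOK')), "\n";   # parts book catalog manual
```

How the expanded list is then joined into a search (all terms ORed, or one search per alternate) is a separate decision and depends on how the store's search is set up.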
On the other hand, I have done some similar things with Interchange. What
you do is:
1. Inside the [no-match] [/no-match] area, you pull apart the
mv_searchspec value.
2. Examine your stemming list and look for pluralization or
other stems. The Lingua::Stem Perl module provides some
methods, e.g.:
    use Lingua::Stem qw(stem);
    my @words = ('filters');
    my $stems = stem(@words);    # stem() returns an array reference
    print join "\n", @$stems;
This could easily be put into an Interchange global UserTag:
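UserTag stem Order words
UserTag stem addAttr
UserTag stem Routine <<EOR
sub {
    my ($words) = @_;
    use Lingua::Stem qw/stem/;
    my @words = grep /\S/, split /\s+/, $words;
    my @stems;
    for (@words) {
        # stem() returns an array reference, so take its first element
        my $stem = stem($_)->[0];
        next unless defined $stem and length $stem;
        # skip words whose stem is the word itself (ignoring case)
        next if lc $stem eq lc $_;
        push @stems, $stem;
    }
    return join " ", @stems;
}
EOR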
At that point, you can call it in embedded Perl with:
[no-match]
[perl]
    return unless ref $Values->{mv_searchspec};
    my @alt;
    for (@{$Values->{mv_searchspec}}) {
        my $stem = $Tag->stem($_);
        push @alt, $stem if $stem;
    }
    ## Rest of code which builds search links left as exercise
    ## for the user....
[/perl]
[/no-match]
3a. Provide some clickable links that allow people to search
for the alternate stems....
--or--
3b. Do the alternate stem searches automatically via a [bounce ....].
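For option 3a, the link-building part might look roughly like the following. This is only a sketch in plain Perl so it can be run outside Interchange. The alt_search_links, url_escape and html_escape names and the example store URL are made up, and it assumes the usual one-click search URL form of "scan/se=TERM". Inside a real page you would more likely let Interchange's own [area] tag produce the URL so that session information is carried along.

```perl
use strict;
use warnings;

# Percent-encode anything outside the URL "unreserved" set.
# Assumes plain ASCII/byte strings.
sub url_escape {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9_.~-])/sprintf("%%%02X", ord($1))/ge;
    return $text;
}

# Escape the characters that are special in HTML text and attributes.
sub html_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build one link per alternate term, skipping blanks and duplicates.
# $base is a hypothetical search URL prefix ending in "scan/".
sub alt_search_links {
    my ($base, @alt) = @_;
    my (%seen, @links);
    for my $term (@alt) {
        next unless defined $term and $term =~ /\S/;
        next if $seen{lc $term}++;
        my $url = $base . 'se=' . url_escape($term);
        push @links, sprintf '<a href="%s">%s</a>',
            html_escape($url), html_escape($term);
    }
    return @links;
}

print join("<br>\n", alt_search_links(
    'http://www.example.com/cgi-bin/store/scan/',   # hypothetical store URL
    'filter', 'oil filter', 'filter',
)), "\n";
```

In the [perl] block above, the @alt array would be passed in place of the literal terms and the joined links returned as the block's output.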
--
Red Hat, Inc., 131 Willow Lane, Floor 2, Oxford, OH 45056
phone +1.513.523.7621 fax 7501 <mheins@redhat.com>