[ic] weighted search result sorting

Paul Jordan interchange-users@icdevgroup.org
Fri Jan 3 12:09:01 2003


> Paul Jordan [paul@gishnetwork.com] wrote:
> > sku		keywords
> > sku123	ocean, island, sky, trees, water
> > sku124	sky, clouds, blue, day
> >
> > Lets say I have thousands similar to this. The problem arises when
> > someone searches for the term 'sky'. It will pull both results from
> > above, but if I sort by sku it will show the pictures of
> the island with
> > water and sky, or any number of picture with sky in it WILL appear
> > BEFORE a simple brilliant SKY by itself.... which is not
> good, if left
> > to sorting a field value.
> >
> > I have been thinking of ways to "weight" the result set. I am not an
> > expert on efficiency nor databases. I am using Mysql, but NOT an SQL
> > query because I am doing full text searches.
> >
> > A pseudo idea would be like:
> >
> > sku		keywords
> > sku123	ocean_7, island_9, sky_5, trees_4, water_5
> > sku124	sky_10, clouds_10, blue_3, day_2
> >
> > I have no idea if this is possible but in the above it is
> assumed that
> > with substring matching turned on, 'sky' will still be a
> HIT for both,
> > then maybe create some custom tf=? or method of sorting based on the

   [snip]
> I'd have a table along the lines of the following:

     hmmm, how did I know?  ;)

>
>     sku     keyword         weight
>     ------- --------------- --------
>     sku123  ocean           7
>     sku123  island          9
>     sku123  sky             5
>     sku123  trees           4
>     sku123  water           5
>     sku124  sky             10
>     sku124  clouds          10
>     sku124  blue            3
>     sku124  day             2
>
> PRIMARY INDEX: sku, keyword
> NON-UNIQUE INDEX: keyword
>
> The following (untested) SQL queries should then work as follows:
>
> Query:
>
>     SELECT  sku, SUM(weight) AS skuweight
>     FROM    keywords
>     WHERE   keyword IN ('sky')
>     GROUP BY sku
>     ORDER BY skuweight DESC
>
> Results:
>
>     sku     skuweight
>     ------- ---------
>     sku124  10
>     sku123  5
>
> Query:
>
>     SELECT  sku, SUM(weight) AS skuweight
>     FROM    keywords
>     WHERE   keyword IN ('sky','ocean')
>     GROUP BY sku
>     ORDER BY skuweight DESC
>
[snip]


Thanks Pat, Kevin and John.

I was hoping that the thread would stay close to my pseudo idea. Having
the keywords in a separate table poses many issues. For one, It would
slow down item entry into the database considerably. However I think
Mike is working on new improved LINK model for table linking in ADMIN,
which may help some with *that* issue.

Then, we are working on a sort of "build you own search engine" type
thing. It will allow to narrow the search down by searching any number
of columns (prod_group, cap_group, aspect, type, etc etc).

This would now mean that then Kevin's above solution would have to be a
select join on tables 'keywords' and 'products'... Do you think that
poses any kind of performance issues? Believe me when I say there will
be millions of sku#'s, This makes for a very large keywords table, as
each sku can have 5,6,7,8... keywords.

This can get ugly fast, as I also give the option of "exclude" keywords.

-> At this point is I ask, IS IT POSSIBLE to keep this in the 'products'
table with an ITL search? If not then I will have to look into reworking
our tables and using sql like Kevin suggests, although for many reasons
doing that is about as attractive as my sophomore year girlfriend :-\

-> At that point I ask, will doing Kevin's suggestion, but joining it to
the products table be no good.... or "the wrong direction" ?

Thanks agian for all youses insight!

Paul