[ic] how do I stop Google from trying to index scan pages?

curthauge at mnwebdesign.com
Wed Sep 17 23:02:07 UTC 2008


Hi list,

IC 5.4.2, Perl 5.8.8, old Construct catalog on CentOS 4.7

My client has over 15,000 products, but Google has only about 400 of them
in its index. The last four pages listed in the Google index are scan
pages, and it SEEMS like after hitting four scan pages, Googlebot stops
and turns away (probably because the page content looks too similar).
Every time I change pages or robots.txt, I have to wait days or a week to
see the effect on the Google index. I have a lot of respect for Google and
always spell it with a capital 'G', but I still have this problem. ;-)

I've been through the archives, but I can't find precisely what I need to
change in my robots.txt to stop these pages from being indexed. Here is an
example of the pages in question:

http://www.my-domain.com/cgi-bin/storeabc/scan/fi=products/st=db/sf=category/se=DVD%20Video/ml=16/tf=description.html
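
One thing I notice about these URLs: all of the search parameters (fi=,
st=, sf=, se=, ml=, tf=) are encoded as path segments, so there is no '?'
anywhere in them. If I'm reading the robots.txt matching rules right, that
means a query-string pattern can never match a scan URL; only a plain path
prefix can. Comparing two rules from my own file:

Disallow: /*?                       # never matches - scan URLs have no '?'
Disallow: /cgi-bin/storeabc/scan/   # matches - plain path-prefix rule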

I have RobotUA, RobotHost, and RobotIP settings in catalog.cfg. I have a
robots.txt file in my httpdocs directory, with entries like this (among
others):

User-agent: Googlebot
Disallow: /*?

User-agent: *
Disallow: /storeabc/scan
Disallow: /scan
Disallow: /storeabc/process
Disallow: /process
Disallow: /cgi-bin/storeabc/process
Disallow: /cgi-bin/storeabc/scan/
Disallow: /cgi-bin/storeabc/search
Disallow: /cgi-bin/storeabc/pages/process
Disallow: /cgi-bin/storeabc/pages/scan/
Disallow: /cgi-bin/storeabc/pages/search
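
One thing I'm starting to suspect (please correct me if I'm wrong): since
I have a group specifically for Googlebot, my understanding of the robots
exclusion standard is that Googlebot obeys *only* that group and ignores
everything under 'User-agent: *'. If that's true, the only rule Googlebot
ever sees is 'Disallow: /*?', which, per the above, never matches a scan
URL. Would duplicating the scan/process rules into the Googlebot group be
the right fix? Something like:

User-agent: Googlebot
Disallow: /*?
Disallow: /cgi-bin/storeabc/scan/
Disallow: /cgi-bin/storeabc/process
Disallow: /cgi-bin/storeabc/search
Disallow: /cgi-bin/storeabc/pages/scan/
Disallow: /cgi-bin/storeabc/pages/process
Disallow: /cgi-bin/storeabc/pages/search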

I really just want the flypages, like this one, to be indexed:

http://www.my-domain.com/cgi-bin/storeabc/sku12345.html
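
As a belt-and-suspenders measure, I'm also wondering about putting a
robots meta tag in the head of the search results template, so that any
scan page Googlebot does crawl at least doesn't get indexed. A rough
sketch of what I mean (the template name pages/results.html is just my
guess at where this would go in a Construct-style catalog):

<!-- in the <head> of the scan/search results template,
     e.g. pages/results.html (path is illustrative) -->
<meta name="robots" content="noindex,follow">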

Any tips, pointers, ideas, ridicule?


Curt Hauge



