[ic] search engine indexing scan/ MM=0f73bb47ac44f4e422.....

Fri Jun 16 18:48:09 EDT 2006

On Fri, 16 Jun 2006, Jon wrote:

>> I just noticed that Google is reindexing our site after the upgrade to IC 5.4
>>
>> Among the normal results are some of these:
>>
>> www.mrlock.com/eshop/locks/scan/
>> MM=0f73bb47ac44f4e422fab7057f73d0c0:250:299:50.html?mv_more_ip=1&mv_n...
>> - 58k -
>> <http://64.233.187.104/search?q=cache:1wYAQPUy_g4J:www.mrlock.com/eshop/locks/scan/MM%3D0f73bb47ac44f4e422fab7057f73d0c0:250:299:50.html%3Fmv_more_ip%3D1%26mv_nextpage%3Dresults%26mv_arg%3D+cat+60+lock&hl=en&gl=us&ct=clnk&cd=3-
>> <http://www.google.com//search?hl=en&lr=&q=related:www.mrlock.com/eshop/locks/scan/MM%3D0f73bb47ac44f4e422fab7057f73d0c0:250:299:50.html%3Fmv_more_ip%3D1%26mv_nextpage%3Dresults%26mv_arg%3D>Similar
>> pages
>>
>> If I click on the link on the Google site, it returns nothing, but
>> if I click on their cached page it does show the result the spider
>> obtained originally.
>
> Those appear to be the timed-built pages, I think. I see the
> same on my site wherever there is paging forward/back via the [more-list]
> tag; there is some magic under there somewhere. When Google crawls and
> picks up those pages they exist, but when you click the links later the
> page is gone because, I assume, it has expired and needs to be created
> again on the fly. How to circumvent this for Google in particular I do
> not know, but I wish I did, since I've got the same problem. I think this
> was discussed and explained some time ago, but I've not been able to
> find it in the archives.

I haven't done this before, but it should work:

Set up RobotUA and the related robot-detection directives to detect
Googlebot (this is on by default). When the visitor is detected as a
robot, Interchange sets the CGI variable mv_tmp_session.

For the search that feeds the page where you're using [more-list], check
mv_tmp_session and, for robots, set the matchlimit to a very big number
so that all the results fit on one page, e.g. ml=10000. Then when a search
engine indexes the page, it gets all the content at once and never sees
the more-list links that stop working later.
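
Here is a rough, untested sketch of that idea in ITL, to go on the page
that links to the search. The search spec (fi=products, sf=category,
se=locks), the link text, and the normal page size of 50 are placeholders
I made up for illustration; substitute whatever your existing search link
uses. It also assumes mv_tmp_session is visible to [if cgi ...] when the
page is rendered.

```
[comment] Robots get one big page; everyone else gets normal paging. [/comment]
[if cgi mv_tmp_session]
<a href="[area search="
    fi=products
    sf=category
    se=locks
    ml=10000
"]">Locks</a>
[else]
<a href="[area search="
    fi=products
    sf=category
    se=locks
    ml=50
"]">Locks</a>
[/else]
[/if]
```

The only difference between the two branches is the ml value, so a robot
should never be handed a link that depends on a saved more-list search.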

Jon

-- 
Jon Jensen
End Point Corporation
http://www.endpoint.com/