[ic] RobotUA

Grant interchange-users@icdevgroup.org
Wed Nov 27 11:59:01 2002


>Grant wrote:
>
>>Thanks a lot for the info, Philip.  I'd like to clarify a couple of things
>>though...
>>
>>
>>
>>>Usually, with a 301, most spiders take a couple of runs before they
>>>decide to go anywhere else in the site.
>>>
>>>
>>
>>What is the correct way to forward from your domain to your site's index
>>page so the spiders don't get confused?
>>
>>
>Icdevgroup uses a 302 and, as you may have noticed, does not get indexed
>by Google beyond the first page; most other search engines will tromp
>all over the site and produce gory listings.
>In my opinion, and that of many others I have spoken with at
>http://www.webmastersworld.com (where GoogleGuy hangs out), a 301
>is the only way to go if you are going to redirect and expect rankings.
>You could use a doorway page that uses JavaScript and just place links
>into the site with some keywords for the search engines, but I feel
>that is very unprofessional and spammy myself.
>
>

So pretty much everyone uses a 301 or 302 to get to the index page of their
site, and therefore has to deal with this issue?
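For reference, here is what a root-to-index redirect looks like at the HTTP level. This is a minimal Python sketch using the standard library's http.server, not how Interchange or Apache is actually configured; the helper names are hypothetical and the target URL is the www.mystore.com/cgi-bin/shop/index.html redirect described later in this thread. The second function requests a path without following the redirect, so you can see exactly which status code and Location header a spider receives.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical target, mirroring the redirect described in this thread.
TARGET = "http://www.mystore.com/cgi-bin/shop/index.html"


def make_handler(status: int, location: str):
    """Build a handler class that redirects "/" with the given status code."""

    class RedirectHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/":
                # 301 = Moved Permanently, 302 = Found (temporary).
                self.send_response(status)
                self.send_header("Location", location)
                self.send_header("Content-Length", "0")
                self.end_headers()
            else:
                self.send_error(404)

        def log_message(self, format, *args):
            pass  # keep the demo quiet

    return RedirectHandler


def start_redirect_server(port: int = 0, status: int = 301, location: str = TARGET):
    """Serve the redirect in a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), make_handler(status, location))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_without_following(host: str, port: int, path: str = "/",
                            user_agent: str = "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"):
    """Return (status, Location header); http.client never follows redirects."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("GET", path, headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        resp.read()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


if __name__ == "__main__":
    srv = start_redirect_server()
    print(fetch_without_following("127.0.0.1", srv.server_address[1]))
    srv.shutdown()
    srv.server_close()
```

Passing `status=302` gives the temporary variant discussed below; only the status line changes, and what a crawler does with that status is up to the crawler.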

>>
>>
>>>Now, depending on how long your system has been running with a 301,
>>>moving now will cause you more problems. A 301 is just like telling the
>>>mailman you have a new address and then sending a change of address to
>>>all of your magazine companies. How long does it take them to get
>>>around to sending the magazines to your new address?
>>>Then suddenly you decide to send them and your mailman another change
>>>of address before they have even acted on the first one. You will wait
>>>at least 2 months before you get any magazines, or a good part of your
>>>mail will end up in different places.
>>>
>>>So a 301 is usually remembered, as opposed to a 302, which says
>>>"temporary move, don't keep a record of it." That is a very bad thing
>>>when it comes to spiders if you keep bouncing around.
>>>
>>>
>>
>>Are you saying a 301 or a 302 is better for spiders?
>>
>>
>Here is a link from Google that talks about what they feel you should do
>
>http://www.google.com/remove.html
>
>And here is the snippet that talks about 301:
>
>*Change the URL of your website*
>
>Since Google's crawler associates the content of a page with its URL,
>there is no way to manually change the URL that is displayed for your
>website. The URL will be updated the next time we crawl your site. The
>crawler revisits each site according to an automatic schedule, and we
>cannot manually accelerate the date on which your site will be recrawled.
>
>If the URL of your website has changed since we last crawled it, you may
>use the URL submission form <http://www.google.com/addurl.html> and the
>URL removal methods described below. However, the URL submission form
>does not take effect immediately, so using the URL removal feature may
>leave your website inaccessible from Google until we crawl your site again.
>
>Instead of requesting a change from Google, we recommend that you ask
>the sites currently linked to your old site to update their links (to
>point to your new site). Also, don't forget to change any entries you
>may have in the Yahoo! directory and the Open Directory. Finally, if
>your old URLs redirect to your new site using HTTP 301 (permanent)
>redirects <http://www.ietf.org/rfc/rfc2616.txt>, our crawler will know
>to use the new URL. Changes made in this way will take 6-8 weeks to be
>reflected in Google.
>
>I feel a 301 is better. Also pay close attention to the time Google
>says it will take for the crawler to pick up the new address
>(6-8 weeks).
>
>>>This is spoken completely from experience, since I did this myself and
>>>have seen its effects.
>>>
>>>Also, all of your DMOZ entries need to point to your redirected
>>>location for you to get credit for them.
>>>
>>>
>>>
>>
>>Where are these entries?
>>
>>
>If your site has been submitted to DMOZ at http://www.dmoz.org, those
>listings should always match your expected site location, since Google
>and other search engines use them to support your rankings.
>

Ok, thank you.

>>
>>
>>>The point is this: if you have just started this move, leave it
>>>alone. It will take at least 2 months for Google and a few others to
>>>catch up. If you have been doing this for a while, you could completely
>>>lose at least a month's worth of crawls until they get around to seeing
>>>the new move.
>>>
>>>This happened to me: I got impatient and moved around again, and I
>>>lost a lot of traffic. After I talked to some people at webmasterworld,
>>>they just told me not to mess with it and to be patient; they will crawl
>>>your site within one to two months. If your session IDs (sids) are not
>>>showing in the URLs, they will jump on it soon.
>>>
>>>--
>>>Philip S. Hempel
>>>debian/rules
>>>
>>>
>>
>>It seems like there must be a better way to go about all this that doesn't
>>use 301s at all so the spiders will head straight inside.  What would that
>>be?
>>
>>- Grant
>>
>>
>>
>Most search engines do not like a 302 (temporary redirect), and almost
>all suggest using a 301 (permanent redirect).
>A 302 does not push ranking onto the main page, and pushing ranking
>there is what you want.
>
>Since I quit using a 302 and went to a 301 (plus a few other things), I
>went from page 5 in the rankings to positions 1 through 5 for over 15
>keyword sets, and from 100 users to over 800 users (not search engines)
>per day on average.
>
>Google now spiders over 200 pages on my site, and as of today we average
>over 10 sales a day (up from 1 every 3 weeks). This is good for a
>supposedly part-time business.
>
>
>Hope this helps; if you need more, ask.
>
>(and please excuse typos, wrote this in a rush)
>--
>Philip S. Hempel
>debian/rules

Here's what Google's doing on my site:

64.68.82.70 - - [26/Nov/2002:08:30:13 -0800] "GET /robots.txt HTTP/1.0" 200 0 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
64.68.82.70 - - [26/Nov/2002:08:30:15 -0800] "GET / HTTP/1.0" 301 330 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
64.68.82.5 - - [26/Nov/2002:08:39:22 -0800] "GET /cgi-bin/shop/ HTTP/1.0" 200 38303 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
64.68.82.7 - - [26/Nov/2002:08:49:59 -0800] "GET /cgi-bin/shop/policies.html HTTP/1.0" 200 35830 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
64.68.82.28 - - [26/Nov/2002:08:52:20 -0800] "GET /cgi-bin/shop/moreinfo.html HTTP/1.0" 200 39917 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
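To see at a glance what Googlebot requested and what it got back, entries like these can be filtered with a short script. This is a sketch assuming Apache's "combined" log format with one entry per line; the function name `bot_requests` is hypothetical, and the sample lines are the five entries above.

```python
import re
from typing import Iterable, List, Tuple

# Apache combined format: host ident user [time] "request" status size "referer" "agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)(?: (?P<proto>[^"]*))?" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

UA = '"Googlebot/2.1 (+http://www.googlebot.com/bot.html)"'
SAMPLE = [
    '64.68.82.70 - - [26/Nov/2002:08:30:13 -0800] "GET /robots.txt HTTP/1.0" 200 0 "-" ' + UA,
    '64.68.82.70 - - [26/Nov/2002:08:30:15 -0800] "GET / HTTP/1.0" 301 330 "-" ' + UA,
    '64.68.82.5 - - [26/Nov/2002:08:39:22 -0800] "GET /cgi-bin/shop/ HTTP/1.0" 200 38303 "-" ' + UA,
    '64.68.82.7 - - [26/Nov/2002:08:49:59 -0800] "GET /cgi-bin/shop/policies.html HTTP/1.0" 200 35830 "-" ' + UA,
    '64.68.82.28 - - [26/Nov/2002:08:52:20 -0800] "GET /cgi-bin/shop/moreinfo.html HTTP/1.0" 200 39917 "-" ' + UA,
]


def bot_requests(lines: Iterable[str], bot: str = "Googlebot") -> List[Tuple[str, int]]:
    """Return (path, status) for each well-formed entry whose user agent contains `bot`."""
    hits = []
    for line in lines:
        m = COMBINED.match(line.strip())
        if m and bot in m.group("agent"):
            hits.append((m.group("path"), int(m.group("status"))))
    return hits


if __name__ == "__main__":
    for path, status in bot_requests(SAMPLE):
        print(status, path)
```

Pointed at a whole access log file instead of `SAMPLE`, the same filter shows how deep the crawler went on each visit.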

That's it.  This shows that they are getting into the main site, past the
301.  They're only looking at a couple of pages, though.  I've verified with
the Sam Spade browser that IC is sanitizing the URLs when the Googlebot user
agent is used.  Also, Googlebot is requesting "/cgi-bin/shop/", but I have NO
links to that particular path anywhere in the site.  The redirect points to
www.mystore.com/cgi-bin/shop/index.html.  How could it be hitting
"/cgi-bin/shop/"?  Thanks a lot for all your help, Philip.  Hopefully others
will benefit from this discussion too.  Any idea why Googlebot isn't
requesting more pages?  There are a ton of links on that front index page
to all of my product categories.
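The URL sanitizing checked above rests on a simple idea: when the User-Agent matches a list of known robots, links are emitted without a session ID, so every crawl sees one stable URL per page. The Python below only illustrates that idea and is not Interchange's RobotUA code; the pattern list `ROBOT_PATTERNS` and the session parameter name `id` are assumptions made up for the example.

```python
import fnmatch
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical robot patterns; a real list would be longer and site-specific.
ROBOT_PATTERNS = ["*Googlebot*", "*Slurp*", "*crawler*", "*spider*"]

# Hypothetical name of the session-ID query parameter.
SESSION_PARAM = "id"


def is_robot(user_agent: str) -> bool:
    """True if the User-Agent matches any robot pattern (case-insensitive)."""
    ua = user_agent.lower()
    return any(fnmatch.fnmatchcase(ua, p.lower()) for p in ROBOT_PATTERNS)


def link_for(url: str, session_id: str, user_agent: str) -> str:
    """Attach the session ID for browsers; strip it entirely for robots."""
    parts = urlsplit(url)
    # Drop any session ID already present, keeping the other query parameters.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != SESSION_PARAM]
    if not is_robot(user_agent):
        query.append((SESSION_PARAM, session_id))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

A browser asking for /cgi-bin/shop/policies.html gets a link carrying `?id=...`, while Googlebot gets the bare path; that bare form is what the crawler log above shows.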

- Grant