April 4, 2007

Are You Confusing Search Engine Bots?

There are two primary factors to getting a page ranked – discovery and relevancy. By and large, search engines are clever creatures, but the very best webmasters will always send out the right signals to gently guide the search engines, and in return receive great rankings for their content.

Search engines discover content using their bots (or ‘crawlers’), and determine relevancy (and by extension ranking) using advanced algorithms.

The golden rule of SEO is that search engines can’t rank a page they don’t know about. This is what makes discovery so important. The most natural way for a search engine to discover a new resource is by crawling a link pointing at that content, so to get any new resource crawled quickly you should get a few links from other sites that are crawled regularly. (The major search engines have a sitemap initiative, but remember that without a single link Google will not index your content regardless of sitemap.)
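To make the discovery step concrete, here is a minimal Python sketch of the link-following a crawler does: parse a fetched page, resolve each href against the page’s own URL, and keep any URL it has not seen before. It is an illustration under simplifying assumptions (no fetching, no robots.txt, no sitemap handling), not how any particular search engine’s bot works; the function names and example.com URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from the href of every <a> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop #fragments, as a crawler would
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def discover(base_url, html_text, already_known):
    """Return URLs linked from this page that the crawler has not seen yet."""
    parser = LinkExtractor(base_url)
    parser.feed(html_text)
    parser.close()
    new_urls = []
    for url in parser.links:
        if url not in already_known and url not in new_urls:
            new_urls.append(url)
    return new_urls
```

A page that nothing links to never shows up in `parser.links` anywhere, which is the point of the golden rule: no inbound link, no discovery by crawling.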

Getting your page crawled is less than half the battle. Now comes the hard part – ranking well. The second factor that determines whether your page ranks well is relevancy. Relevancy is determined by search engine algorithms, which decide the order in which results are displayed to searchers. A number of on-site and off-site factors feed into the relevancy calculation, and I’ll look at some of them in a moment. (Trust could also be dropped into the mix here, but I’m setting it aside for the moment.)

How can you guide the search engines?
Webmasters actually have the greatest say in signalling for both discovery and relevancy. I use the term signalling because that’s really what SEO is all about – sending the right signal to the search engines.
To explain more about signals I’m going to have a look at another of the Irish Blog Award nominee sites which availed of the free site review offer.

First Partners
I met Paul Browne at the Irish Blog Awards a few weeks back. Paul writes regularly on his technology-themed First Partners blog.

Back to relevancy
The page title is probably one of the most important on-page elements used by search engines to determine the relevancy of your web pages. By and large you should target 1-3 keyword phrases, and bear in mind that most searches are around 3 words in length.
In the case of Paul’s blog homepage I notice that he is using dynamic titles which include the title of the most recent post. This in my view is a mistake – the homepage title is about as sacred as it gets, and you don’t want it changing every day or so. I think Paul should concentrate on the main focus of his blog, whatever niche that might be, and use that in his blog homepage title.

The canonical URL problem (again)
I’m probably beginning to sound like a broken record. The canonical URL problem is a condition where your site or page is accessible by typing either the www or the non-www version of its URL into your browser.
If you can reach your page via either URL AND the URL in the address bar does not change, your site is suffering from the canonical URL problem.
In Paul’s case his site is accessible via both the www and non-www URLs. To fix this problem you need to redirect one URL to the other with a 301 redirect.
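The redirect decision itself is simple. The Python sketch below assumes www.example.com (a hypothetical hostname) is the version you want indexed, and that every request reaching the server belongs to that one site; any request on another hostname gets a 301 pointing at the same path and query on the canonical host. On an Apache server this is usually handled in the web server configuration (for example with mod_rewrite) rather than in application code, so treat this only as a statement of the rule.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical: the hostname you want search engines to index
CANONICAL_HOST = "www.example.com"


def canonical_redirect(url, canonical_host=CANONICAL_HOST):
    """Return (301, target_url) if url uses a non-canonical host, else None."""
    parts = urlsplit(url)
    if parts.hostname is None or parts.hostname == canonical_host:
        return None  # already canonical (or not an absolute URL): serve normally
    # Keep scheme, path and query; only the host changes
    target = urlunsplit((parts.scheme, canonical_host, parts.path or "/", parts.query, ""))
    return 301, target
```

For example, `canonical_redirect("http://example.com/blog/?p=1")` yields a 301 to `http://www.example.com/blog/?p=1`, so links to either version end up credited to one URL.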

Don’t use 302 redirects for your homepage
When checking Paul’s blog I noticed that the homepage had a Toolbar PR0 (PageRank 0). This is odd given that the blog has PageRank 5. Then I noticed that the root page is redirecting to /rp/ with a 302 (temporary) redirect:

GET / HTTP/1.1
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv: Gecko/20070309 Firefox/
Accept: text/xml,application/xml,application/xhtml+xml,[… ]png,*/*;q=0.5
Accept-Language: en,en-us;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: __utma=67859462.28111[… ]__utmc=67859462

HTTP/1.x 302 Found
Date: Mon, 02 Apr 2007 16:35:35 GMT
Server: Apache/2.0.52 (Red Hat)
Content-Length: 304
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=iso-8859-1
X-Pad: avoid browser bug

If the homepage is going to stay there, then I suggest changing that to a 301 (permanent) redirect. The most probable reason why the temporary homepage currently shows PageRank 0 is that it has few, if any, backlinks. The backlinks Paul has accumulated point at the root URL rather than at /rp/, and Google doesn’t realise that /rp/ is now the homepage. No 301 = no transferral of links and trust.
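You can check which kind of redirect a page returns without a browser plugin. This Python sketch uses the standard library’s http.client, which does not follow redirects, to send a single HEAD request and report the status code and Location header. The function names are hypothetical, and a few servers answer HEAD differently from GET, so treat the result as a first check rather than proof.

```python
import http.client
from urllib.parse import urlsplit

PERMANENT = {301}
TEMPORARY = {302, 303, 307}


def check_redirect(url, timeout=10):
    """Send one HEAD request (no redirect following); return (status, Location)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def classify_redirect(status):
    """Label a status code as a permanent redirect, temporary redirect, or neither."""
    if status in PERMANENT:
        return "permanent"
    if status in TEMPORARY:
        return "temporary"
    return "not a redirect"
```

Usage might look like `status, location = check_redirect("http://www.example.com/")` followed by `print(status, classify_redirect(status), location)`; for a setup like Paul’s this would report a temporary redirect to the /rp/ page.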

And some advice for the blog?
I had a few ideas when I looked at Paul’s blog. I found that the page weight was a little too beefy, with the blog homepage weighing in at 800KB+ on one occasion last week. I also thought that Paul could cut the number of posts published per page to a more manageable number. And I even considered whether NOFOLLOWing some of the internal links (e.g. the cloud) might help.
But I can safely scrap all that advice for one simple suggestion: give each and every blog post a unique META description.
When I looked at all the pages in the supplemental index it was instantly apparent that Paul wasn’t using META descriptions.

Google is picking up boilerplate content for every snippet. I’d be willing to bet that at least some of the 265 pages in the supplemental index will pop out once they have a unique META description.
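A unique description per post can usually be generated from the post itself. The Python sketch below assumes you have each post’s HTML body available; it strips tags crudely with a regular expression, trims the text to a length limit (155 characters here, a rule-of-thumb assumption rather than a documented cap), and escapes the result for the content attribute. A second helper flags any description shared by more than one URL, which is the kind of duplication that leaves every snippet looking the same. Both function names are hypothetical.

```python
import html
import re

# Assumption: a common rule of thumb for snippet length, not an official limit
MAX_LEN = 155


def meta_description(post_html, max_len=MAX_LEN):
    """Build a per-post META description tag from the post's own text."""
    # Crudely strip tags, decode entities, and collapse whitespace
    text = html.unescape(re.sub(r"<[^>]+>", " ", post_html))
    text = " ".join(text.split())
    if len(text) > max_len:
        # Cut at the last word boundary that fits, then mark the truncation
        text = text[: max_len - 3].rsplit(" ", 1)[0] + "..."
    return '<meta name="description" content="%s">' % html.escape(text, quote=True)


def find_duplicates(descriptions):
    """Map each description used by 2+ URLs to those URLs ({url: description} in)."""
    urls_by_description = {}
    for url, description in descriptions.items():
        urls_by_description.setdefault(description, []).append(url)
    return {d: urls for d, urls in urls_by_description.items() if len(urls) > 1}
```

In practice you would write a hand-crafted description for important posts and fall back to something like this only where none exists.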
I did spend a short amount of time looking at the backlink profile for the blog, and the majority of links use the anchor “Paul Browne – Technology in plain English”. I reckon Paul probably ranks well for his name (he had a thread about his online doppelgänger, but I couldn’t find it). I think some diversification of the link anchors could pay off: non-diverse backlink anchors may actually raise a flag that could damage your site.
So find that niche and push it in your titles and anchors. In Paul’s case that niche should be highly relevant to his company’s products and services. I’ll leave the idea generation to Paul.
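A rough way to gauge anchor diversity is to measure how much of your backlink profile a single anchor accounts for. The Python sketch below assumes you already have a list of anchor texts (for example exported from a backlink tool); the function name is hypothetical, and no search engine publishes a share at which a skewed profile becomes a problem, so the number is only a prompt for judgement.

```python
from collections import Counter


def anchor_concentration(anchors):
    """Return (most common anchor, its share of all anchors), or None if empty."""
    # Normalise case and surrounding whitespace so trivial variants count together
    normalised = [a.strip().lower() for a in anchors]
    if not normalised:
        return None
    anchor, count = Counter(normalised).most_common(1)[0]
    return anchor, count / len(normalised)
```

A profile where one anchor accounts for most links, as with the branded anchor above, would show a share close to 1.0.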

So to recap my advice
1. Fix the blog homepage title
2. Sort the canonical URL
3. Change the root page 302 redirect to a 301
4. Assign unique META descriptions to each blog post

Author: Richard Hearne is the founder of Red Cardinal, a dedicated search marketing consultancy. A frequent contributor to Google’s Webmaster Group, Richard regularly advises clients on Internet marketing strategy and search engine optimisation campaigns. Richard’s thoughts and research can be found on his search marketing blog.