SiteProNews: January 16, 2008 Feature Article

  Article printed from SiteProNews: http://www.sitepronews.com
  HTML version available at: http://www.sitepronews.com/archives.html
All Search Engines Love Spiders: How Meta Commands Can Help You Love Them Too
By Scott Buresh (c) Medium Blue 2007

Nearly all search engines utilize spiders (which are also known
by their original name, robots) to go out and scour the web
looking for web pages. These search engine spiders then bring
the data back to be indexed by the engine.

Since roughly 1996, meta commands have existed that can be
placed on individual web pages to modify how these search
engine spiders behave. The most useful of these commands are
fairly universal and respected by almost all search engines.
What follows is a list of some of the more popular spider
commands and instances in which you might want to use them.

<meta name="robots" content="index">

This meta command is one of the most common ones used – and it
is also the least necessary. It tells search engine spiders to
come on in and put the page in their index. However, all search
engines do this by default anyway. Basically, if you want to put
it in there for fun, be my guest, but this command is not giving
you any special treatment. All search engines are going to index
your page, unless you specifically tell them otherwise.

<meta name="robots" content="follow">

The follow command is different from the index command. It
basically requests that the search engine spiders follow the
links that are on a particular page. Again, however, this piece
of code is completely unnecessary because all search engines are
going to follow the links on a page, unless otherwise directed.
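
To make the point about defaults concrete, here is a minimal
Python sketch that checks whether a robots content value says
anything a spider would not do anyway. The function name
is_redundant and the DEFAULTS set are my own illustration, not
part of any search engine's code.

```python
DEFAULTS = {"index", "follow"}  # what spiders do with no tag at all


def is_redundant(content):
    """Return True if a robots content value only restates the defaults."""
    tokens = {t.strip().lower() for t in content.split(",") if t.strip()}
    return tokens <= DEFAULTS  # subset test: nothing beyond index/follow


print(is_redundant("index"))          # True
print(is_redundant("index, follow"))  # True
print(is_redundant("noindex"))        # False
```

If the check returns True, the tag can be removed without
changing how spiders treat the page.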

<meta name="robots" content="noindex">

The noindex command, the opposite of the index command, tells
search engine spiders not to index the content of a page. It's
important to note, however, that search engine spiders will
still follow the links on a page that uses only this command.

When not used for legitimate purposes, this tag can be dangerous
because it can put you at risk of being penalized by most, if
not all, search engines. This is because a noindex tag can be
used to hide pages full of links that you don't want visitors to
see but that you do want search engine spiders to follow.

There are, however, some legitimate uses for the noindex command.
For example, if you have a dynamic site and you've created
static pages to replace some of your dynamic pages, which can
make them easier for search engine spiders to access, you could
put a noindex tag on the dynamic version.

As Google mentions in its Webmaster Help Center:

"Consider creating static copies of dynamic pages
(http://www.google.com/support/webmasters/bin/answer.py?
answer=34431). Although the Google index includes dynamic pages,
they comprise a small portion of our index. If you suspect that
your dynamically generated pages (such as URLs containing
question marks) are causing problems for our crawler, you might
create static copies of these pages."

In cases like these, it is acceptable to use the noindex
command on the dynamic version of the page, so that your content
will not be treated as duplicate. You are not tricking the
search engines; you're just redirecting them.
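
As a rough sketch of how a site template might apply this rule,
the Python function below picks the robots tag for a page from
its URL. It assumes that any URL with a query string (a question
mark) is the dynamic version and that a static copy of the same
content exists. The function name robots_tag_for and the
example.com URLs are hypothetical.

```python
from urllib.parse import urlsplit

NOINDEX_TAG = '<meta name="robots" content="noindex">'


def robots_tag_for(url):
    """Return the robots meta tag to emit for url, or "" for none.

    Assumption: a URL with a query string is the dynamic version,
    and a static copy of the same content exists elsewhere.
    """
    if urlsplit(url).query:
        return NOINDEX_TAG  # keep the dynamic duplicate out of the index
    return ""               # static copy: the default (index) applies


print(robots_tag_for("http://www.example.com/product.php?id=7"))
print(robots_tag_for("http://www.example.com/product-7.html") == "")
```

A real site would need a better test than the question mark
alone, since some dynamic pages have no static counterpart and
should stay indexable.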

<meta name="robots" content="nofollow">

This tag tells search engine spiders that it's OK to go ahead
and index a page and list it but that they shouldn't follow any
of the links that are on the page. This can be useful if, for
example, you had some partners that requested a link on your
site that you felt obligated to give, but you wanted to hold
onto as much PageRank as possible. Now this is of course
between you and your own personal god, but you would be able to
in effect have a partners page, add the nofollow command to
the robots meta tag, and basically not pass on any of your
PageRank to any of the sites to which you are linking. The
nofollow command in effect tells search engine spiders that this
is the end of the line.

<meta name="robots" content="noindex,nofollow">

Obviously, noindex and nofollow are powerful tags – and in
combination, they can make a page, and any pages that can be
reached only through its links, invisible to nearly all search
engines. This combination command tells search engine spiders,
"Do not include this page in your index; do not follow any of
the links on this page."
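
For readers who want to see how a spider might act on these
commands, here is a simplified Python sketch that reads a page's
robots meta tag and decides whether to index the page and follow
its links. The names RobotsMetaReader and spider_plan are my own,
and real spiders are far more involved. Note that the spider
still has to fetch the page in order to find the tag.

```python
from html.parser import HTMLParser


class RobotsMetaReader(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            for token in (attrs.get("content") or "").split(","):
                if token.strip():
                    self.directives.add(token.strip().lower())


def spider_plan(html):
    """Return (index_page, follow_links) for a page's HTML."""
    reader = RobotsMetaReader()
    reader.feed(html)
    return ("noindex" not in reader.directives,
            "nofollow" not in reader.directives)


page = ('<html><head>'
        '<meta name="robots" content="noindex,nofollow">'
        '</head></html>')
print(spider_plan(page))                          # (False, False)
print(spider_plan("<html><head></head></html>"))  # (True, True)
```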

This command has its beneficial uses. For example, it can be
placed on pages on a site that have duplicate content for
legitimate reasons. A website might have both a page for the
United States and a page for England that cover the same product
with exactly the same content. However, nearly all search
engines would see this as duplicate content and could devalue
both pages. So placing this command on one of them means that
search engine spiders will walk on by and you won't be
penalized.

<meta name="robots" content="noarchive">

Finally, almost all search engines today, including Google and
Yahoo, offer a cached version of a page alongside its listing,
a snapshot of what the page looked like when the spider last
visited. The noarchive tag tells search engine spiders not to
offer that cached copy. It is useful when content on your
website is time-sensitive and you don't want people to have
access to an old version of it later on.

For example, a business might run a one-time special that has a
ridiculously low price to drum up some business while things are
slow. The business will want to be able to shut that sale down
as soon as sales are back up to a solid level. However, it is
conceivable that someone could click on the cached version of
the business's site, see the old deal that was out there, and
insist on getting it for themselves. By using the noarchive tag,
you are telling search engine spiders, in effect, "This page is
subject to frequent changes, and I don't want my visitors to
have access to some of this content at a later time."
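
The sketch below, again in Python and with a hypothetical
function name (listing_options), illustrates that noarchive only
affects the cached copy. A page that uses it alone is still
indexed and listed as usual.

```python
def listing_options(content):
    """Return how a page may appear in results for a robots content value."""
    tokens = {t.strip().lower() for t in content.split(",")}
    return {
        "indexed": "noindex" not in tokens,        # listed in results
        "cached_link": "noarchive" not in tokens,  # cached copy offered
    }


print(listing_options("noarchive"))
# {'indexed': True, 'cached_link': False}
```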

Conclusion

The commands discussed above are just a few of the ones in
existence, and new ones are being added frequently. While nearly
all search engines support these commands, there are still some
that don't. The ones in this article, however, are fairly
universally understood by search engine spiders, no matter which
search engine they come from. As more universal commands are
introduced, I will write about them in future articles.
================================================================
Scott Buresh is the CEO of Medium Blue
(http://www.mediumblue.com/), which was recently named the
number one search engine optimization company
(http://www.mediumblue.com/) in the world by PromotionWorld.
Scott has contributed content to many publications including
Building Your Business with Google For Dummies (Wiley, 2004),
MarketingProfs, ZDNet, WebProNews, DarwinMag, SiteProNews,
ISEDB.com, and Search Engine Guide. Medium Blue serves local
and national clients, including Boston Scientific, DS Waters,
and Wake Forest University Baptist Medical Center. Visit
MediumBlue.com to request a custom SEO guarantee
(http://www.mediumblue.com/seo-guarantee.html) based on your
goals and your data.
================================================================

Copyright © 2008 Jayde Online, Inc.  All Rights Reserved.

SiteProNews is a registered service mark of Jayde Online, Inc.