SiteProNews: August 10, 2005 Feature Article
Article printed from SiteProNews: http://www.sitepronews.com HTML version available at: http://www.sitepronews.com/archives.html
Google SiteMaps and You
By Trevor Bauknight (c) 2005
Last week, we looked (http://www.cafeid.com/art-rss.shtml) at
the recent news that Microsoft had decided to embrace RSS in a
big way in its upcoming releases of Internet Explorer and
Windows "Longhorn" and determined that this was a Good Thing.
This week, we're taking a look at implementing Google Sitemaps,
a similar technology developed by Google in order to help you
define your site more effectively to the search-engine
behemoth. This is not a ticket to a higher Google ranking (at
least not that we know about); but it is a useful tool that
lets you apply RSS-like control to your website's interactions
with the Googlebot.
RSS (Really Simple Syndication) is the current heavyweight of
so-called "disruptive technologies" (loosely defined as those
that have the effect, if not developed with the intention, of
changing the way we use technology in general) and its use is
skyrocketing among content providers looking for a way to get
their content in front of more eyes and ears. But RSS
originally stood for Rich Site Summary, a standard way of
cataloging your site's content for third-party aggregators.
Google Sitemaps have a similar function, in that they are an
XML-based way to describe website content in a standard,
predictable way; but they differ in that Sitemaps are intended
for the Googlebot's eyes only, rather than for any third party.
Think of them as an automated way to make sure Google knows
about your site's content (please note, however, that Google
does not guarantee inclusion of your content based solely on
the presence of a Sitemap file).
This sounds like a very specific undertaking, but the importance
of Google to getting your site's content noticed simply cannot
be overstated. And with Google's expanding reach into more and
more areas of Web content presentation, chances are that the
information your Sitemap provides will eventually find some use
you haven't yet thought about. That's
what disruptive technology is all about, and Google has become
one of the more innovative champions of such technological
advances.
Where To Start
The first thing you should do as a website developer is create
a Google Account for yourself or your company. This will allow
you to do other things besides access the Sitemaps
infrastructure; but we'll leave that for another day. Create
the account here (https://www.google.com/accounts/NewAccount)
and then proceed to the Sitemaps area at this link
(https://www.google.com/webmasters/sitemaps/login). Once you've
logged in, you'll see the sparse Sitemaps interface. Don't be
fooled, however, because like the simple interface to its
search engine, this one hides quite a bit of information
regarding the creation and use of Sitemaps, presenting it in
digestible bites as you walk through the process.
There's probably more there than you need to know at this
point, provided you don't have a huge site with a need for
multiple Sitemaps and so on. But if you do have such a site,
the information is there for creating truly complex Sitemaps
and Sitemap Indices referencing many Sitemaps, and you can
familiarize yourself with that as needed. For now, we'll
concentrate on what's required to establish a Sitemap for our
site at Cafe ID (http://www.cafeid.com).
Like creating RSS feeds, creating a Google Sitemap is as simple
as putting together an XML file at the root level of your site
that describes the site according to the instructions that
Google has laid out. You can use any text editor for this
purpose, but some editors do a better job of helping you create
properly formatted XML files. We heartily recommend two that
cost money, BBEdit on Mac OS X (http://www.barebones.com) and
Macromedia's Homesite on Windows
(http://www.macromedia.com/software/homesite/), but there are
excellent free alternatives out there and when it comes to text
editors, personal preferences take on an almost religious
importance (http://www.gnu.org/software/emacs), so we won't
proselytize about that here.
The Googlebot recognizes several Sitemap formats, ranging from
a simple list of URLs to Sitemaps already created using
something called the "Open Archive Initiative protocol for
metadata harvesting"
(http://www.openarchives.org/OAI/openarchivesprotocol.html),
a format apparently popular with library collections. The OAI
protocol is an advanced XML specification that you don't need
to worry about if you don't already understand it. We recommend
an intermediate XML format over the simple URL list,
because of the additional information you can associate with
each constituent URL of your site.
If you do want to just get started quickly, simply create a text
file that looks like this:
http://www.example.com/catalog?item=1
http://www.example.com/catalog?item=11 ...
making sure that each URL sits on its own line with no embedded
newline characters and that the file uses the UTF-8 text
encoding (check your text editor settings). Also, your Sitemap
may not contain more than 50,000 URLs, and all URLs must be
fully formed, since they will be used directly during the
Googlebot's crawl.
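As a rough sketch of those rules, the following Python helper
builds the text of a URL-list Sitemap and rejects input that
breaks them. The names build_url_list, write_url_list and
MAX_URLS are our own inventions for illustration, not part of
any Google tool, and the "fully formed" check is only an
approximation (it requires an http or https scheme and a host
name).

```python
from urllib.parse import urlsplit

MAX_URLS = 50000  # per-Sitemap limit mentioned in the article


def build_url_list(urls):
    """Return the text of a plain URL-list Sitemap, one URL per line."""
    if len(urls) > MAX_URLS:
        raise ValueError("a Sitemap may not contain more than 50,000 URLs")
    for url in urls:
        # Check for line breaks first: a URL must fit on a single line.
        if "\n" in url or "\r" in url:
            raise ValueError("URL contains an embedded newline: %r" % url)
        parts = urlsplit(url)
        # Approximate "fully formed": scheme and host must both be present.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError("not a fully formed URL: %r" % url)
    return "\n".join(urls) + "\n"


def write_url_list(urls, path):
    """Write the URL list to `path` using the UTF-8 text encoding."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(build_url_list(urls))
```

Calling write_url_list(["http://www.example.com/catalog?item=1"],
"sitemap.txt") would produce a one-line file; a relative URL such
as "/catalog?item=1" raises ValueError instead of being written.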
Getting Fancy
The more advanced format isn't much more difficult to create and
lets you specify additional information about each URL. The
protocol is described fully here
(https://www.google.com/webmasters/sitemaps/docs/en/protocol.html)
and is too detailed to explain here. Your finished file will
look something like this, except (hopefully) with more URLs
specified:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
  <url>
    <loc>http://www.cafeid.com/</loc>
    <lastmod>2005-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>http://www.cafeid.com/art-over.shtml</loc>
    <changefreq>weekly</changefreq>
  </url>
</urlset>
Your Sitemap's location dictates what URLs can be included in
it. A Sitemap placed at the root level of your site can specify
any URLs on that site, while a Sitemap placed at
www.yoursite.com/images cannot include URLs under
www.yoursite.com/banners, for example.
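One way to picture the location rule is as a simple prefix test
on URL paths. The sketch below is our own reading of the rule,
not Google's code: it assumes a Sitemap may list only URLs on
the same host that sit at or below the directory holding the
Sitemap file. The function name in_sitemap_scope and the file
name sitemap.xml are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def in_sitemap_scope(sitemap_url, page_url):
    """Return True if page_url lies at or below the Sitemap's directory."""
    sitemap = urlsplit(sitemap_url)
    page = urlsplit(page_url)
    # A Sitemap is assumed to cover only its own scheme and host.
    if (sitemap.scheme, sitemap.netloc) != (page.scheme, page.netloc):
        return False
    # Directory that holds the Sitemap file, always ending in "/".
    base_dir = posixpath.dirname(sitemap.path)
    if not base_dir.endswith("/"):
        base_dir += "/"
    # Treat a bare host name (empty path) as the site root.
    return (page.path or "/").startswith(base_dir)
```

With this sketch, a Sitemap at
http://www.yoursite.com/images/sitemap.xml accepts
http://www.yoursite.com/images/logo.gif but rejects
http://www.yoursite.com/banners/ad.gif, while a Sitemap at the
root level accepts both.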
You can take as much or as little advantage as you like of the
additional XML tags available in this format.
Each <url> needs to include at least the <loc> specification,
but need not include the other three, and all URLs in a Sitemap
file must be encapsulated within the <urlset> tag. We
recommend using at least the <lastmod> tag and the <changefreq>
tag to let the Googlebot know how often it should check your
site for updated content. Be sure to change the date, and
maybe even the time, specified in the <lastmod> tag any time
you actually update your site.
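If you would rather generate the file than type it by hand, a
short script can do it. The following is a minimal sketch using
Python's standard xml.etree.ElementTree module; the function
name build_sitemap and the dictionary layout of each entry are
our own assumptions, and only the namespace string comes from
the example file. Each entry must carry "loc", and the other
three tags are written only when present.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the example Sitemap in this article.
SITEMAP_NS = "http://www.google.com/schemas/sitemap/0.84"
OPTIONAL_TAGS = ("lastmod", "changefreq", "priority")


def build_sitemap(entries):
    """Return Sitemap XML text for a list of dicts describing URLs."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # <loc> is the only required child of <url>.
        ET.SubElement(url, "loc").text = entry["loc"]
        for tag in OPTIONAL_TAGS:
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

For example, build_sitemap([{"loc": "http://www.cafeid.com/",
"lastmod": "2005-01-01", "changefreq": "monthly"}]) returns a
one-URL Sitemap. ElementTree escapes characters such as "&" in
the text it writes, so URLs should be passed in unescaped.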
One more caveat is that your URL specifications must be
XML-encoded, similarly to the way they're encoded under RSS.
What this means is spelled out in detail here
(http://www.w3.org/TR/REC-html40/appendix/notes.html), but
essentially, what you're doing is converting a URL like
http://www.yoursite.com/view?widget=3&count>2
to look like this:
http://www.yoursite.com/view?widget=3&amp;count&gt;2
(Note the substitution of the entities &amp; and &gt; for
the "&" and ">" symbols.)
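If your URLs come out of a script or database, the encoding can
be automated. This is a small sketch using the escape function
from Python's standard xml.sax.saxutils module; the wrapper name
xml_encode_url is our own, and the extra quote replacements are
an optional precaution rather than something the article
requires.

```python
from xml.sax.saxutils import escape


def xml_encode_url(url):
    """Replace XML-special characters in a URL with entity references."""
    # escape() handles &, < and > itself; the mapping adds both quote marks.
    return escape(url, {"'": "&apos;", '"': "&quot;"})


encoded = xml_encode_url("http://www.yoursite.com/view?widget=3&count>2")
print(encoded)
```

Apply the function once, to the raw URL. Running it on text that
is already encoded would turn "&amp;" into "&amp;amp;".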
Done. Now What Do I Do With It?
You're almost home. Upload the Sitemap file you created to your
server and then submit the URL of the file itself using your
Google Sitemaps account. You don't need to use the account,
but doing so will allow you to keep track of what you've
uploaded. You're welcome to compress your Sitemap file using
gzip, found typically on Mac OS X, Linux and BSD (normal PC
zipping won't work, although you can certainly find a
third-party gzip program for your Windows box). Click the "Add
Your First Sitemap" link on the main Sitemaps page after you've
logged into your Google Sitemaps account, and that's all there
is to it!
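If no gzip program is handy, Python's standard gzip module
writes the same format on any platform. The sketch below is one
way to do it; the function name gzip_sitemap and the default
".gz" suffix are our own choices for illustration.

```python
import gzip
import shutil


def gzip_sitemap(src_path, dest_path=None):
    """Compress src_path with gzip and return the compressed file's path."""
    if dest_path is None:
        dest_path = src_path + ".gz"
    # Stream the file through gzip so large Sitemaps need little memory.
    with open(src_path, "rb") as src, gzip.open(dest_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dest_path
```

Calling gzip_sitemap("sitemap.xml") leaves the original file in
place and writes sitemap.xml.gz next to it, ready to upload.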
You can use your Sitemaps account to keep track of and receive
diagnostic information about your Sitemap submissions. You
don't need to create a Sitemaps account, however, and if you
already have a Google account for receiving Alerts, for
accessing the Web Developer APIs and so on, your existing
account will work as a Sitemaps account automatically.
Google has already played a significant role in shifting the
paradigm of discovering the Web from doing so by following
links to doing so by searching, and the company shows no signs
of slowing down. Subscribing may well be the next paradigm,
based on the flexibility of the protocols that put content
syndication in the hands of mere mortals, and getting your
content cataloged in these formats should be among your first
priorities. The web browser and operating system are adjusting
quickly to this new paradigm, and you should be too.
================================================================
Trevor Bauknight is a web designer and writer with over 15 years
of experience on the Internet. He specializes in the creation
and maintenance of business and personal identity online and
can be reached at trevor@tryid.com. Stop by http://www.cafeid.com
for a free tryout of the revolutionary SiteBuildingSystem and
check out our Flash-based website and IMAP e-mail hosting
solutions, complete with live support.
================================================================
Copyright © 2005 Jayde Online, Inc. All Rights Reserved.
SiteProNews is a registered service mark of Jayde Online, Inc.