April 2, 2009
Greg says he is a geek who has been a software engineer at Google for 4 years. He works closely with Matt Cutts and his crack anti-spam taskforce.
Greg starts his presentation by breaking down the anatomy of a URL:
google.com.au is the domain
maps.google.com.au is the host
/maps/mm is the path
q=sydney is the query
#SMX is the fragment
Google doesn’t index fragments.
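As a quick illustration, Python’s standard `urllib.parse` module splits a URL into the same parts. The example URL is assembled from the parts Greg listed (the `http://` scheme is an assumption), and the sketch doesn’t compute the registrable domain (google.com.au), because that needs a public suffix list.

```python
from urllib.parse import urldefrag, urlsplit

def url_parts(url: str) -> dict:
    """Return the host, path, query and fragment of a URL."""
    parts = urlsplit(url)
    return {
        "host": parts.hostname,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }

def without_fragment(url: str) -> str:
    """Drop the #fragment, the part of the URL Google doesn't index."""
    return urldefrag(url).url

# Example URL assembled from the parts above; the http:// scheme is assumed.
example = "http://maps.google.com.au/maps/mm?q=sydney#SMX"
print(url_parts(example))
print(without_fragment(example))  # http://maps.google.com.au/maps/mm?q=sydney
```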
Google’s Guidelines for Site Structure:
1) Use a tree-like organization
2) Group similar topics in the same part of the tree
3) Subdomains vs. subdirectories? Don’t fret, but subdirectories are often easier
4) Multiple domains:
– won’t get the indented (“tabbed over”) results display
– won’t get the “More results from google.com” link
– harder to build reputation
– still can be the best option
5) Link everything together in an organized way
Google’s Guidelines for URLs
– shareable between users (each item can be referenced among friends)
– linked within several hops (no orphan pages)
– largely unique content per URL (avoid serving different languages on the same URL)
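One way to check the “no orphan pages” guideline is a breadth-first search over your internal links. This is a minimal sketch: the `site` dictionary is a hypothetical link graph (in practice you would build it from a crawl or your CMS), and the `max_hops=3` default is an arbitrary stand-in for “several hops”, not a Google figure.

```python
from collections import deque

def click_depths(links: dict[str, list[str]], home: str) -> dict[str, int]:
    """Breadth-first search from the home page: depth = fewest clicks to reach a page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

def audit(links: dict[str, list[str]], home: str, max_hops: int = 3):
    """Return (orphans, too_deep) for the pages listed as keys in `links`."""
    depths = click_depths(links, home)
    orphans = sorted(p for p in links if p not in depths)
    too_deep = sorted(p for p, d in depths.items() if d > max_hops)
    return orphans, too_deep

# Hypothetical site: each page maps to the pages it links to.
site = {
    "/": ["/tents/", "/bags/"],
    "/tents/": ["/tents/red-tent.html"],
    "/bags/": [],
    "/tents/red-tent.html": [],
    "/old-promo.html": ["/"],  # links out, but nothing links to it
}
print(audit(site, "/"))  # (['/old-promo.html'], [])
```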
Some URL variants are genuinely different pages, but others serve the same content and are seen as duplicates by Google. So how do you fix duplicate content issues?
Canonical means “reduced to the simplest and most significant form possible without loss of generality.” Therefore:
– pick one canonical URL for each page and link to it consistently within your site
– make all non-canonical URLs return a permanent (301) redirect to the canonical URL
– in Google Webmaster Tools, set your preferred domain (www vs. non-www)
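Here is a minimal sketch of that idea: one function that maps URL variants onto a single canonical form, so the same rule can drive both your internal links and your 301s. The preferred host `www.example.com.au` and the list of index filenames are hypothetical choices; the sketch ignores ports and leaves query strings alone, since which parameters matter is site-specific.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical site preferences: choose one form and apply it everywhere.
PREFERRED_HOST = "www.example.com.au"
INDEX_PAGES = ("index.html", "index.htm")

def canonicalize(url: str) -> str:
    """Map URL variants of the same page onto one canonical form."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower() or "http"
    host = (parts.hostname or "").lower()  # hostname excludes any port
    # Treat example.com.au and www.example.com.au as the same site.
    if host == PREFERRED_HOST.removeprefix("www."):
        host = PREFERRED_HOST
    path = parts.path or "/"
    # /dir/index.html and /dir/ serve the same page; keep the shorter form.
    for index in INDEX_PAGES:
        if path.endswith("/" + index):
            path = path[: -len(index)]
            break
    # Drop the fragment: it is never part of the canonical URL.
    return urlunsplit((scheme, host, path, parts.query, ""))

def needs_redirect(url: str) -> bool:
    """True if the URL should 301 to its canonical form."""
    return canonicalize(url) != url

print(canonicalize("http://Example.com.au/tents/index.html#top"))
```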
Google’s Guidelines for Proper Use of Response Codes
Use 301s for permanent redirects. A 301 signals to search engines that they should transfer the old URL’s properties (such as its links) to the new URL.
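As a minimal sketch of a permanent redirect, here is a plain WSGI application (Python standard library only) that answers requests on any other host with a 301 pointing at the same path on the preferred host. The host name is a hypothetical placeholder, and many sites would put this rule in their web server configuration instead.

```python
from urllib.parse import quote

# Hypothetical preferred host; replace with your own.
CANONICAL_HOST = "www.example.com.au"

def site_app(environ, start_response):
    """WSGI app: 301 any request on another host to the same path on CANONICAL_HOST."""
    host = environ.get("HTTP_HOST", "").split(":")[0].lower()
    if host != CANONICAL_HOST:
        # PATH_INFO arrives unescaped, so re-quote it for the Location header.
        path = quote(environ.get("PATH_INFO") or "/")
        query = environ.get("QUERY_STRING", "")
        location = f"http://{CANONICAL_HOST}{path}" + (f"?{query}" if query else "")
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<html><body>Canonical page</body></html>"]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, site_app).serve_forever()` serves it on port 8000.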
A typical duplicate content issue is the printer-friendly version of a page. Another is navigation paths embedded in the URLs of dynamic sites that offer several ways to reach the same page, e.g. tents/bags/red/tent_bag.html vs. bags/tents/red/tent_bag.html.
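For the tent example, one hypothetical way to collapse the variants is to put the path’s directory segments into a fixed (here, alphabetical) order, treat that as the canonical URL, and 301 the other orderings to it. This sketch assumes the order of the navigation facets carries no meaning, which won’t hold for every site.

```python
from posixpath import basename, dirname

def canonical_facet_path(path: str) -> str:
    """Sort the directory facets so every navigation route to a page shares one URL."""
    prefix = "/" if path.startswith("/") else ""
    folder, filename = dirname(path), basename(path)
    # Ignore empty segments produced by leading or doubled slashes.
    facets = sorted(segment for segment in folder.split("/") if segment)
    return prefix + "/".join(facets + [filename])

print(canonical_facet_path("tents/bags/red/tent_bag.html"))  # bags/red/tents/tent_bag.html
print(canonical_facet_path("bags/tents/red/tent_bag.html"))  # bags/red/tents/tent_bag.html
```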
New Option for Duplicate Content
Add the canonical link element at page level, in the page’s <head>, e.g.:
<link rel="canonical" href="http://example.com.au/page.html">
For more information, search Google for “specify your canonical” to find Google’s documentation on the canonical tag.
Use it only within the same domain; it works across subdomains and hosts. You can use it instead of a 301 to resolve canonicalization issues, but only for pages that are identical or very similar.
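To audit pages for the element, a short script using Python’s built-in `html.parser` can pull out the canonical URL and apply the same-domain rule. This is a sketch: `find_canonical` and `within_domain` are hypothetical helper names, and the domain check is a simple host-suffix comparison rather than a full public-suffix lookup.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

class CanonicalFinder(HTMLParser):
    """Collect the href of the first <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attributes.get("href")

def find_canonical(html_text: str):
    """Return the canonical URL declared in the HTML, or None."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    finder.close()
    return finder.canonical

def within_domain(url: str, domain: str) -> bool:
    """Rough same-domain check: the URL's host is the domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return host == domain or host.endswith("." + domain)

page = '<html><head><link rel="canonical" href="http://example.com.au/page.html"></head></html>'
target = find_canonical(page)
print(target)  # http://example.com.au/page.html
print(within_domain(target, "example.com.au"))  # True
```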
Absolute vs Relative URLs
Google suggests using absolute URLs when structuring your site: they are easier for Googlebot and leave less room for error. Google can follow a chain of canonicals, but don’t count on it; point directly to the final URL.
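Both points are easy to automate. The sketch below uses `urllib.parse.urljoin` to turn relative links into absolute ones, plus a hypothetical `final_canonical` helper that follows a page-to-canonical mapping to the end of the chain, so each page can point straight at the final URL. The mapping is assumed to come from your own site data.

```python
from urllib.parse import urljoin

def absolutize(page_url: str, href: str) -> str:
    """Turn a link found on page_url into an absolute URL."""
    return urljoin(page_url, href)

def final_canonical(url: str, canonical_of: dict[str, str]) -> str:
    """Follow url -> canonical -> canonical ... and return the last URL in the chain."""
    seen = {url}
    while url in canonical_of and canonical_of[url] != url:
        url = canonical_of[url]
        if url in seen:
            raise ValueError(f"canonical loop at {url}")
        seen.add(url)
    return url

print(absolutize("http://example.com.au/tents/red/page.html", "../bags/"))
# Hypothetical chain: a.html -> b.html -> c.html, so a.html should point at c.html directly.
chain = {"/a.html": "/b.html", "/b.html": "/c.html"}
print(final_canonical("/a.html", chain))  # /c.html
```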
Make Your Site Accessible Without Forms
– search engine crawlers usually don’t fill in forms or choose options from pull-down menus within them
– don’t count on Flash content being indexed the way you want
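A common fix is to expose the same choices as ordinary links alongside (or instead of) the form. A minimal sketch, assuming a hypothetical `/tents?colour=…` URL scheme for a product filter that was previously reachable only through a pull-down menu:

```python
from html import escape
from urllib.parse import urlencode

def option_links(base_path: str, param: str, options: list[str]) -> str:
    """Render one plain <a href> per pull-down option so crawlers can follow them."""
    items = []
    for option in options:
        href = f"{base_path}?{urlencode({param: option})}"
        items.append(f'<li><a href="{escape(href)}">{escape(option)}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"

print(option_links("/tents", "colour", ["red", "blue & white"]))
```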
Rich Media with HTML Content and Navigation
– use image tags
– use text descriptions
– consider sIFR (Scalable Inman Flash Replacement) for Flash
– Google can index Flash directly, but not perfectly
For more information on sIFR, do a Google search for “best uses in Flash”.
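The “text descriptions” advice applies to images too. As a hedged example, this sketch assumes “use image tags” means giving each `<img>` descriptive `alt` text, and lists the images on a page that lack it. Purely decorative images can legitimately use an empty `alt`, so treat the output as a review list rather than a list of errors.

```python
from html.parser import HTMLParser

class MissingAltFinder(HTMLParser):
    """Record the src of every <img> that has no non-empty alt text."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            if not (attributes.get("alt") or "").strip():
                self.missing.append(attributes.get("src"))

def images_missing_alt(html_text: str) -> list:
    """Return the src values of images without descriptive alt text."""
    finder = MissingAltFinder()
    finder.feed(html_text)
    finder.close()
    return finder.missing

print(images_missing_alt('<img src="a.jpg" alt="Red tent"><img src="b.jpg">'))  # ['b.jpg']
```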
Eliminate Soft 404s
– soft 404s (“not found” pages served with a 200 OK status) confuse users
– soft 404s can cause duplicate content for search engines, and crawlers may not discover new pages
– tell Google which pages are real and which aren’t by returning a real 404 status for missing pages, so Googlebot doesn’t hit soft 404s. Use Google Webmaster Tools to find and eliminate 404s.
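The fix for a soft 404 is on the server: keep the friendly error page, but send it with a real 404 status code. A minimal WSGI sketch, where the `PAGES` dictionary is a hypothetical stand-in for your content store:

```python
# Hypothetical content store: only these paths exist.
PAGES = {
    "/": "<h1>Home</h1>",
    "/tents/": "<h1>Tents</h1>",
}

def app(environ, start_response):
    """WSGI app that answers unknown paths with a real 404, not a 200 'not found' page."""
    path = environ.get("PATH_INFO") or "/"
    headers = [("Content-Type", "text/html; charset=utf-8")]
    if path in PAGES:
        start_response("200 OK", headers)
        return [PAGES[path].encode("utf-8")]
    # A friendly message is fine, as long as the status code says 404.
    start_response("404 Not Found", headers)
    return [b"<h1>Sorry, we couldn't find that page.</h1>"]
```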
Submit a General Web Sitemap
Sitemaps can influence Google’s understanding of your site.
Declare your sitemap URL in robots.txt. You can also submit separate, specialized XML sitemaps for rich media, e.g. news and video.
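A minimal sketch of both steps, using only Python’s standard library: build a basic XML sitemap in the sitemaps.org format (just the required `<loc>` element per URL) and produce the `Sitemap:` line for robots.txt. The example.com.au URLs are placeholders; specialized sitemaps for news or video add their own extra tags, which this sketch doesn’t cover.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls: list[str]) -> str:
    """Return a minimal sitemaps.org XML sitemap listing each URL in a <loc>."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for address in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = address  # ElementTree escapes & and <
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

def robots_sitemap_line(sitemap_url: str) -> str:
    """The robots.txt directive that declares where the sitemap lives."""
    return f"Sitemap: {sitemap_url}"

print(build_sitemap(["http://www.example.com.au/", "http://www.example.com.au/tents/"]))
print(robots_sitemap_line("http://www.example.com.au/sitemap.xml"))
```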
Indexing Stats Include Sitemap Submissions
Your sitemap statistics in Google Webmaster Tools can tell you a lot about your site’s health, the status of your sitemaps, and how their URLs are being indexed by Google.