AUG. 22,  ISSUE #677


SPN Site of the Day
MagPortal.com provides dynamic headline feeds with full-text article search, allowing your site users to find magazine articles on topics tailored for your website. Interesting service.

Does your web site qualify as an SPN Site of the Day? Webmaster resource sites can apply via email: sotd@sitepronews.com
 

SPN App of the Day
FeedExplorer 1.0.12 (3.7 MB) is an RSS feed reader with a nice interface that allows you to organize your feeds into categories and select from different display styles for your headlines. It has a built-in browser, an integrated search function and a keyword watcher option. Other features include OPML import/export and system tray notifications. Freeware for Windows 98/ME/2000/XP.

If you have a Webmaster App that you would like listed on the SPN site, send us an email with details to: wapps@sitepronews.com
 



Search Engine Spiders Lost
Without Guidance - Post This Sign!

Robots.txt Signpost Warns Trespassers From Private Property, By Mike Banks Valentine (c) 2005

The robots.txt file implements the robots exclusion standard, a convention that well-behaved web crawlers/robots consult to learn what files and directories you want them to stay OUT of on your site. Compliance is voluntary: not all crawlers/bots follow the exclusion standard, and those that ignore it will continue crawling your site anyway. I like to call them "Bad Bots" or trespassers. We block them by IP exclusion, which is another story entirely.

This is a very simple overview of robots.txt basics for webmasters. For a complete and thorough lesson, visit Robotstxt.org.


To see the proper format for a somewhat standard robots.txt file look directly below. That file should be at the root of the domain because that is where the crawlers expect it to be, not in some secondary directory.

Below is the proper format for a robots.txt file ----->

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /group/

User-agent: msnbot
Crawl-delay: 10

User-agent: Teoma
Crawl-delay: 10

User-agent: Slurp
Crawl-delay: 10

User-agent: aipbot
Disallow: /

User-agent: BecomeBot
Disallow: /

User-agent: psbot
Disallow: /

--------> End of robots.txt file

This tiny text file is saved as a plain text document and ALWAYS with the name "robots.txt" in the root of your domain.
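To check how a standards-following crawler would read a file like this before you upload it, a minimal sketch using Python's built-in urllib.robotparser module may help. The file text below is an abridged copy of the one above; the www.example.com address and the page paths are placeholders, not real pages.

```python
from urllib.robotparser import RobotFileParser

# Abridged copy of the robots.txt shown above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /group/

User-agent: msnbot
Crawl-delay: 10

User-agent: psbot
Disallow: /
"""

# Placeholder site address, used only to build example URLs.
SITE = "http://www.example.com"

# Parse the text directly instead of fetching it over the network.
rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

for agent in ("Googlebot", "msnbot", "psbot"):
    for path in ("/images/logo.gif", "/articles/index.html"):
        allowed = rules.can_fetch(agent, SITE + path)
        print(f"{agent:10} {path:22} allowed={allowed}")
    print(f"{agent:10} crawl delay: {rules.crawl_delay(agent)}")
```

One thing this check brings out: Python's parser, like the original standard, lets a crawler that is named in its own record follow only that record. In this sketch msnbot gets the 10 second delay but is not kept out of /images/, because the Disallow lines sit only under the * record.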


A quick review of the listed information from the robots.txt file above follows. The "User-agent: msnbot" is from MSN, Slurp is from Yahoo and Teoma is from Ask Jeeves. The others listed are "Bad" bots that crawl very fast and to nobody's benefit but their own, so we ask them to stay out entirely. The * asterisk is a wild card that means "All" crawlers/spiders/bots should stay out of the group of files or directories listed beneath it.

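The instruction "Disallow: /" tells a bot to stay out entirely, and the bots given "Crawl-delay: 10" are those that crawled our site too quickly, causing it to bog down and overuse server resources. Crawl-delay is an extension to the original standard, and not every crawler honors it. Google crawls more slowly than the others and doesn't require that instruction, so it is not specifically listed in the above robots.txt file. The Crawl-delay instruction is only needed on very large sites with hundreds or thousands of pages. The wildcard asterisk * record applies to every crawler, bot and spider that is not named in a record of its own, including Googlebot. Under the standard, a crawler that is named (msnbot, Teoma and Slurp above) follows only its own record, so repeat the Disallow lines there if those crawlers should also stay out of the listed directories.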

The crawlers we gave the "Crawl-delay: 10" instruction were requesting as many as 7 pages every second, so we asked them to slow down. The number is in seconds, and you can change it to suit your server capacity, based on their crawling rate. Ten seconds between page requests is far more leisurely and stops them from asking for more pages than your server can dish up.
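To put those two rates side by side, here is a small worked example in Python. It only does the arithmetic for the figures quoted above (7 pages per second versus one page every 10 seconds); the function names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def max_pages_per_day(crawl_delay_seconds):
    """Most pages one crawler can request in a day if it waits
    crawl_delay_seconds between requests."""
    return SECONDS_PER_DAY // crawl_delay_seconds


def pages_per_day_at_rate(pages_per_second):
    """Pages requested in a day at a sustained pages-per-second rate."""
    return pages_per_second * SECONDS_PER_DAY


throttled = max_pages_per_day(10)         # one page every 10 seconds
unthrottled = pages_per_day_at_rate(7)    # 7 pages every second

print(f"Crawl-delay 10: up to {throttled} pages per day")
print(f"7 pages/second: up to {unthrottled} pages per day")
print(f"Reduction: {unthrottled // throttled}x")
```

A crawler that honors the delay tops out at 8,640 requests a day, which is 70 times fewer than the same crawler could make at a sustained 7 pages per second.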

(You can discover how fast robots and spiders are crawling by looking at your raw server logs, which show pages requested by precise times to within a hundredth of a second. They are available from your web host, or ask your web or IT person. If you have server access, your server logs can usually be found in the root directory, and you can usually download compressed server log files by calendar day right off your server. You'll need a utility that can expand compressed files to open and read those plain text raw server log files.)
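If you would rather not read the logs by eye, the counting can be scripted. The sketch below is only an illustration and rests on an assumption: that your host writes logs in the common Apache "combined" format, which records times to the second and ends each line with the quoted user-agent string. Your log layout may differ, and the sample lines and bot names are invented.

```python
import gzip
import re
from collections import Counter

# Assumed layout (Apache "combined" format):
# ip - - [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'\[(?P<time>[^\]]+)\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def read_log(path):
    """Yield lines from a plain or gzip-compressed log file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        yield from handle


def peak_requests_per_second(lines):
    """Return {user_agent: most requests seen within any single second}."""
    per_second = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match:
            per_second[(match["agent"], match["time"])] += 1
    peaks = {}
    for (agent, _second), count in per_second.items():
        peaks[agent] = max(peaks.get(agent, 0), count)
    return peaks


# Invented sample lines; in practice pass read_log("your-log-file.gz").
SAMPLE = [
    '10.0.0.1 - - [22/Aug/2005:10:15:32 -0700] "GET /a.html HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '10.0.0.1 - - [22/Aug/2005:10:15:32 -0700] "GET /b.html HTTP/1.1" 200 734 "-" "ExampleBot/1.0"',
    '10.0.0.1 - - [22/Aug/2005:10:15:33 -0700] "GET /c.html HTTP/1.1" 200 901 "-" "ExampleBot/1.0"',
    '10.0.0.2 - - [22/Aug/2005:10:15:33 -0700] "GET /a.html HTTP/1.1" 200 512 "-" "OtherBot/2.0"',
]

for agent, peak in sorted(peaks.items()) if (peaks := peak_requests_per_second(SAMPLE)) else []:
    print(f"{agent}: peak {peak} requests in one second")
```

Any user agent whose peak is well above what your server can comfortably serve is a candidate for a Crawl-delay line, or for a "Disallow: /" if it brings you no benefit.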


To see the contents of any robots.txt file, just type /robots.txt after any domain name. If the site has that file up, you will see it displayed as a text file in your web browser. Click on the link below to see that file for Amazon.com.

http://www.Amazon.com/robots.txt

You can see the contents of any website robots.txt file that way.
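The same lookup can be scripted. This is a minimal sketch using only Python's standard library; the function names are made up for illustration, and the fetch function needs a live network connection.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_url(page_url):
    """Build the robots.txt address for the site that serves page_url.

    The file always lives at the root of the domain, so the path,
    query string and fragment of the page address are dropped.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_robots_txt(page_url, timeout=10):
    """Download a site's robots.txt and return it as text."""
    with urlopen(robots_url(page_url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


print(robots_url("http://www.Amazon.com/some/page?x=1"))
```

Calling fetch_robots_txt with any page address on a site returns the text of that site's robots.txt, if the site has one.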

The robots.txt shown above is what we currently use at Publish101 Web Content Distributor, just launched in May of 2005. We did an extensive case study and published a series of articles on crawler behavior and indexing delays known as the Google Sandbox. That Google Sandbox Case Study is highly instructive on many levels for webmasters everywhere about the importance of this often ignored little text file.


One thing we didn't expect to glean from our research into indexing delays (known as the Google Sandbox) was the importance of robots.txt files to quick and efficient crawling by the spiders from the major search engines. We were also surprised by the number of heavy crawls from bots that will do no earthly good to the site owner, yet crawl most sites extensively, straining servers to the breaking point with requests coming as fast as 7 pages per second.

We discovered in our launch of the new site that Google and Yahoo will crawl the site whether or not you use a robots.txt file, but MSN seems to REQUIRE it before they will begin crawling at all. All of the search engine robots seem to request the file on a regular basis to verify that it hasn't changed.

Then when you DO change it, they will stop crawling for brief periods and repeatedly ask for that robots.txt file during that time without crawling any additional pages. (Perhaps they had a list of pages to visit that included the directory or files you have instructed them to stay out of and must now adjust their crawling schedule to eliminate those files from their list.)

Most webmasters instruct the bots to stay out of "image" directories and the "cgi-bin" directory, as well as any directories containing private or proprietary files intended only for users of an intranet or password protected sections of your site. Clearly, you should direct the bots to stay out of any private areas that you don't want indexed by the search engines. Keep in mind that robots.txt is itself publicly readable and is only a request, so anything truly confidential still needs password protection.
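If you maintain the list of off-limits directories in one place, the file can be generated rather than typed by hand. The sketch below is one possible approach, not the method used for the file above; the function names are invented. It repeats the Disallow lines under each named crawler because, under the original exclusion standard, a crawler that finds a record naming it follows that record instead of the * record.

```python
def build_record(user_agent, disallow=(), crawl_delay=None):
    """Return one robots.txt record (a User-agent line plus its rules)."""
    lines = [f"User-agent: {user_agent}"]
    if crawl_delay is not None:
        lines.append(f"Crawl-delay: {crawl_delay}")
    lines.extend(f"Disallow: {path}" for path in disallow)
    return "\n".join(lines)


def build_robots_txt(records):
    """Join records with the blank line that separates them in the file."""
    return "\n\n".join(build_record(**record) for record in records) + "\n"


# Directories every crawler should stay out of.
PRIVATE = ["/cgi-bin/", "/images/", "/group/"]

robots_txt = build_robots_txt([
    {"user_agent": "*", "disallow": PRIVATE},
    # Named crawlers get the delay AND their own copy of the Disallow lines.
    {"user_agent": "msnbot", "disallow": PRIVATE, "crawl_delay": 10},
    # Unwelcome bots are asked to stay out entirely.
    {"user_agent": "psbot", "disallow": ["/"]},
])

print(robots_txt)
```

Save the printed text as a plain text file named robots.txt in the root of your domain.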

The importance of robots.txt is rarely discussed by average webmasters. I've even had webmasters at some of my client businesses ask me what it is and how to implement it when I tell them how important it is to both site security and efficient crawling by the search engines. This should be standard knowledge for webmasters at substantial companies, and it illustrates how little attention is paid to the use of robots.txt.

The search engine spiders really do want your guidance, and this tiny text file is the best way to give crawlers and bots a clear signpost that warns off trespassers and protects private property - and that warmly welcomes invited guests, such as the big three search engines, while asking them nicely to stay out of private areas.


About The Author
Google Sandbox Case Study. Mike Banks Valentine operates Publish101.com, Free Web Content Distribution for Article Marketers, and provides content aggregation, press release optimization and custom web content for Search Engine Positioning. http://www.seoptimism.com/SEO_Contact.htm





 

  SiteProNews - The Net's most widely read Webmaster newsletter


(c) Copyright 2005 All rights reserved. Jayde Online, Inc.
Web design by
ControlV.