July 10, 2007
Search engine spiders are among the most useful things to come along in the last 20 years of the internet. They are useful not only to the search engines (Google and many others) that run them, but also to people who are searching for a particular site and to the people who run web sites. Spiders allow your site to be seen by the millions of people who use search engines every day. In this newsletter, we will discuss what search engine spiders do, how they work, and how to create a robots.txt file and upload it to your site to control which parts of your site spiders visit.
What are spiders and what purpose do they serve? Spiders are essentially programs that “crawl” sites and report what they find back to the search engine they work for (Google or whichever search engine created them). Their purpose is to make it easy for sites to get listed in search engines.
You might be wondering what it means to “crawl” a site. It means visiting the pages of a site and copying their content.
How do spiders work? Spiders work by finding links to web sites, visiting those sites, reading through their content, and then reporting that content back to the database of the search engine they work for. Google’s spiders, for example, crawl sites and report the information back to Google’s database. From there, the information is added to Google’s search index, and the site can then show up in Google search results. Much the same process happens with any other search engine spider.
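The crawling steps described above (find links, visit pages, record their content) can be sketched in a few lines of Python. This is a minimal, hypothetical spider, not how Google or any other search engine actually implements one: it assumes a made-up in-memory dictionary, FAKE_WEB, in place of real downloads, and a plain dictionary named index in place of the search engine’s database.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "web" (an assumption for illustration): URL -> HTML.
# A real spider would download each page over HTTP instead.
FAKE_WEB = {
    "http://example.com/": '<a href="/about">About</a> <a href="/news/">News</a>',
    "http://example.com/about": '<p>About us</p> <a href="/">Home</a>',
    "http://example.com/news/": "<p>Latest news</p>",
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch):
    """Breadth-first crawl: visit a page, record its content, follow its links."""
    index = {}                # stands in for the search engine's database
    seen = {start_url}        # URLs already queued, so each page is visited once
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue          # page could not be fetched; skip it
        index[url] = html     # "report back" the page content
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)  # turn "/about" into a full URL
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return index


index = crawl("http://example.com/", FAKE_WEB.get)
print(sorted(index))
```

Swapping FAKE_WEB.get for a function that downloads pages would turn this into a very naive real crawler; a real one would also need to honor robots.txt, pace its requests, and handle errors.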
How can I keep spiders from visiting my site? You might be thinking, “Why would I want to keep such a useful thing from visiting my site?” The short answer is that sometimes site owners don’t want spiders to crawl a particular part of their site, and some don’t want spiders to crawl their site at all. The reasons vary, although often it is because the site is either completely spam or features a page or two of spam. If you’re one of those site owners, you’ll want to create and upload a file called robots.txt. We will briefly go over how to do this.
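A robots.txt file. The whole purpose of a robots.txt file is to tell search engine spiders which parts of a site they should not crawl, whether that is the whole site or only part of it. Well-behaved spiders, including those run by the major search engines, check this file before crawling and follow its instructions; it is a request rather than a lock, so it does not stop spiders that choose to ignore it.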
Creating the file. Creating a robots.txt file that blocks out spiders is easy. First, open Notepad (or any plain-text editor). Then copy and paste the following two lines:
User-agent: *
Disallow: /
Once you’ve done that, save the file as a plain-text (.txt) file named “robots”, so that the full file name is robots.txt, all in lowercase.
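Before uploading, you can check how a spider will read the file. The sketch below uses Python’s standard urllib.robotparser module; the spider name ExampleBot and the page URLs are made up for illustration. It confirms that Disallow: / blocks every page on the site.

```python
from urllib.robotparser import RobotFileParser

# The exact contents of the robots.txt file from the step above.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def allowed(robots_txt, url, user_agent="ExampleBot"):
    """Ask Python's robots.txt parser whether user_agent may fetch url.

    "ExampleBot" is a hypothetical spider name used only for illustration.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical pages on the site; every one of them should be blocked.
for url in ["http://yoursite.com/", "http://yoursite.com/news/", "http://yoursite.com/about.html"]:
    print(url, allowed(BLOCK_ALL, url))
```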
Uploading the file. Next, upload the file to the root folder of your site, the same folder that holds your home page (index) file, so that it can be reached at yoursite.com/robots.txt. Spiders only look for robots.txt in that one place; a copy uploaded to a subfolder such as yoursite.com/news/ is simply ignored. The file above, with Disallow: /, keeps spiders away from your whole site. If you only want to keep them out of yoursite.com/news/, change the second line to Disallow: /news/ instead. That’s all there is to it.
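The sketch below illustrates both points from this step, assuming Python’s standard urllib.robotparser, a made-up spider name (ExampleBot), and hypothetical page names such as news/today.html: a spider works out the robots.txt address from the root of the site no matter which page it is visiting, and a Disallow: /news/ line blocks only the news folder.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url_for(page_url):
    """Spiders look for robots.txt only at the root of the site."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Block only the news folder; the rest of the site stays open.
BLOCK_NEWS = """\
User-agent: *
Disallow: /news/
"""

parser = RobotFileParser()
parser.parse(BLOCK_NEWS.splitlines())

page = "http://yoursite.com/news/today.html"  # hypothetical page in the news folder
print(robots_url_for(page))                    # where a spider looks for the file
print(parser.can_fetch("ExampleBot", page))    # news page: blocked
print(parser.can_fetch("ExampleBot", "http://yoursite.com/about.html"))  # other page: allowed
```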
Using the robots.txt file to make sure search engine spiders DO visit your site
Believe it or not, the robots.txt file can be used to both disallow and allow search engine spiders to crawl your site. Here’s how to create and upload such a file.
Creating the file. Open Notepad and copy and paste in the following:
User-agent: *
Disallow:
You’ll notice that the only difference between this and the earlier example is that Disallow: is not followed by /. If it were, that would tell spiders to go away; left empty, it tells them that nothing on the site is off-limits. Once again, save the file as robots.txt.
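As a quick check, again assuming Python’s standard urllib.robotparser and a made-up spider name (ExampleBot), an empty Disallow: line leaves every page open to crawling.

```python
from urllib.robotparser import RobotFileParser

# The "welcome" robots.txt from the step above: an empty Disallow line.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ALLOW_ALL.splitlines())

# Hypothetical pages; "ExampleBot" is a made-up spider name.
for url in ["http://yoursite.com/", "http://yoursite.com/news/today.html"]:
    # An empty Disallow value excludes nothing, so every URL is allowed.
    print(url, parser.can_fetch("ExampleBot", url))
```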
Uploading the file. As before, upload robots.txt to the root folder of your site, right alongside your home page (index) file, so that spiders can find it at yoursite.com/robots.txt. And you’re done. Creating and uploading a robots.txt file to help make sure spiders don’t miss your site is fast and easy. So what are you waiting for? Create and upload that file now!
Author: Terry Detty, 42, finds internet marketing his passion. In addition to marketing, he enjoys reading and occasionally goes out for a short walk. His firm, Easy SEO, helps with SEO software, internet marketing software, and email marketing.