November 21, 2013

How to Stop Feeding E-mail IDs to Scraper Bots

Image: Scraper Bot (credit: Constantine Belias via flickr)

Scraper bots often target e-mail addresses so they can create lists for spammers. However, there are a few techniques you can use to block them.

One of the most common things website scrapers target is e-mail addresses. By gathering thousands or even millions of addresses, these scrapers can turn a very tidy profit. Some operators use the addresses themselves to send out spam; if they send enough messages, even a one percent response rate makes the effort worthwhile. Others sell the harvested lists to spammers and to companies that hope the lists will bring in business. Fortunately, there are ways to make it much harder for scrapers to harvest the e-mail addresses you put on your website.

How E-mail Scrapers Work

E-mail scrapers, also called harvesters, work much like scrapers designed to copy all of the content of a page. The difference is that they’re programmed to look for two things. The first is linked text. Many people make the e-mail addresses on their websites clickable (a “mailto:” link), so that a visitor only has to click the link to open a new e-mail to that address. When a scraper finds a link, it checks whether the link points to an e-mail address. The easiest way to do this is to look for the “@” symbol: if it’s there, the scraper copies the address; if not, it moves on to the next link. The second is plain text. Some scrapers don’t bother with links at all. They simply copy any text containing the “@” symbol and let a live person decide whether it’s an e-mail address.
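
To make the two search patterns concrete, here is a deliberately naive harvester sketched in TypeScript. The function names `harvestAddresses` and `addUnique` and both regular expressions are illustrative assumptions, not taken from any real scraper; actual harvesters vary widely and are often more thorough.

```typescript
// Illustrative only: a naive e-mail harvester showing the two passes
// described above. Names and patterns are assumptions, not a real tool.

// Adds an address to the list unless it is already there (case-insensitive).
function addUnique(list: string[], value: string): void {
  const normalized = value.toLowerCase();
  if (list.indexOf(normalized) === -1) {
    list.push(normalized);
  }
}

function harvestAddresses(html: string): string[] {
  const found: string[] = [];
  let match: RegExpExecArray | null;

  // Pass 1: links whose href uses the mailto: scheme. The capture stops
  // at the closing quote or at a "?subject=..." style query string.
  const mailtoLink = /href\s*=\s*["']mailto:([^"'?]+)/gi;
  while ((match = mailtoLink.exec(html)) !== null) {
    addUnique(found, match[1]);
  }

  // Pass 2: any text at all that has an "@" between word characters.
  const atToken = /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g;
  while ((match = atToken.exec(html)) !== null) {
    addUnique(found, match[0]);
  }
  return found;
}

const samplePage = `
  <a href="mailto:sales@example.com">Email sales</a>
  <p>Support: help@example.org</p>
  <p>Press: press [at] example.net</p>
`;

// Finds sales@example.com and help@example.org, but not the [at] address.
console.log(harvestAddresses(samplePage));
```

The second pass is what makes unlinked addresses vulnerable too: any text with an “@” in it gets picked up, whether or not it was ever a link.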

Cloaking E-mails

There are a couple of different ways you can cloak the e-mail addresses on your page so that scraper bots can’t harvest them. One basic technique is to avoid using the “@” symbol in your text and replace it with [at]. Visitors to your site will understand that they need to swap in the correct symbol when typing the address into a new e-mail.
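
A minimal TypeScript sketch of the [at] substitution follows. The helper name `toAtForm` is hypothetical, and putting spaces around [at] is an assumption about presentation. Note that this only defeats bots that search for a literal “@”; harvesters that also recognize patterns like “[at]” will still read the address.

```typescript
// Hypothetical helper: rewrites "user@domain" as "user [at] domain".
function toAtForm(address: string): string {
  // Use the last "@" so only the separator before the domain is replaced.
  const at = address.lastIndexOf("@");
  if (at <= 0 || at === address.length - 1) {
    throw new Error("not an e-mail address: " + address);
  }
  return address.slice(0, at) + " [at] " + address.slice(at + 1);
}

// A pattern that requires a literal "@" no longer matches the cloaked text.
const literalAtPattern = /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/;

const cloaked = toAtForm("jane.doe@example.com");
console.log(cloaked); // jane.doe [at] example.com
console.log(literalAtPattern.test(cloaked)); // false
```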

Another technique combats scrapers that look through the website’s code for e-mail addresses. Instead of the usual e-mail link markup, you use a short piece of JavaScript. The script stores the username and the domain name in separate variables and combines them with the correct symbols only when the page runs, producing a working e-mail link. Bots that read only the raw page code end up with useless fragments. Visitors to your site won’t see any of this coding; to them, it looks like an ordinary e-mail link.
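
The TypeScript sketch below shows the idea. The variable values, the `contact-link` element id, and the helper names `assembleAddress` and `cloakLink` are all hypothetical. Building the “@” from its character code keeps that symbol out of the script source as well. Bear in mind that bots which actually run JavaScript, such as headless browsers, can still see the assembled address.

```typescript
// Sketch of JavaScript e-mail cloaking, written in TypeScript.
// The page's HTML contains no address; the script assembles it at runtime.

// The parts of a link element this sketch touches; a real
// HTMLAnchorElement has both properties, so it can be passed in directly.
interface LinkLike {
  href: string;
  textContent: string | null;
}

// Stored separately so the raw source never contains a complete address.
const CONTACT_USER = "jane.doe";      // hypothetical username
const CONTACT_DOMAIN = "example.com"; // hypothetical domain

// Joins the pieces; char code 64 is the at sign, so it is not in the source.
function assembleAddress(user: string, domain: string): string {
  return user + String.fromCharCode(64) + domain;
}

// Turns a placeholder link into a working mailto: link.
function cloakLink(link: LinkLike, user: string, domain: string): void {
  const address = assembleAddress(user, domain);
  link.href = "mailto:" + address;
  link.textContent = address;
}

// In a browser page, run once the link exists, for example:
//   const el = document.getElementById("contact-link") as HTMLAnchorElement | null;
//   if (el) cloakLink(el, CONTACT_USER, CONTACT_DOMAIN);

// Stand-alone demo with a plain object in place of a DOM element.
const demoLink: LinkLike = { href: "#", textContent: "Contact us" };
cloakLink(demoLink, CONTACT_USER, CONTACT_DOMAIN);
console.log(demoLink.href); // mailto:jane.doe@example.com
```

The HTML served to the browser would then contain only a placeholder such as a link with the id `contact-link` and no address, which is what a harvester reading the raw source sees.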

Utilize Anti-Scraping Services

A number of anti-scraping services exist to help website owners keep bots from stealing their e-mail addresses and other content. ScrapeSentry and similar services can block much of this scraping for you. That matters because, just like viruses and hacking methods, scrapers keep coming up with new ways to get around website defenses, and a dedicated service is more likely to keep pace with them than a single site owner.

Feed Scrapers Bad Information

If you want to do more than just block scrapers, there are ways you can fight back. One method is to create what’s called a poisoned page. This page tricks bots into thinking it’s full of valuable information by featuring many different linked e-mail addresses. However, all of the content on the page is generated by a script that creates random e-mail addresses. Scraper bots don’t realize these addresses are gibberish, so they fill up their lists with them. When the spammer sends out messages, a huge number come back as undeliverable. The spammer then has to go through the lists manually and delete all of these invalid addresses, which wastes a lot of their time.
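
A poisoned page can be generated with a few lines of code. The TypeScript sketch below is one possible version; the helper names are hypothetical, and so is the choice to put every address under the reserved `.invalid` top-level domain (RFC 2606), which ensures the spam never reaches a real mail server belonging to someone else.

```typescript
// Hypothetical generator for a "poisoned" page of random addresses.
// Assumption: every address uses the reserved ".invalid" top-level domain
// (RFC 2606), so no real mail server ever receives the resulting spam.

const TOKEN_CHARS = "abcdefghijklmnopqrstuvwxyz0123456789";

// Returns a random lowercase alphanumeric string of the given length.
function randomToken(length: number): string {
  let out = "";
  for (let i = 0; i < length; i++) {
    out += TOKEN_CHARS.charAt(Math.floor(Math.random() * TOKEN_CHARS.length));
  }
  return out;
}

// Builds one gibberish address such as "k3v9q0ab@x7m2pd.invalid".
function randomAddress(): string {
  return randomToken(8) + "@" + randomToken(6) + ".invalid";
}

// Produces an HTML page listing `count` linked, randomly generated addresses.
function poisonedPage(count: number): string {
  const items: string[] = [];
  for (let i = 0; i < count; i++) {
    const address = randomAddress();
    items.push(`<li><a href="mailto:${address}">${address}</a></li>`);
  }
  return `<!DOCTYPE html>\n<html><body><ul>\n${items.join("\n")}\n</ul></body></html>`;
}

console.log(poisonedPage(5));
```

Because the addresses are regenerated on every request, a bot that keeps coming back keeps collecting new junk. It may also be wise to disallow the page in robots.txt: well-behaved search engine crawlers honor that file and will skip the page, while many harvesters ignore it.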

Be Careful Giving Out Your E-mail

There are a number of things you can do to keep your e-mail address from being scraped from other websites. When asked to create a username on a website, choose something different from your e-mail address. If another company wants to publish your e-mail address, ask them either to cloak it or to link to your website instead. On social media sites, you can either leave the e-mail address off your profile or turn on the option to keep your contact information private.

Use Multiple Techniques

The best way to defeat e-mail scrapers is to combine several techniques. Not all scraper bots work the same way, so a method that stops one may not stop another. Use e-mail cloaking, poisoned pages, and anti-scraping services together to build a website that gives these scrapers nothing useful. You should soon notice a drop in spam, and you might even frustrate the spammers a bit, too.


Peter Davidson works as a senior business associate, helping brands and startups make efficient business decisions and plan sound business strategies. He is a big gadget freak who loves to share his views on the latest technologies and applications.
