
April 2, 2014

The Google Content Scraping Controversy

Photo Credit: Carlos Luna via flickr

Who isn’t intimidated by Google’s ever-growing influence and its ever-changing set of rules and guidelines? Publishers who have managed to cope with Google’s algorithm updates have learned to watch their step. They are fully aware that they have to play by the book and follow Google’s own letter of the law to protect their rankings. But in this context, one question still remains unanswered: does the mighty Google actually practice what it preaches? Or does it feel the need to break the rules every once in a while, to follow its own agenda? Are Google’s words spoken to be broken?

Is Google Caught in Its Own Scraping Trap?

One thing’s certain: Google’s sustained efforts to punish the so-called Internet copycats have been mocked by much of the SEO community. It all started when Matt Cutts, the head of Google’s webspam team, introduced a form that website owners can fill out when they notice that their content has been duplicated by a copycat, also known as a scraper.

According to Wikipedia, “a scraper site is a website that copies all of its content from other websites using web scraping. The purpose of creating such a site can be to collect advertising revenue or to manipulate search engine rankings by linking to other sites to improve their search engine ranking.”

We are all fully aware that Google, Bing and Yahoo bend over backwards to penalize cheaters and ensure a flawless user experience for all of us, so there’s nothing wrong with Google’s attempt to corner scrapers and turn its back on them.

But here’s where it gets really interesting: according to Mashable, Google breaks its own rule by using its very own scraper to mine valuable content from reputable sources. This dubious practice was exposed by Dan Barker, a digital marketer. Barker pointed out that when you search for “scraper site,” Google takes the definition from Wikipedia and displays it in its own box above Wikipedia’s link. Barker’s message led to 14,000 retweets, showing that Google’s efforts to fight scrapers have created quite a stir in online communities. It seems that this time, the joke’s on Google and SEO experts had the last laugh.

The road to SEO hell is paved with good intentions. Google has the users’ best interests at heart and clearly does everything in its power to provide the very best search results in a timely fashion. But the fact that Google harvested Wikipedia’s content and placed it above its original source made many people raise an eyebrow. Is Google allowed to break its very own set of rules and outrank Wikipedia?

Are Google’s Rules Meant to Be Broken by Search Engines?

When it comes to targeting scrapers, Google is focused on penalizing publishers who steal and redistribute someone else’s content for their own illegitimate purposes. Its measures are being criticized now because Google itself has increased the number of direct answers and web definitions it “borrows” from respectable, first-hand sources.

According to Search Engine Land, Google’s decision to draw content from other sources is perfectly justifiable, since it makes sense for Yahoo, Google and Bing to display an accurate, short direct answer rather than encouraging different websites to do the same thing while fighting for a bigger slice of the pie. But it’s also clear that Google’s own form of content scraping somehow breaches the traditional contract between publishers and search engines. Search engines are within their rights to harvest the fruits of the publishers’ labor, but only when a fair traffic exchange is part of the deal.

Google linked to the original source, but this doesn’t change the fact that Wikipedia’s definition is placed under Google’s semantic box. Let’s face it: we are all obsessed with the idea of landing on page 1 in Google. Maybe Google is no exception. But the problem is that Google’s eagerness to bend its own rules may affect where we are and where we want to get, in terms of rankings.

Google’s intention to put an end to scraping, the shady practice that gets many publishers into trouble, is obviously surrounded by controversy. While trying to teach scrapers a lesson, Google is actually relying on its very own kind of unorthodox content farming. This action reflects the giant gap between Google’s mission to provide quick, precise answers and its role as an essential Internet portal.

So how does this phenomenon actually impact us? Should we still follow Google’s rules and recommendations to stay in its good graces or should we dare to take a bite from the forbidden fruit and scrape content instead of working hard to create and improve our own?

Can Web Scraping Make or Break Your Business?

Everyday people are more interested in how content scraping could affect their popularity and, most importantly, their profit margins. Truth be told, web scraping won’t do you any favors. Users are now extremely savvy and have what it takes to distinguish first-class content from second-hand material at first glance. They don’t want to spend their precious time reading written trash. Visitors are always hungry for premium, 100% original, valuable content and this is a golden rule that no search engine can bend.

If you don’t address the ever-growing needs and demands of your targeted audience, in terms of web writing, you are automatically disqualifying yourself and enabling your main competitors to open a bottle of champagne to celebrate your failure. Make no mistake: original content that meets the expectations of your readers will bring you and keep you in the public eye for the longest period of time. There is no substitute for powerful copy, so you should invest time, money and energy in your own personalized content reflecting who you are, what you do and why you do it best, instead of stealing chunks of text from your rivals.

Web scraping hurts your business in different ways:

  1. By decreasing your SEO ranking and reducing your traffic
  2. By impacting your subscriber base in a negative manner
  3. By increasing both your bandwidth and your legal costs
  4. By disappointing your visitors who are able to separate low-quality, plagiarized content from original, high-value web writing.

Protecting Your Content from Devious Web Scrapers

In this context, it goes without saying that web scraping is a counterproductive practice that doesn’t support your long-term goals. But even if you are on the good side of Google’s law and would never think about stealing someone’s written ideas, there is one more aspect that you should take into consideration. If you’re good at writing killer content and you know it, you may become an easy target for skilled web scrapers.

To protect your web writing, your ranking and your online reputation and keep scrapers at bay, you just have to follow a few basic steps:

  • Understand the Copyright Law. You should be fully aware that copyright law safeguards your intellectual property and protects your ownership of your written material. If you feel the need to add a few extra measures of precaution, you can register all your work, while making sure that all your content is accompanied by a copyright notice.
  • Rely on Anti-Scraping Software. You don’t need to be tech savvy or to invest tons of money in software designed to discourage web scrapers. Effective anti-scraping software can be downloaded for free. For instance, ScrapeShield is a cost-free app that detects the presence of stolen content. It works in conjunction with Maze, a network that was built to name and shame web scrapers.
  • Rely on WordPress Plugins. If you use blog posts to support your business promotion goals and you utilize WordPress to spread the word about your company, you have the chance to explore the advantages of several third-party tools designed to identify web scraping techniques and fight them in a successful manner.
  • Use the Google Scraper Report Form. While some may argue that Google has trolled itself by giving the green light to its very own web scraping technique, the Google Scraper Report is a valuable defensive weapon that we can use as soon as we realize that someone is outranking us with our own content. Google also handles removal requests under the Digital Millennium Copyright Act (DMCA), a U.S. law that allows people whose intellectual property has been stolen by scrapers to contact hosts or service providers and ask them to remove the plagiarized content. However, the whole process is time-consuming.

The form launched by Google reveals a new approach. Google is tackling infringing content as a spam offense, instead of cataloging and addressing it as a copyright issue. According to Search Engine Land, the new form doesn’t guarantee a quick fix. The same source indicates that the Google Scraper Report may not lead to the desired removals at all, since Google could actually use this new anti-scraping solution to perfect its own ranking system and make sure that premium, original content occupies a privileged position in search engine results.
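Before you file any report, it helps to confirm that a suspect page really reuses your text. The Python snippet below is a minimal sketch of one common way to do that: it breaks both texts into overlapping five-word phrases ("shingles") and measures how many they share (Jaccard similarity). It is only an illustration. It is not how ScrapeShield or Google detect copied content, and the function names `shingles` and `jaccard_similarity` are made up for this example.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word sequences of length `size`."""
    # Lowercase and keep only simple word tokens so punctuation
    # and capitalization differences do not hide a copy.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return set()
    if len(words) < size:
        # Very short text: treat the whole thing as one shingle.
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(original, suspect, size=5):
    """Share of shingles the two texts have in common, from 0.0 to 1.0."""
    a = shingles(original, size)
    b = shingles(suspect, size)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

A score close to 1.0 suggests that the suspect page is a near-verbatim copy, while a score close to 0.0 suggests that the texts are unrelated. Where to draw the line is a judgment call: quotes, boilerplate and common phrases will produce some overlap even between honest pages.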

At the end of the day, content is web gold. Yours is truly powerful only if it’s 100% original and protected against unscrupulous web scrapers. Follow the basic steps listed above to discourage Internet copycats and invest your time, money and energy in first-class content creation that will automatically live up to your expectations.



Julia McCoy is a serial content marketer, entrepreneur, and bestselling author. She founded a multi-million dollar content agency, Express Writers, with nothing more than $75 at 19 years old. Today, her team has nearly 100 expert content creators on staff, and serves thousands of clients around the world. She's earned her way to the top 30 worldwide content marketers, and has a passion for sharing what she knows in her books and in her online course, The Content Strategy & Marketing Course. Julia also hosts The Write Podcast on iTunes.
