Caution: Blog Scrapers

There has recently been a lot of talk about the importance of blogging for SEO purposes. Many experts agree that a blog is a dynamic and interactive tool which allows you to effectively address vital issues and share relevant and useful information with readers both inside and outside the industry. But as blogging increases in popularity, more websites are facing the alarming problem of content theft.
My first encounter with this issue happened a while ago when I published my first blog post on Data Feed Optimization. Within a matter of hours, word-for-word copies appeared on multiple sites. A few days later the original post was completely knocked out of the search results. What happened to me is not uncommon. Scrapers, also known as content scrapers or feed scrapers, are computer programs that crawl the web and copy unique content from other sites.

They usually do it by subscribing to your RSS feed. Once the feed shows that you have published something, they automatically scrape the new post. They almost never link back, leaving the original, lower-ranked blog to suffer in the rankings.
Another side of content theft is blatant plagiarism, which I personally consider even more appalling. I was recently browsing the web looking for good blogs on the latest SEO trends and came across a great article by Stephan Spencer. This article was duplicated at least 7 times on the same page of search results. Most bloggers on the first page did credit the original author, except for one. Some guy named Chris Wood, who claims to be a reputable online marketing consultant, copied the entire post word for word, pasted it into his own blog using site-specific design and formatting, and took the credit for it. Such practices occur every single day!
One would think Google and other search engines are smart enough to detect cases of content theft, but unfortunately they are not. If your blog is scraped almost immediately and without a backlink, it is next to impossible for Google to decide who has the original and who is the copycat. This is where your page rank may come into play. Sadly, if your blog thief has a much higher page rank, you may be the one labeled as a copycat. Thus blog thieves get away with the theft and thrive, while the original website may be penalized. Is that fair? Definitely not.
Here are a few things you can do to detect plagiarism and protect your website from automatic scrapers, which account for the majority of content theft.

1. Submit your URL to Copyscape to see who is stealing your website or blog content.

2. If content thieves don’t show up immediately in the search results, try repeating your search a few times with the omitted results included.

3. Set up Google Alerts for key phrases related to your content.

4. Set your feeds to “summary,” which gives scrapers only an excerpt rather than the entire post.

5. Add a legal disclaimer in the header or the footer of the post.

6. Use tools such as Numly WordPress Plugin, RSS Footer, and TagRight to restrict access to your content and establish your authorship.

7. Force blog scrapers to link back to your site by

a) including links which point to specific pages on your website in the blog copy

b) linking to your blog post from within the feed content. Read tips from Matt Cutts on how to do it.

c) using the free tool called Tynt Tracer (it inserts a short piece of code on your blog to prevent scraping without backlinks)
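To make tips 4 and 7b a little more concrete, here is a minimal Python sketch of what a feed item could look like when it carries only an excerpt plus a link back to the original post. It is an illustration only, not how any particular blog platform or plugin does it: the function names (summarize, feed_description), the 50-word default, and the footer wording are all assumptions made up for this example.

```python
import html


def summarize(text: str, max_words: int = 50) -> str:
    """Return the first max_words words of text, with "..." if it was cut."""
    words = text.split()
    if len(words) <= max_words:
        return " ".join(words)
    return " ".join(words[:max_words]) + "..."


def feed_description(body: str, post_url: str, post_title: str,
                     blog_name: str, max_words: int = 50) -> str:
    """Build the HTML for one feed item: a short excerpt plus a backlink.

    All names and the footer wording are hypothetical; adapt them to
    whatever your blog software uses to generate its feed.
    """
    excerpt = html.escape(summarize(body, max_words))
    # Escape the URL and title so they are safe inside HTML attributes/text.
    link = '<a href="{}">{}</a>'.format(
        html.escape(post_url, quote=True), html.escape(post_title)
    )
    footer = "This post, {}, originally appeared on {}.".format(
        link, html.escape(blog_name)
    )
    return "<p>{}</p>\n<p>{}</p>".format(excerpt, footer)
```

If a scraper republishes this feed item as is, the copy contains only the excerpt and carries a link to your original post along with it.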

Now the bad news. There is little protection against manual scraping. Combating it is a time-consuming process which doesn’t always lead to the desired outcome. If the entire text is copied without credit, you can still enforce your rights under copyright law; however, if only parts of the content are stolen, your options are very limited. The common suggestion is to put your craving for justice aside for a moment and assess whether the damage is worth the time it takes to go after the thieves. If the damage is tangible, you can

send them an e-mail and request that they take the content down
flag them on the website
modify the RSS feed for the offenders or even block their IP address
inform search engines through DMCA (Digital Millennium Copyright Act) letters
seek legal action (in extreme cases)
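As a rough illustration of serving a modified feed to offenders, the Python sketch below checks the requester's IP address against a blocklist and swaps the post text for a short notice when it matches. Everything here is hypothetical: the addresses are reserved documentation ranges, and the names SCRAPER_NETWORKS, is_known_scraper, and feed_body_for are invented for this example. Keep in mind that scrapers can change IP addresses, so this is a deterrent rather than a guarantee.

```python
import ipaddress

# Hypothetical blocklist: networks you have identified as scrapers.
# These are reserved documentation ranges, not real offenders.
SCRAPER_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]


def is_known_scraper(remote_addr: str) -> bool:
    """Return True if remote_addr falls inside any blocklisted network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Not a valid IP address; treat it as an ordinary reader.
        return False
    return any(addr in network for network in SCRAPER_NETWORKS)


def feed_body_for(remote_addr: str, full_text: str, post_url: str) -> str:
    """Pick the feed content to serve for one request."""
    if is_known_scraper(remote_addr):
        # Offenders get a pointer to the original instead of the post.
        return "This content is available only at its original location: " + post_url
    return full_text
```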

Thanks to the efforts of our SEO team, my blog entry made it back into the search results and the copycats were pushed far down. It was a good learning experience, but it also made me think that blog scraping is a small drop in the ocean of plagiarism. Let’s face it. Individuals as well as companies blatantly rip off entire websites and business ideas every day. The least Google and other search engines can do is develop better ways of detecting and combating content theft.
P.S. If this blog gets scraped, wouldn’t that be awesomely ironic?
