The image below shows an example of content duplication: content from a Wikipedia page has been copied and added to the user's site. Content duplication is highly unethical and can be penalized by Google.
Duplicate content refers to content within a website, or across different websites, that is substantially similar to the original copy. There are a number of reasons why a copy may be similar. Some are perfectly valid, while others are intended to spam the search engines. Duplicate content occurs in various ways.
1. If you run a commercial website with hundreds of products listed, you have probably come across duplication issues. Each page is generated by querying the database, and the same page can accidentally be created in many duplicate versions. The cause is the URL structure: different URLs can query the same data from the database. For example, if you have a catalog text version (e.g. X-Cart search-engine-friendly URLs) and a dynamic version, two pages are generated; if someone also runs a search, or a parameter is added to the URL, and the same page is displayed, you end up with four versions of the same page. (A search URL has to be linked somewhere on the website for a search engine to find it.) Many similar causes can leave a website with multiple duplicate pages. This kind of duplication is completely avoidable: a good SEO expert or web developer can fix it and make sure search engines see only one version.
2. You might have a printer-friendly page, a PDF download and an HTML page with the same content. This is another unintentional duplication, which can be avoided by preventing two of the three versions from being crawled by the search engine crawler.
3. Some sites unintentionally display the same articles on several different pages; this causes duplication and can be avoided.
4. Duplicate product descriptions across different pages selling the same products are a common but avoidable occurrence. This happens because a merchant who wants to sell the products of big manufacturers, for example in electronics, has to get the catalog from those companies. The companies do not provide a unique description; they hand over the same catalog and descriptions that they give to hundreds of other merchants. It is therefore the responsibility of the retailer or reseller to write unique descriptions for the products to avoid duplication.
5. Sometimes, for branding purposes, a site owner will point multiple domain names at the same website. This is a big mistake, especially if those domain names have links from other websites on the internet. When a crawler finds those domains linked, all of them serving the same website, it cannot work out which is the original domain name, and this can result in the site not being ranked at all. To make sure this does not happen, site owners can set up 301 redirects from all the extra domains; these tell the search engines that all the add-on domains lead to the main domain and that the change is permanent.
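The 301 redirect described in the last point can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the domain names `www.example.com`, `example.com` and `example.net`, the function `permanent_redirect` and the WSGI app are all hypothetical, and on a real site the same rule is usually configured in the web server rather than in application code.

```python
from typing import Optional, Tuple

# Hypothetical domains, for illustration only.
PRIMARY_HOST = "www.example.com"
ADDON_HOSTS = {"example.com", "example.net", "www.example.net"}


def permanent_redirect(host: str, path: str) -> Optional[Tuple[int, str]]:
    """Return (301, target URL) for an add-on domain, or None for the main one."""
    host = host.lower().split(":")[0]  # ignore case and drop any port
    if host in ADDON_HOSTS:
        return 301, f"https://{PRIMARY_HOST}{path}"
    return None


def app(environ, start_response):
    """Minimal WSGI app that applies the rule above before serving content."""
    path = environ.get("PATH_INFO", "/")
    query = environ.get("QUERY_STRING", "")
    if query:
        path = f"{path}?{query}"
    redirect = permanent_redirect(environ.get("HTTP_HOST", ""), path)
    if redirect:
        status, location = redirect
        # 301 tells crawlers the move is permanent, unlike a temporary 302.
        start_response(f"{status} Moved Permanently", [("Location", location)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"main site content"]
```

Because every add-on domain answers with a permanent redirect to the same host, a crawler following links to any of them ends up at a single address for each page.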
We can imagine many more valid reasons for content duplication like these, but to the search engines they all look the same. They do not care whether the duplication is intentional or unintentional, since they have billions of web pages to crawl and everything is automated.
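For the first cause above, where several URLs return the same product page, one common remedy is to reduce every variant to a single canonical URL. The sketch below is a minimal illustration, not a complete solution: the parameter names in `IGNORED_PARAMS` and the example URLs are assumptions, and mapping a search-engine-friendly catalog URL onto its dynamic twin would still need knowledge of the shop software.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that change tracking or presentation, not content.
IGNORED_PARAMS = {"sessionid", "sort", "ref", "print"}


def canonical_url(url: str) -> str:
    """Collapse URL variants that serve the same page into one form."""
    parts = urlsplit(url)
    # Keep only content-defining parameters, in a stable order.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    path = parts.path.rstrip("/") or "/"
    # The fragment is dropped because it never reaches the server.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), "")
    )


variants = [
    "http://Shop.example.com/product.php?id=42&sort=price&sessionid=abc",
    "http://shop.example.com/product.php?sessionid=xyz&id=42#reviews",
]
canonical = {canonical_url(u) for u in variants}
```

Both variants reduce to the same address, which could then be used as the redirect target or placed in a `rel="canonical"` link element so that search engines see only one version.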
Potential Spam reasons for content duplication:
1. Many people think content duplication is undetectable, and they keep spamming the search engines without worrying about being penalized. One such method is reproducing a single article across hundreds of websites. Imagine you write one article and reproduce it on hundreds of different websites. That is like seeing the same movie on 100 different websites; search engines consider this spam and write their algorithms accordingly. Some SEO companies promote a website by submitting one article to more than 100 article reproduction websites. They do this to gain backlinks from the footer of the article. Search engines are aware of this strategy; they keep only one to five versions of the article, and most of the other copies are discarded from ranking. Only a handful of article sites are really valid these days, and search engines give credit only for articles published on those sites. A few valid article sites are Ezine, Article Dashboard, etc.
2. Duplicate directory content is another typical example of spamming. Three to five years ago, there were thousands of free directories accepting submissions on their websites. None of these directories had any good content, and search engine algorithms mostly disregard them because of duplication.
3. Plagiarism: This is common on the internet. Someone who does not want to write content will simply steal it from other websites and post it as their own. Thieves are everywhere, and content thieves are no exception. Duplicate pages are created when the content of a web page is copied and posted on many different websites. It is difficult for search engines to know which copy is the original, especially if all the copies pop up within days.
4. CCS spam (City/Country/State): This is one of the most popular but ridiculous forms of duplicate content spam you will ever come across. A site owner or SEO company writes just one piece of content and, using a database of all the cities, countries and states in the US, displays the same content on every page, changing only the city, country or state name. This creates thousands of pages with about 2 hours of coding and content writing work. The primary aim was to make the site look to search engines as if it had thousands of pages. It worked in the past but no longer does, since search engines are aware of it and have their own ways of tackling this spam.
Similarly, there are many other duplicate content spam methods, such as empty pages and scraping. Duplicate content has long been the single most important problem search engines have had to solve.
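To see why templated pages such as CCS spam are easy to spot, consider a textbook near-duplicate measure: split each page into overlapping word sequences ("shingles") and compare the sets with Jaccard similarity. The sketch below is an illustration only; it makes no claim about how any search engine actually works, and the template text, city names and function names are invented for the example.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical CCS-style template: one text, only the city name changes.
TEMPLATE = (
    "Our team provides reliable plumbing repair in {city} at fair prices. "
    "We fix leaking pipes, blocked drains and broken water heaters, and every "
    "job comes with a written guarantee and a free quote."
)

page_a = TEMPLATE.format(city="Austin")
page_b = TEMPLATE.format(city="Denver")
unrelated = "Duplicate product descriptions are common among resellers of electronics."

templated_score = jaccard(shingles(page_a), shingles(page_b))
unrelated_score = jaccard(shingles(page_a), shingles(unrelated))
```

Swapping the city name changes only the few shingles that contain it, so `templated_score` stays high while `unrelated_score` is zero. A page set where thousands of pairs score this high is an obvious candidate for filtering.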
Images
Screen Shots
Survey
The survey grouped near-duplicate pages into clusters. The blue oval in the image marks the largest clusters of near-duplicate pages; the other clusters reflect genuine content replication.
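Grouping near-duplicates into clusters, as the survey describes, can be illustrated with a small sketch. It is not the survey's method, which is not given here: the similarity measure (`difflib.SequenceMatcher` over words), the 0.8 threshold and the sample pages are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def similarity(a: str, b: str) -> float:
    """Word-level similarity in [0, 1] from difflib's matching ratio."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def cluster_pages(pages: Dict[str, str], threshold: float = 0.8) -> List[List[str]]:
    """Group page names whose texts are at least `threshold` similar."""
    parent = {name: name for name in pages}

    def find(name: str) -> str:
        # Follow parent links to the cluster representative.
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    names = list(pages)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if similarity(pages[first], pages[second]) >= threshold:
                parent[find(second)] = find(first)  # merge the two clusters

    clusters: Dict[str, List[str]] = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    # Largest cluster first.
    return sorted(clusters.values(), key=len, reverse=True)


# Hypothetical pages, for illustration only.
pages = {
    "a.html": "cheap flights and hotel deals for your trip to Paris this summer",
    "b.html": "cheap flights and hotel deals for your trip to Rome this summer",
    "c.html": "cheap flights and hotel deals for your trip to Oslo this summer",
    "d.html": "how to configure custom error pages on a content management system",
}
clusters = cluster_pages(pages)
```

Here the three templated pages fall into one cluster and the unrelated page stays on its own. Comparing every pair of pages is only practical for small sets; at web scale some form of hashing or sampling would be needed instead.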
Working Example
In the image below, the third result from the bottom copies its content from the first result, which is the official drupal.org site. It even copies the misspelled keyword 'custom-error-handling'. It is a classic example of duplicating content from the top result in order to boost one's own rank.
Search Engine Genie is an Ethical Search Engine Optimization Company Specializing in Search Engine Marketing, Search Engine Promotion and Search Engine Ranking Services.