Duplicate Content

Definition

Duplicate content is content that appears in more than one place on the internet: very similar, or even identical, content found on several pages with different URLs. These pages may belong to the same site or to different sites. Content can be considered duplicated even when only a single paragraph on a page is similar to a paragraph on another page. This is why it is important to stay vigilant when writing your content.
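To make the idea of "very similar" paragraphs more concrete, here is a minimal sketch of one common way to measure textual overlap: Jaccard similarity over word shingles. This is only an illustration. It is not the method Google or any tool listed on this page is known to use, and the function names and the shingle size of 3 are assumptions.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word n-grams (shingles) found in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Texts shorter than one shingle are treated as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(a: str, b: str, size: int = 3) -> float:
    """Share of shingles common to both texts, from 0.0 (none) to 1.0 (all)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

A score close to 1.0 between two paragraphs suggests that one is a near copy of the other. The threshold at which content counts as duplicated is a judgment call and is not defined here.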

The two types of duplicate content

Internal, or intra-site, duplicate content

As its name suggests, internal (intra-site) duplicate content is the presence of the same content on several pages of one site. Although it is usually unintentional, it should be avoided: search engines may penalize the site, and its reputation may suffer.

The great danger of duplicate content within a site is how easy it is to create. Let’s take the example of a site listing various Escape Games in Paris. It is tempting to write a single paragraph explaining what an Escape Game is, and to reuse this paragraph on each of the pages dedicated to one of these Escape Games. However, doing so would seriously handicap the SEO of your pages, and is therefore strongly discouraged.

Intra-site duplicate content is a problem that affects e-commerce sites in particular because of faceted searches. Indeed, most e-commerce sites, in order to facilitate user navigation, offer different filters to be applied to the pages to refine a search. This type of search certainly improves the user experience, but it also leads to the creation of many pages with very similar or even identical content, and thus the presence of duplicate content on the site.

As a general rule, your site is likely to have duplicate content for various reasons:

  • When your CMS allows you to create the same page twice, once in desktop version and once in mobile version, but with different URLs
  • When you recreate a page on your site, forgetting to redirect the old one to this new one
  • When you have several domain names for a single site
  • When you redesign your website without telling Google which content it should index using “rel=canonical” tags

External, or cross-site, duplicate content

External, or cross-site, duplicate content is the presence of the same content on several different sites. Google penalises this type of duplication much more heavily than internal duplication. It also raises legal problems, notably the question of copyright. It is therefore important to remain particularly vigilant and to keep cross-site duplicate content off your site.

It is common to find very similar content on different e-commerce sites, especially in the “product description” section. Different sites may sell products from the same supplier, and these products therefore have the same characteristics. If several of these sites simply repeat the description written on the product label, duplicate content appears between them.

Generally, Google will index only one of these identical versions, which it chooses according to the popularity or age of the pages. The version that Google determines to be the original is called the “canonical content”.

What are the dangers of duplicate content in SEO?

Duplicate content is detrimental both to the user and to Google. The user experience suffers when several search results show identical information. On the search engine side, duplicate content has a negative impact on a website’s SEO and traffic.

First of all, it is risky for your SEO to have duplicate content within your site. If your site has several pages with very similar or identical content, they are positioned on the same keywords and compete with each other for ranking on them. This is called keyword cannibalisation: each page prevents the others from ranking well on the keywords their content targets.

In addition, pages with duplicate content consume your crawl budget: Google wastes time crawling identical pages instead of crawling your strategic content.

Finally, external duplicate content is also heavily penalised by search engines. For example, Google developed an algorithm, Google Panda, to assess the quality and originality of page content and to identify duplicate and stolen content. If your site has too much duplicate content, you may be penalised, with consequences ranging from a drop in your position in the search results to total de-indexation of your site.

Multilingual sites and duplicate content: what about them?

Contrary to what one might think, translating your site into different languages does not create duplicate content. Google does not treat a page and its translation as duplicates of one another. Furthermore, the reason why Google penalises duplicate content is that it is deemed to be of little relevance to users, because it gives the same answer to a search as another page. Two pages with the same content in two different languages, however, respond to different queries made in different geographical areas.

Our solutions to avoid internal duplicate content

As with any problem, the best approach is to tackle internal duplicate content at the source, before it appears on your site. Here are some tips on how to avoid this problem.

Have only one URL for each piece of content

One of the first priorities is to make sure that each piece of content has only one URL. If URL parameters, IDs, or any other element that modifies a URL cause the same content to appear on several pages with different URLs, Google will consider it intra-site duplicate content. Here are some tips to avoid having multiple URLs for the same content:

  • Limit or ban the use of session IDs in your URLs
  • Limit or ban the use of URL parameters (especially in e-commerce with faceted searches)
  • Build all your URLs consistently on either the www sub-domain or the bare domain, never both
  • Use only one protocol: http or https
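The tips above can be sketched as a small URL normalization function that maps every variant of an address to a single form. This is a minimal sketch under stated assumptions: the host www.example.com, the choice of https, and the parameter names in IGNORED_PARAMS are all hypothetical and must be adapted to your own site. In particular, whether a filter parameter can safely be dropped depends on whether it really changes the page content.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names: session IDs, tracking and faceted-search filters.
IGNORED_PARAMS = {"sessionid", "sid", "sort", "color", "utm_source", "utm_medium"}


def normalize_url(url: str, host: str = "www.example.com") -> str:
    """Map URL variants of one page to a single https URL on one host."""
    parts = urlsplit(url)
    # Keep only the parameters that are not in the ignore list.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    kept.sort()  # a stable parameter order avoids ?a=1&b=2 vs ?b=2&a=1
    path = parts.path or "/"
    # Force one protocol and one host, and drop any fragment.
    return urlunsplit(("https", host, path, urlencode(kept), ""))
```

With these assumptions, http://example.com/shoes?color=red&sid=42 and https://www.example.com/shoes both normalize to https://www.example.com/shoes. A function like this can be used to audit a crawl export for URLs that collapse to the same address.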

Avoid copy/paste

As you can imagine, it is imperative to avoid copying and pasting from one page of your site to another. Always try to offer unique content, even if the themes covered by your pages are similar.

Canonical tags: canonicalisation of URLs

To avoid creating duplicate content on a site, you can use the canonical tag, which indicates the reference page for a piece of content. If you add a canonical tag to one of your pages, you send the following message to Google: “This page has content that is very similar to another page on the site. It is therefore preferable to index the reference page rather than this one.” In practical terms, it allows you to designate a page as the “master” page on your site, and to point Google to this page whenever it visits one of the pages with similar content. The canonical tag should be placed in the head section of the master page and of the pages with similar content.
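As an illustration, the canonical tag is a link element whose rel attribute is "canonical" and whose href attribute is the URL of the master page. The helper below is a minimal sketch that builds this element for a template. The function name and the example URL are assumptions.

```python
from html import escape


def canonical_link(master_url: str) -> str:
    """Build the canonical link element pointing at the master page."""
    # escape() protects characters such as & and " inside the href attribute.
    return f'<link rel="canonical" href="{escape(master_url, quote=True)}">'


# The same element goes in the head of the master page and of each similar page.
print(canonical_link("https://www.example.com/escape-games/"))
```

Every page in a group of similar pages should carry the same href value, so that Google receives one consistent signal about which URL is the master.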

Non-indexation of certain content

If the duplicate content is on pages that are considered low-quality or non-strategic, you have the option of not indexing them. All you have to do is place the “noindex” tag on these pages to tell Google not to index them, while still crawling them and following their links. It looks like this: <meta name="robots" content="noindex, follow">. However, this method is not recommended if you want every page of your site to be of high quality.
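To check which pages actually carry the noindex directive, you can parse their HTML and read the robots meta tag. The sketch below uses only Python's standard html.parser module. The class and function names are assumptions, and the sketch looks only at the meta tag, not at HTTP headers or robots.txt.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        # Directives are comma-separated and case-insensitive.
        for directive in attributes.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def is_noindex(html: str) -> bool:
    """Return True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```

Running is_noindex over the HTML of your duplicate pages shows whether the tag was really deployed where you intended, and not on strategic pages by mistake.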

Our solutions to avoid external duplicate content

As with internal duplicate content, there are various solutions to combat external duplicate content.

Do not use the same content on different network sites

Even though it may be tempting to use the same content on several of your network sites with similar themes, this will only have negative effects on your SEO. If you do so, Google may judge the content of your sites to be of poor quality, and therefore penalise them in its rankings.

Be careful when redesigning or migrating your site

Website redesigns and migrations are very tricky in terms of duplicate content. When you change domain or redesign your website, you end up setting up redirects in all directions, towards the pages of a new site whose architecture may differ from the old version. It is therefore common to forget a redirect, which leaves both the old page and the new one accessible and causes duplicate content to appear.
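One way to limit forgotten redirects is to compare the list of old URLs with your redirect plan before going live. The sketch below is a minimal illustration. The function name, the paths, and the idea of storing the plan as a dictionary of old path to new path are assumptions, not a feature of any particular CMS or server.

```python
def missing_redirects(old_urls, redirect_map):
    """Return the old URLs that have no target in the redirect plan."""
    # An absent key or an empty target both count as a missing redirect.
    return sorted(url for url in old_urls if not redirect_map.get(url))


# Hypothetical data: paths from the old site and the planned redirects.
old_urls = ["/escape-games", "/escape-games/paris", "/contact"]
redirect_map = {
    "/escape-games": "/games",
    "/contact": "",  # target forgotten
}

print(missing_redirects(old_urls, redirect_map))
```

Any URL returned by this check is an old page that would remain reachable, or return an error, after the migration.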

Request the removal or de-indexing of pages with similar content to yours

It is possible that some of your competitors are taking content from your pages and integrating it into their own site. If you notice such a practice, you can assert your copyright on your texts and images, and ask the webmaster to remove or de-index the pages whose content is plagiarized from yours. To do this, send an e-mail or message using the contact information generally given on the site. If you do not receive a response, you can use Google’s complaint tool to request the removal of this content from its results.

5 tools to detect duplicate content

As part of your SEO strategy, it is important to regularly check your site for quality content and the absence of duplicate content. Duplicate content can easily be detected on a small site without the use of technical SEO tools. But when your site has many pages, it is more complicated to identify this content. To make it easier for you, here is a non-exhaustive list of different SEO tools you can use to detect the presence or absence of duplicate content on your site.

Screaming Frog

Screaming Frog is an SEO tool that allows you to crawl your website just like Google does. This tool gives you various information about your website, including whether or not there is duplicate content internally. Screaming Frog also tells you if your site has duplicate page titles and metadata.

Duplichecker

Duplichecker is a platform on which you just have to enter the content of one of your pages in the search field, then click on “check plagiarism” so that it analyses the percentage of duplicate content present in your content. You can also directly enter the URL of your site, or upload the document on which your content is located if it is longer than 1000 words. Finally, Duplichecker also allows you to correct any grammatical errors that may have crept into your content.

Kill Duplicate

Kill Duplicate is the ideal tool to spot content thieves. Simply register your site with the platform and it will identify sites that copy your content. Kill Duplicate offers different packages, allowing you to deal with sites of different sizes.

Siteliner

Siteliner is a tool that allows you to determine the presence or absence of duplicate content on your site. To do this, enter your site’s URL in the search bar, and let Siteliner crawl your site. In addition, the tool also allows you to detect the presence of broken links and other technical information on your site.

Copyscape

The Copyscape platform not only identifies your potential content thieves, but also allows you to check whether your content is unique. The paid version of Copyscape allows you to be notified in real time if your site has been plagiarised.
