Duplicate content is one of the factors that can have the most negative impact on Google rankings. Find out how to avoid this by following a few simple steps.
In one of our recent posts we talked about the factors that could potentially put us in the crosshairs of Panda 4.0. Unsurprisingly, duplicate content was one of them, so we put together this little guide.
Some still think that duplicate content is limited to a matter of plagiarism, copyright infringement, and so on. But no, it's much more complex than that!
As an SEO I have come across websites made with WordPress, to give an emblematic example, with a well-worked domain, a brand-new paid theme, a blog embedded in the site, quality content, social profiles, and a loading speed faster than an F-18. Not to mention online stores with thousands of hours of work invested that, despite all this, hardly anyone knows about. "Why can't we rank in Google?", their owners often ask me, with some anguish. And it turns out that, after a quick analysis, it is obvious that the website in question is oozing duplicate content. "Is all that work thrown away?" No! There is a solution. I love giving good news 🙂
Throughout this post we will see in which situations duplicate content can appear and how we can avoid it on each occasion. Here we go!
What is duplicate content?
It is simply text that appears repeated on more than one page, either within your own domain or across several different domains. In other words, if someone plagiarizes content from your website, you have a duplicate content problem; but if you have texts that can be reached from different URLs on your own website, you have a duplicate content problem too! The fundamental issue here is that it is very easy to duplicate content without realizing it. In the previous example I referred to websites made with WordPress, a CMS that captivates many entrepreneurs because it seems to hand us everything ready-made. But be careful! A WordPress site in inexperienced hands can end up sabotaging a project, precisely because it duplicates content very easily. Why does Google go after it? We could go on at length, but basically there are two reasons:
- If repeated content appears for the same search, the user experience worsens.
- When faced with duplicate content, Google does not know which one to index.
Actually, (2) is a consequence of (1): to provide a good user experience, it is preferable to keep duplicate content out of search results, so Google applies filters to select one of the versions of the repeated text. And since its page-crawling system is not perfect, indexing errors occur.

How can duplicate content affect me?

Well, negatively. That's obvious 🙂 Let's take a closer look:
- Loss of control over your indexed pages: Google decides which version of the duplicate content to index, and is very likely to choose the option you are least interested in. In addition, if the robot allocates too many resources to “read” your pages with repeated content, it may ignore other pages that you do want it to crawl.
- Worse search engine ranking: if Google has to choose, it may display the less search-engine-friendly page, and you will therefore have to settle for a lower position in the SERPs.
- Dispersion of authority: if you have two versions of your website or of some of its pages, you are splitting the Link Juice of the natural links you earn. If you are familiar with concepts like PageRank, Domain Authority and Page Authority, and you know how they work, I am not revealing any secrets. This translates into worse SEO positioning.
- Penalization for plagiarism: a surreal situation often occurs: someone plagiarizes your content, and Google "decides" that their version is the original and yours is a mere copy. You could be hit with a penalty, and that content could be relegated to the tail end of the search results. Unbelievable, but true.
How does duplicate content occur and what can we do to avoid it?
Good question! The moment of truth has arrived. Let's take a look at the main reasons why content gets duplicated. Take note, or add this post to your browser bookmarks, because this is of interest to you 😉

Domain in two versions, with and without www

Your website can appear with www in the domain, or without www. If it can be accessed both ways, you have a duplicate site! And you know what that implies... To solve the problem you have two options:
1. Sign up for free with Google Webmaster Tools (if you have not already done so), and indicate your preferred version of the domain.
2. Perform a redirect from the server, although this option is "more complex".

Category or tag pages

Neglecting this issue is a widespread mistake. CMSs often generate specific pages for each category or tag, which duplicate the content of the posts they group together.

Parameters in the URL

Parameters are variables appended to the end of the URL that do not change the content of the page, so each parameterized URL duplicates the same content. They may appear for several reasons:
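As a hedged sketch of the server-side option, assuming an Apache server with mod_rewrite enabled and the example domain ejemplo.com used elsewhere in this post, a 301 redirect from the non-www to the www version in `.htaccess` might look like this:

```apache
# Send every non-www request to the www version with a permanent (301) redirect
RewriteEngine On
RewriteCond %{HTTP_HOST} ^ejemplo\.com$ [NC]
RewriteRule ^(.*)$ http://www.ejemplo.com/$1 [R=301,L]
```

Swap the two hostnames if you prefer the non-www version; the important thing is that only one of them responds directly.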
This is often the case with online stores, another potentially inexhaustible source of duplicate content, due to the large number of filters applied to locate products. Often, the CMS adds dynamic parameters. Imagine you have an online garden-furniture store, and you sell a table in various colors. With an almost identical description, you might end up with several URLs like these:

http://ejemplo.com/mesas?color=blanco
http://ejemplo.com/mesas?color=verde
http://ejemplo.com/mesas?color=azul

This is all duplicate content.
When users' session IDs are appended to the URL, duplicate pages with identical content may appear, where the only thing that changes is the session identifier at the end of the URL (sessionid=1357, for example). Problems with duplicate content caused by parameters can be solved in two ways:
- With the rel=canonical tag. Although this tag confuses many people, adding it is very easy. Simply embed this line of code:
<link rel="canonical" href="http://ejemplo.com/urlcanonica.html"/> Remember that the canonical tag must always be inserted in the duplicate pages, i.e. the versions you do not want treated as canonical, pointing to the canonical URL. And don't forget to place it between the <head> and </head> tags!
- Enter Google Webmaster Tools, and in the "URL Parameters" section, indicate which parameters the search engine should ignore.
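Applied to the garden-furniture example above, each color-filtered URL would carry a canonical tag pointing to the main product page. A minimal sketch, assuming http://ejemplo.com/mesas is the version you want indexed:

```html
<!-- Placed in the <head> of /mesas?color=blanco, /mesas?color=verde and /mesas?color=azul -->
<link rel="canonical" href="http://ejemplo.com/mesas"/>
```

With this in place, Google consolidates the three filtered URLs into the single canonical page instead of treating them as three copies.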
Mobile web versions

Mobile web pages are often an exact replica of the desktop version. To avoid duplicate content we can do several things:
- Create a specific website, with different contents adapted to mobile devices.
- Use a responsive design that adapts how your website is displayed to the characteristics of the device from which users access it.
- The above two options should be put in place before the duplicate-content problem arises. If that is not possible, it can be solved with the rel=canonical tag, to indicate that the desktop version is the one to index.
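As a sketch of that last option, assuming a separate mobile site on a hypothetical m.ejemplo.com subdomain: the mobile page carries a canonical tag pointing to the desktop version, while the desktop page can declare the mobile URL as an alternate so crawlers see the pair as one document:

```html
<!-- In the <head> of the desktop page http://ejemplo.com/pagina -->
<link rel="alternate" media="only screen and (max-width: 640px)"
      href="http://m.ejemplo.com/pagina"/>

<!-- In the <head> of the mobile page http://m.ejemplo.com/pagina -->
<link rel="canonical" href="http://ejemplo.com/pagina"/>
```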
Pagination

The same applies to pagination. Often, not even the information in the meta tags changes; only the parameter marking the page number at the end of the URL does. To solve this problem you have the rel=next and rel=prev tags, which tell search engines that all the pages in the series, as well as the content of their title, description and other tags, belong to the main page of the series. If you don't know how to add these tags, the Google Webmaster Blog can help guide you.

HTTPS protocol

Secure browsing of a web page over SSL can be another source of duplicate content. As you know, the URL is the same except for the -s added to http, and its content is, obviously, identical. To solve the problem, the rel=canonical tag can again be of great help.

Content migration

When the contents of a web page are migrated to get friendlier URLs, or moved from one site to another, the old version can coexist with the new one for some time. The solution? 301 redirects. They go in the .htaccess file; just add the following line for each page or directory you want to redirect:

Redirect 301 /old-page http://ejemplo.com/pagina-nueva

It's very easy! But don't forget to save a backup copy of .htaccess first. Any change could wreak real havoc on your site.
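To illustrate the pagination tags described above, here is a minimal sketch, assuming a hypothetical blog series paginated with a ?pagina= parameter; page 2 would declare its neighbors like this:

```html
<!-- In the <head> of http://ejemplo.com/blog?pagina=2 -->
<link rel="prev" href="http://ejemplo.com/blog?pagina=1"/>
<link rel="next" href="http://ejemplo.com/blog?pagina=3"/>
```

The first page of the series omits rel=prev and the last omits rel=next, so the sequence has a clear start and end.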
Duplicity of content due to plagiarism
It is true that since Google began its particular battle against duplicate content, plagiarism has been drastically reduced. If you plagiarize, sooner or later you will suffer the consequences and notice it, losing positions in the search results. However, there are still a few clueless people out there. There are several types of plagiarism:
Outright plagiarism
As we said at the beginning, when Google encounters the same content in two different places, it must decide which is the original source. In the process of discerning between the original and the copy, the Google bot often gets it wrong. Something as surreal as the following can then happen: the website where your content was plagiarized is considered the original source, and yours the place where someone else's content was copied. Quite a trick, isn't it? If this happens to you, you have two options:
- Politely ask the person who plagiarized you to remove the copy. If they refuse, you can negotiate and suggest that they at least insert a link to the page where you originally published the content, so that Google can distinguish the copy from the original. But this solution is not always effective.
- Send a formal request to Google, under copyright law, to remove the plagiarized page from the search results.
The best thing to do is to try to resolve the conflict peacefully first, and if not, report it!
Scraping

Scraping consists of copying a website as-is, using software tools created specifically for this purpose. All your content could end up published on another domain. Google Panda has set out to put an end to scraping, but it still happens, and it is important to detect it quickly. If someone is scraping your content, informing Google could, again, be the solution.
There are also very careless ways of publishing repeated content that many webmasters and bloggers fall into. You have a blog, you decide to entrust the publications to one or more editors, and you end up with plagiarized content. Or you are a blogger who is becoming influential in a certain niche, and someone proposes that you publish a sponsored post, an advertorial or a press release. You gladly agree and earn a few euros or dollars, but in time you discover that several bloggers published the exact same sponsored post. In none of these cases have you committed plagiarism yourself, but there is copied or repeated content on your site. Duplicate-content detection tools like Copyscape can get you out of trouble.
To detect duplicate content on other sites, use Google Webmaster Tools: it will help you spot problems and provide you with various means to solve them. Customize the meta tags (title and description) for each page of your domain. If you quote another website, always include a link. And be original! Beyond avoiding search engine penalties, creating genuine content will help you differentiate yourself from others, build your brand image and position yourself as an expert in your industry. If you found this post useful, please share it on social networks 🙂 And if you have anything to add, use the comments!