We have already talked about how to avoid duplicate content, but this time Jonathan Valenzuela explains how to remove duplicate content from Google's results.
As you probably know, duplicate content and pages with poor-quality content are among the main targets of the Panda algorithm, which is designed to surface higher-quality sites in Google's results pages.
The bad thing about this duplicate content is that we do not receive a manual penalty or any warning in Search Console; our website simply does not rank as well as it should. This is known as an algorithmic penalty.
How can I know if I have duplicate content?
The easiest way to find out whether you have duplicate content is to search Google with the command site:yourdomain, for example site:rodanet.com. All the results Google returns will be indexed URLs from your website.
What is considered duplicate content?
Basically, these are all the pages that list content which already has its own URL or has been listed elsewhere. In WordPress, a common source of duplicate content is tag pages, archive pages, and author pages, since they list posts that have already been listed on the category pages, or show excerpts of entries that have already been indexed as individual posts.
These pages do not have enough unique content and are considered thin content, i.e. low-value content.
You should also avoid indexing pages with no content, or with only two or three lines; these can also be penalized by Panda.
Have you installed a demo version of your theme? You probably have a lot of duplicate content, since it generates posts with exactly the same content as the demo.
Eliminating duplicate content
The easiest way to get rid of this duplicate content is to adjust the meta robots tag on these pages; for pages such as WordPress tags or author archives, it is recommended to assign noindex, follow.
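In WordPress this tag is usually set through an SEO plugin, but the resulting markup in the head of a tag or author page would look something like this (a sketch, not tied to any specific plugin):

```html
<!-- Tells Google not to index this page, but still to follow the links on it -->
<meta name="robots" content="noindex, follow">
```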
We can also adjust our robots.txt to stop Google's crawler from wasting crawl budget on these URLs, and if we want 100% optimization we should also keep them out of our sitemap.
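A minimal robots.txt sketch for a WordPress site might look like this (the /tag/ and /author/ paths are assumptions; adapt them to your permalink structure). Keep in mind that robots.txt only blocks crawling, so on its own it will not remove URLs that are already indexed:

```
# Hypothetical rules for a WordPress site; adjust the paths to your setup.
User-agent: *
Disallow: /tag/
Disallow: /author/
```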
However, if the URLs have already been indexed, the fastest way to remove them is with a plugin that makes our life easier.
It is a Chrome extension that helps us upload all the URLs to Search Console automatically and remove them quickly. To use it we need to:
1. Collect all the URLs in a .txt file. You can do this manually, copying each URL directly from the SERPs and pasting it in, or, what I particularly like most even though it is more advanced, use a footprint in Scrapebox and export the list of URLs as a .txt file.
2. Install the Bulk URL Removal extension. Download it from its GitHub page: https://github.com/noitcudni/google-webmaster-tools-bulk-url-removal
Unzip the zip file and load the folder from Chrome's extensions page. It is important to activate developer mode first, and then load the unpacked extension folder.
3. Now go to your Search Console account, to the Google Index > Remove URLs section.
Instead of entering the URLs one by one, the plugin adds a field where we can upload our .txt file with all the URLs, and it will process them automatically.
Now the job is in Google's hands; after about 10 minutes, those annoying duplicate-content URLs will no longer appear in Google's index.
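Step 1 above, collecting the URLs into a .txt file, can be partly automated. Here is a minimal Python sketch (the file names and the tag/author URL patterns are assumptions; adapt them to your own site) that deduplicates a raw URL list and keeps only the thin-content pages:

```python
# Hypothetical WordPress thin-content patterns; change these to match your site.
THIN_PATTERNS = ("/tag/", "/author/")

def filter_thin_urls(urls):
    """Return a sorted, deduplicated list of URLs matching thin-content patterns."""
    kept = set()
    for url in urls:
        url = url.strip().rstrip("/")  # normalize whitespace and trailing slashes
        if url and any(pattern in url for pattern in THIN_PATTERNS):
            kept.add(url)
    return sorted(kept)

if __name__ == "__main__":
    # "all_urls.txt" is a hypothetical export, e.g. from Scrapebox or the SERPs.
    with open("all_urls.txt") as f:
        raw_urls = f.readlines()
    thin = filter_thin_urls(raw_urls)
    # One URL per line, ready to upload via the bulk-removal extension.
    with open("urls.txt", "w") as out:
        out.write("\n".join(thin))
    print(f"Wrote {len(thin)} URLs to urls.txt")
```

The resulting urls.txt is what you upload in the plugin's file field in Search Console.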
You no longer have an excuse for not removing duplicate content urls from Google!