When it comes to duplicate content, you'll often hear the term 'rel=canonical'. It is a common SEO technique for avoiding penalties, because search engines can hurt your rankings if the same content appears on multiple pages. Duplication happens when you serve archive pages, data sorted according to the user's preference, and so on. Rel=canonical addresses the problem by consolidating all the different versions of the same page: it tells search engines which URL to index and display in search results. Useful as it is, rel=canonical can be a bit tricky to implement properly.
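To make the idea concrete, here is a minimal Python sketch of how a site might build the canonical tag for a page whose URL carries sorting options. The parameter names in PRESENTATION_PARAMS and the /archive path are assumptions made up for illustration; your site's URLs will differ.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical query parameters that only change how content is presented.
PRESENTATION_PARAMS = {"sort", "order", "view"}


def canonical_url(url: str) -> str:
    """Drop presentation-only parameters and any fragment from the URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in PRESENTATION_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_tag(url: str) -> str:
    """Build the <link> element that belongs in the page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


print(canonical_tag("http://www.yoursite.com/archive?sort=date&order=desc"))
```

Every sorted variant of the archive page then declares the same unsorted URL as its canonical version.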
Here are some common mistakes regarding rel=canonical, and tips to avoid them.
1. Using rel=canonical in pagination
Most blogs use pagination to break up their content chronologically. For example, you might have one page at www.yoursite.com/page/2, another at www.yoursite.com/page/3, and subsequent pages numbered accordingly. Some people canonicalize these additional pages to point to the first page. This isn't a good idea, because no duplication is actually taking place: each page shows different content. Canonicalizing here can cause every page beyond the first to not be indexed at all.
Solution: Either canonicalize from the component pages to a page that holds the full content, or use the rel="prev" and rel="next" pagination markup.
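As a rough illustration of the second option, the sketch below generates the rel="prev" and rel="next" tags for a given page number. The function name pagination_links and the choice of base_url + "/" for the first page are assumptions for this example; only the /page/N pattern comes from the URLs above.

```python
from html import escape


def pagination_links(base_url: str, page: int, last_page: int) -> list[str]:
    """Return the prev/next <link> tags for one page of a paginated series."""
    if not 1 <= page <= last_page:
        raise ValueError("page must be between 1 and last_page")

    def page_url(number: int) -> str:
        # Assumption: page 1 lives at the site root, later pages at /page/N.
        return f"{base_url}/" if number == 1 else f"{base_url}/page/{number}"

    links = []
    if page > 1:
        href = escape(page_url(page - 1), quote=True)
        links.append(f'<link rel="prev" href="{href}">')
    if page < last_page:
        href = escape(page_url(page + 1), quote=True)
        links.append(f'<link rel="next" href="{href}">')
    return links


for tag in pagination_links("http://www.yoursite.com", page=2, last_page=3):
    print(tag)
```

Note that nothing here points the later pages back at page 1 as their canonical URL, which is exactly the mistake described above.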
2. Absolute vs Relative URLs
A lot of people make the mistake of using relative URLs when they should be using absolute URLs. Relative URLs are relative to the current location. For example, resources/resource1.php means 'find the resources directory in the current directory, and then go to the resource1.php file'. In contrast, absolute URLs contain the full address, including the scheme, for example http://www.yoursite.com/resources/resource1.php. If every page lives in the same folder, a relative URL isn't a problem. But the same relative URL resolves to a different address as soon as the page sits in a different directory, and that will give you canonicalization errors.
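Here is a small Python sketch, using only the standard library, that shows how the same relative href resolves differently depending on where the page lives. The helper names is_absolute and absolute_canonical, and the /blog/post.php page, are made up for this example.

```python
from urllib.parse import urljoin, urlsplit


def is_absolute(href: str) -> bool:
    """An absolute URL has both a scheme (http, https) and a host."""
    parts = urlsplit(href)
    return bool(parts.scheme and parts.netloc)


def absolute_canonical(page_url: str, href: str) -> str:
    """Resolve a possibly relative canonical href against the page's own URL."""
    return href if is_absolute(href) else urljoin(page_url, href)


relative = "resources/resource1.php"

# The same relative href points somewhere else once the directory changes.
print(absolute_canonical("http://www.yoursite.com/index.php", relative))
print(absolute_canonical("http://www.yoursite.com/blog/post.php", relative))

# A URL without a scheme is not absolute either.
print(is_absolute("www.yoursite.com/resources/resource1.php"))
```

The last line is worth a second look: leaving off http:// makes the URL relative too, so it gets resolved against the current directory like any other relative path.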
3. Rel=canonical should be used in the <head>
To avoid parsing issues, rel=canonical should appear as early as possible in the <head> section of a webpage. If you place it in the <body>, it will simply be disregarded.
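If you want to check a page for this, a short script along the following lines can help. It is only a sketch built on Python's standard html.parser module; the class name CanonicalLocator and the sample markup are invented for this example, and it assumes reasonably well-formed HTML with explicit <head> and <body> tags.

```python
from html.parser import HTMLParser


class CanonicalLocator(HTMLParser):
    """Record which section (head or body) each rel=canonical link sits in."""

    def __init__(self):
        super().__init__()
        self.section = None
        self.found = []  # list of (section, href) pairs

    def handle_starttag(self, tag, attrs):
        if tag in ("head", "body"):
            self.section = tag
        elif tag == "link":
            attributes = dict(attrs)
            rels = (attributes.get("rel") or "").lower().split()
            if "canonical" in rels:
                self.found.append((self.section, attributes.get("href")))

    def handle_endtag(self, tag):
        if tag == self.section:
            self.section = None


def canonicals_outside_head(html: str) -> list:
    """Return the hrefs of canonical links that are not inside <head>."""
    locator = CanonicalLocator()
    locator.feed(html)
    return [href for section, href in locator.found if section != "head"]


sample = (
    '<html><head><link rel="canonical" href="http://www.yoursite.com/a"></head>'
    '<body><link rel="canonical" href="http://www.yoursite.com/b"></body></html>'
)
print(canonicals_outside_head(sample))
```

Any URL the function returns belongs to a canonical tag that needs moving up into the <head>.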
4. Category pages
Most blogs have category pages, such as SEO and Social Media (for this blog), or Tech Reviews, News and Gadgets (for a technology blog). Often, the same posts are featured regularly on such pages, so there is some content duplication going on. In that case, add a rel=canonical from the category page to the featured post's direct URL (i.e. where you can see the full article).
5. Multiple rel=canonical declarations
Sometimes, people are just careless with their canonicalization. It often happens that someone uses a template from another site and fails to notice that it still contains a rel=canonical pointing to the template site. Other times, people install multiple SEO plugins that conflict with each other and add multiple rel=canonical tags. Either case will cause problems. The best solution is to check the source code of your pages and make sure each one has exactly one rel=canonical, pointing to your own site.
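Checking the source by hand gets tedious, so here is a minimal sketch of how the check could be scripted in Python. The names CanonicalCollector and audit_canonicals, and the www.templatesite.com address, are hypothetical and only serve the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class CanonicalCollector(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if "canonical" in rels:
            self.hrefs.append(attributes.get("href") or "")


def audit_canonicals(html: str, expected_host: str) -> list:
    """Return a list of human-readable problems; empty means nothing found."""
    collector = CanonicalCollector()
    collector.feed(html)
    problems = []
    if len(collector.hrefs) > 1:
        problems.append(f"{len(collector.hrefs)} rel=canonical declarations found")
    for href in collector.hrefs:
        if urlsplit(href).netloc != expected_host:
            problems.append(f"canonical does not point to {expected_host}: {href}")
    return problems


page = (
    '<head><link rel="canonical" href="http://www.templatesite.com/">'
    '<link rel="canonical" href="http://www.yoursite.com/post"></head>'
)
for problem in audit_canonicals(page, "www.yoursite.com"):
    print(problem)
```

Because the host check needs a full address, a relative canonical URL is flagged as well, which fits with tip 2.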
So, are you having problems with canonicalization? I hope this small guide helped you out. If it didn't, please feel free to ask questions in our comments section. Peace :)