Are you trying to find duplicate characters? I mean seriously?
If so, then almost all the characters will be duplicates, because words are made of characters, so the same characters will appear more than once.
Finding duplicate content makes sense, but searching for duplicate characters is senseless and a waste of time.
Short of exhaustively comparing all pairs of web pages, an infeasible task at the scale of billions of pages, how can we detect and filter out such near duplicates? We now describe a solution to the problem of detecting near-duplicate web pages. The answer lies in a technique known as shingling.
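To make the shingling idea a bit more concrete, here is a rough sketch in Python. It is only an illustration, not how any particular search engine or tool does it: the function names (shingles, jaccard, resemblance) and the choice of k=4 words per shingle are my own assumptions. Each page is turned into the set of its k-word sequences, and two pages are compared by the Jaccard overlap of those sets.

```python
def shingles(text, k=4):
    """Return the set of k-word shingles (contiguous word sequences) in text."""
    words = text.lower().split()
    if len(words) < k:
        # Text shorter than k words: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard coefficient |a & b| / |a | b| of two sets."""
    if not a and not b:
        return 1.0  # two empty documents are treated as identical
    return len(a & b) / len(a | b)


def resemblance(doc1, doc2, k=4):
    """Similarity of two documents based on their k-word shingle sets."""
    return jaccard(shingles(doc1, k), shingles(doc2, k))


if __name__ == "__main__":
    page_a = "the quick brown fox jumps over the lazy dog"
    page_b = "the quick brown fox jumps over the lazy cat"
    print(resemblance(page_a, page_b))
```

A score close to 1 means the two pages are near duplicates, and a score close to 0 means they share almost no word sequences. Comparing full shingle sets like this is fine for a handful of pages; at the scale of billions of pages the sets would have to be compressed into short hashed sketches first, which this sketch does not attempt.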
Basically, I am using Siteliner for this, but even there I get confused about how to find duplicates in a better way. If anyone knows a good way to check, please reply on my thread.
Personally, I used copyscape.com to find duplicate content on the website. It is good. There you can easily find the specific links that have duplicate content. It is a free tool.