Posted by: kurtsh | August 29, 2010

TOOL: “Similarity” – Identifying duplicate music files by comparing waveforms (3rd party)

UPDATE 9/17/12:
Well, they changed their domain name for some reason, but here, 2 years later, the program is still available and is now at v1.7.1 (back in 2010 it was 1.3.7).  There’s now a paid version of the app available that does very precise comparisons and adds some other functionality.  I bought the paid version.  It’s useful software.

“Similarity” is an interesting tool for Windows that I heard about on Paul Thurrott’s podcast.

It compares all the music files in your library to each other to find duplicates.  What makes it special, however, is that it doesn’t compare tags, titles, or other metadata.  Instead, it looks at each file’s waveform and compares it to the other waveforms it finds.

This effectively finds songs that may be labeled differently but are otherwise the same music as another file.  More importantly, it can see past differences in file format (WMA, MP3, OGG, etc.) and encoding bitrate (128kbps vs 192kbps, etc.), so the same song saved two different ways still shows up as a duplicate.
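I don’t know how Similarity does its comparison internally, so treat the following as a rough sketch of the general idea rather than the tool’s actual method.  It fakes three “songs” as tones with changing loudness, boils each waveform down to a coarse loudness envelope, and then correlates the envelopes.  Every name here (synth, envelope, correlation, the 8000 Hz rate, the window size) is made up for illustration.

```python
import math
import random

RATE = 8000  # samples per second (an assumption for this sketch)

def synth(loudness_pattern, gain=1.0, noise=0.0, seed=0):
    """Build a fake 'song': a 440 Hz tone whose loudness changes every 0.1 s."""
    rng = random.Random(seed)
    samples = []
    for level in loudness_pattern:
        for i in range(RATE // 10):
            s = level * math.sin(2 * math.pi * 440 * i / RATE)
            samples.append(gain * s + rng.uniform(-noise, noise))
    return samples

def envelope(samples, window=200):
    """Reduce a waveform to RMS loudness per window, ignoring fine detail."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + window]) / window)
        for i in range(0, len(samples) - window + 1, window)
    ]

def correlation(a, b):
    """Pearson correlation of two envelopes (close to 1.0 means same shape)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    a, b = a[:n], b[:n]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

pattern_a = [1.0, 0.2, 0.8, 0.1, 0.6, 0.9, 0.3, 0.7]
pattern_b = [0.2, 0.9, 0.1, 1.0, 0.9, 0.2, 0.8, 0.1]

original = envelope(synth(pattern_a))
# Stand-in for a lower-quality re-encode: quieter, with a little added noise.
reencoded = envelope(synth(pattern_a, gain=0.5, noise=0.01, seed=1))
other = envelope(synth(pattern_b))

same_score = correlation(original, reencoded)
diff_score = correlation(original, other)
print("same song, different encode:", round(same_score, 3))
print("different song:", round(diff_score, 3))
```

The point is that nothing in this comparison looks at a file name or a tag.  The quieter, noisier copy still scores close to 1.0 because the shape of the sound is the same, while the unrelated pattern scores far lower.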

One thing I noticed is that it appears to compare only the first part of each music file.  For example, I have some tracks that are identical for the first 20 seconds but differ later in the file.  It miscategorized these and identified them all as duplicates.
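Here’s a small sketch of why a prefix-only comparison would behave that way.  This is my guess at the cause, not something confirmed by the tool’s documentation.  Each track is represented as a made-up list of per-second loudness values, and mean_abs_diff and its limit parameter are hypothetical names.

```python
def mean_abs_diff(a, b, limit=None):
    """Average per-second loudness difference between two tracks.

    `limit` caps how many seconds from the start are compared.
    """
    n = min(len(a), len(b))
    if limit is not None:
        n = min(n, limit)
    if n == 0:
        raise ValueError("nothing to compare")
    return sum(abs(x - y) for x, y in zip(a[:n], b[:n])) / n

# Two tracks that share a 20-second intro and then diverge for 40 seconds.
intro = [0.5, 0.6, 0.7, 0.6] * 5
track_a = intro + [0.9, 0.8] * 20
track_b = intro + [0.1, 0.2] * 20

THRESHOLD = 0.05  # illustrative cutoff: below this, call it a duplicate

prefix_diff = mean_abs_diff(track_a, track_b, limit=20)
full_diff = mean_abs_diff(track_a, track_b)

print("first 20 s only -> duplicate?", prefix_diff < THRESHOLD)
print("whole track     -> duplicate?", full_diff < THRESHOLD)
```

Looking only at the shared intro, the two tracks are indistinguishable and get flagged as duplicates.  Comparing the whole track catches the difference, at the cost of reading more of every file.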

Also, avoid the temptation to simply “install & run” the tool without any configuration.  There is a “network database” that the tool depends on for comparing file signatures.  It requires that you create a network login for the tool to use the database over the Internet.  Without this feature, the tool isn’t nearly as accurate as one would like.

It’s free – so why not give it a try?  It discovered 219 duplicates 20% into my very first run, representing about half a gigabyte of duplicate music.

