[thelist] Image duplicate filtering

Jack Timmons jorachim at gmail.com
Tue Feb 24 08:14:57 CST 2009


On Tue, Feb 24, 2009 at 5:05 AM, L. Mohan Arun (marun2 at gmail.com) <
marun2 at gmail.com> wrote:

> Hi All,
>
> I have 1000's of JPEG files dumped in a folder. Some JPEGs are the
> same, their filesize is the same, the content is the same, but the
> names differ.
>
> Something like Marketa-1, Marketa-2, etc. they have different name but
> the content is the same, file size is the same. Is there a tool that
> will let me remove all JPEG duplicates in a folder retaining only the
> unique ones?
>

I'm going to hazard a guess and say this is impossible, or at least a bad
idea. PHP won't be able to tell whether one image is a duplicate of another
(unless you do a pixel-by-pixel comparison; good luck with that). Also, you
can't compare image sizes, because I'd wager there's a 99% chance that if
they are photos, they're all the same dimensions, and more than likely the
same size. You know they're duplicates because you can take the whole thing
in at once.

Now that I think about it, you might be able to take an MD5 hash of each of
your photos: if one MD5 matches another, chances are it's a duplicate. That's
the best I can come up with off of a single cup of coffee.
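
For what it's worth, here is a rough sketch of the MD5 idea. It is written in
Python rather than PHP purely as an assumption for illustration, and the
function names (md5_of_file, find_duplicates) are made up here, not taken from
any existing tool. It only reports groups of byte-identical files; it does not
delete anything, and it will not catch images that look the same but were
saved or re-encoded differently.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Extensions treated as JPEG files (an assumption; adjust as needed).
JPEG_SUFFIXES = {".jpg", ".jpeg"}


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(folder):
    """Group the JPEG files directly inside `folder` by MD5 digest.

    Returns a list of groups (lists of paths); only groups containing
    more than one file are returned, i.e. the likely duplicates.
    """
    groups = defaultdict(list)
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in JPEG_SUFFIXES:
            groups[md5_of_file(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling find_duplicates on the folder gives one list per set of matching
files (Marketa-1, Marketa-2, and so on, if their bytes really are identical).
Keeping the first entry of each list and reviewing the rest by hand before
removing them is probably the safer way to use it.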

-- 
-Jack Timmons
http://www.trotlc.com
Twitter: @jorachim


