Linux offers an unsurpassed breadth of small open source utilities that perform functions ranging from the mundane to the wonderful.
You often hear that disk space is cheap and plentiful. And it’s true that a 4TB mechanical hard disk drive currently retails for around 90 dollars. But like many users, we have moved over to M.2 solid-state drives (SSDs) as our primary storage. An SSD does functionally everything a hard drive does, but helps to make a computer feel more responsive. With an SSD, data is stored on interconnected flash memory chips that retain the data even when there’s no power present. SSDs are more expensive than mechanical hard drives in terms of dollars per gigabyte. And high-capacity SSDs are thin on the ground and expensive, so most users settle for lower-capacity drives. That makes it worthwhile to reclaim space wasted on duplicate files.
There’s lots of software that helps you find duplicate files. We covered the best programs in our Reclaiming Disk Space article. But many of the programs aren’t designed to find duplicate or near-duplicate images. Step forward Image Deduplicator.
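Image Deduplicator is a Python package that simplifies the task of finding exact and near duplicates in an image collection. It offers several detection methods: hashing algorithms (perceptual, difference, wavelet, and average hashing) and a convolutional neural network (CNN).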
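To give a feel for how hash-based detection of near duplicates can work, here is a minimal sketch of difference hashing in plain Python. It is not Image Deduplicator’s own code. It assumes each image has already been reduced to a small grayscale grid of equal size (a real tool would resize the image first), and the names `dhash`, `hamming_distance`, `is_near_duplicate` and the `max_distance` threshold are made up for illustration.

```python
from typing import List

# Assumption: an "image" here is a small grid of grayscale values (0-255).
Pixels = List[List[int]]


def dhash(pixels: Pixels) -> int:
    """Build a hash with one bit per pair of horizontally adjacent pixels."""
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            # The bit is 1 when brightness falls from left to right.
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Pixels, b: Pixels, max_distance: int = 2) -> bool:
    """Treat two grids as near duplicates if their hashes are close."""
    return hamming_distance(dhash(a), dhash(b)) <= max_distance


# Illustrative data: a grid, a uniformly brighter copy, and an unrelated grid.
original = [[10, 20, 30, 40], [40, 30, 20, 10], [10, 40, 10, 40]]
brighter = [[value + 5 for value in row] for row in original]
different = [[40, 30, 20, 10], [10, 20, 30, 40], [40, 10, 40, 10]]

print(is_near_duplicate(original, brighter))
print(is_near_duplicate(original, different))
```

Because the hash records only whether brightness rises or falls between neighbouring pixels, a uniformly brightened copy produces the same hash as the original, while an unrelated image differs in many bits.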
There are a few ways of installing this program. The project recommends installing the software using pip, a general-purpose package installer for both libraries and apps with no environment isolation. However, this may not be ideal depending on the distribution you are running: some distributions mark their system Python as externally managed, in which case pip refuses to install packages outside a virtual environment. Instead, we recommend using pipx, which creates an isolated environment for each application and its associated packages. By default, pipx uses the same package index as pip.
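To illustrate what an isolated environment means in practice, the sketch below uses Python’s standard `venv` module to create a virtual environment and then runs pip from inside it. This is only a rough approximation of the kind of isolation pipx sets up for you, not how pipx itself is implemented. The helper names and the example directory are hypothetical; `imagededup` is the package name used in the project’s repository URL.

```python
import subprocess
import sys
import venv
from pathlib import Path


def create_isolated_env(env_dir: Path, with_pip: bool = True) -> Path:
    """Create a virtual environment and return the path to its interpreter."""
    venv.create(env_dir, with_pip=with_pip)
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def install_into_env(env_python: Path, package: str) -> None:
    """Run pip with the environment's interpreter, leaving system Python alone."""
    subprocess.run(
        [str(env_python), "-m", "pip", "install", package],
        check=True,
    )


# Example usage (needs network access; the directory is a hypothetical choice):
# env_python = create_isolated_env(Path.home() / ".venvs" / "imagededup")
# install_into_env(env_python, "imagededup")
```

Anything installed this way lives under the environment’s own directory, so removing that directory removes the package and its dependencies without touching the rest of the system.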
Alternatively, you can install the software by cloning the project’s GitHub repository and installing it with the project’s setup.py script.
$ git clone https://github.com/idealo/imagededup.git
$ cd imagededup
$ pip install "cython>=0.29"
$ python setup.py install
Your distro may also provide a package. For example, users of Arch-based distros will find one in the Arch User Repository.