Find Duplicates

rdfind – redundant data find

Rdfind is a program that finds duplicate files. It is useful for compressing backup directories or simply for locating duplicate files. It compares files by content, not by file name, and calculates checksums only when necessary.

If N is the number of files to search through, the worst-case effort is O(N log N). Rdfind is quite fast because it sorts files by inode before reading them from disk, and it only reads from disk when needed.
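The idea of ordering files by inode before reading them can be sketched in a few lines of Python. This is only an illustration, not rdfind's C++ implementation: the function name `sort_for_reading` is hypothetical, and the sort key (device number, then inode number) is taken from the algorithm steps listed further down.

```python
import os


def sort_for_reading(paths):
    """Return paths ordered by (device, inode).

    Hypothetical helper: it mirrors the "sort on device and inode"
    step described in the article, nothing more.
    """
    def key(path):
        st = os.lstat(path)  # lstat: do not follow symbolic links
        return (st.st_dev, st.st_ino)

    return sorted(paths, key=key)
```

Since sorting is O(N log N), an ordering step like this fits within the worst-case bound quoted above.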

Given two or more equal files, the one with the highest rank is selected to be the original and the rest are duplicates.
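A minimal sketch of how an original might be picked from one set of equal files, using the priority number and depth that the algorithm steps further down assign to each file. The `Candidate` class and `split_original` function are hypothetical names, and the sketch assumes that a lower priority number (an earlier command-line argument) and a lower depth mean a higher rank.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    path: str
    priority: int  # position of the command-line argument the file was found under
    depth: int     # directory depth below that argument, starting at 0


def split_original(duplicates):
    """Return (original, rest) for one set of identical files.

    Assumption: ascending sort on (priority, depth); the first entry
    is treated as the original.
    """
    ranked = sorted(duplicates, key=lambda c: (c.priority, c.depth))
    return ranked[0], ranked[1:]
```

With this ordering, a file found directly under the first argument wins over an identical copy found deeper in the tree or under a later argument.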

This program is free and open source.

The software uses the following algorithm.

  1. Loop over each argument on the command line. Assign each argument a priority number, in increasing order.
  2. For each argument, list the directory contents recursively and add them to the file list. Assign a directory depth number, starting at 0 for every argument.
  3. If the input argument is a file, add it to the file list.
  4. Loop over the list, and find out the sizes of all files.
  5. If flag "-removeidentinode true": remove items from the list that have already been added, based on the combination of inode and device number. A group of files that are hardlinked to the same file is collapsed to one entry. Also see the comment on hardlinks under "caveats" below.
  6. Sort files on size. Remove files with unique sizes from the list.
  7. Sort on device and inode (speeds up file reading). Read a few bytes from the beginning of each file (first bytes).
  8. Remove files from the list that have the same size but different first bytes.
  9. Sort on device and inode (speeds up file reading). Read a few bytes from the end of each file (last bytes).
  10. Remove files from the list that have the same size but different last bytes.
  11. Sort on device and inode (speeds up file reading). Perform a checksum calculation for each file.
  12. Only keep files on the list with the same size and checksum. These are duplicates.
  13. Sort the list on size, priority number, and depth. The first file in every set of duplicates is considered to be the original.
  14. If flag "-makeresultsfile true", then print results file (default).
  15. If flag "-deleteduplicates true", then delete (unlink) duplicate files. Exit.
  16. If flag "-makesymlinks true", then replace duplicates with a symbolic link to the original. Exit.
  17. If flag "-makehardlinks true", then replace duplicates with a hard link to the original. Exit.
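The filtering part of the steps above (size, first bytes, last bytes, checksum) can be sketched as a chain of grouping passes. This is a simplified illustration under stated assumptions, not rdfind's actual code: all function names are hypothetical, `PROBE_BYTES = 64` is an arbitrary stand-in for "a few bytes", SHA-1 is just an illustrative checksum choice, and the device/inode sorting, hardlink handling, and ranking steps are left out.

```python
import hashlib
import os
from collections import defaultdict

PROBE_BYTES = 64  # assumption: the article only says "a few bytes"


def group_by(paths, key):
    """Group paths by key(path) and drop groups with a single member."""
    groups = defaultdict(list)
    for path in paths:
        groups[key(path)].append(path)
    return [g for g in groups.values() if len(g) > 1]


def first_bytes(path):
    with open(path, "rb") as f:
        return f.read(PROBE_BYTES)


def last_bytes(path):
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        f.seek(max(0, size - PROBE_BYTES))
        return f.read(PROBE_BYTES)


def checksum(path):
    digest = hashlib.sha1()  # illustrative choice of hash
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large files are not loaded whole.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_sets(paths):
    """Return lists of paths whose size, first bytes, last bytes and
    checksum all match. Cheap checks run first, so the full checksum
    is computed only for files that survive the earlier passes."""
    groups = [list(paths)]
    for key in (os.path.getsize, first_bytes, last_bytes, checksum):
        groups = [sub for g in groups for sub in group_by(g, key)]
    return groups
```

Ordering the passes from cheapest to most expensive is what lets the checksum be calculated "only if necessary": a file with a unique size is never opened at all.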

Website: rdfind.pauldreik.se
Support: GitHub Code Repository
Developer: Paul Dreik
License: GNU General Public Licence version 2 or, at your option, a later version


rdfind is written in C++.


Related Software

Find and Delete Duplicate Files with these CLI Tools
Czkawka: Find duplicate files, big files, empty files, similar images, and much more
fdupes: Great CLI tool that's written in C
fclones: Efficient duplicate file finder and remover
rmlint: Fast tool to remove duplicates and other lint
jdupes: Powerful CLI duplicate file finder and 'enhanced' fork of fdupes
smash: Find duplicate files super fast
rdfind: CLI redundant data find tool written in C++
duff: Command-line utility for finding duplicate files
rmdupes: Option to use a reference directory
Periscope: Organize storage and safely remove redundant files
Go Find Duplicates: Scans directories for duplicate files and directories
samanlainen: Delete duplicate files with SHA512 hashing
FSlint: Python based CLI and GUI tool
sdupes: Fast duplicate file detection utility
dupefi: Duplicate file finder designed with Linux philosophy
Dupster: Duplicate file finder
duple: Find and remove duplicate files
ddh: Directory Differential hTool
backdown: Safely and ergonomically remove duplicate files

Read our verdict in the software roundup.

