You may find yourself overwhelmed by files and needing to keep your filesystem organized. If removing duplicates is the best option, you may consider these two tools:
According to http://code.google.com/p/hardlinkpy/ , "hardlink.py is a tool to hardlink together identical files in order to save space." The filesystem layout stays the same, but identical files are detected and made to share a single copy of the data on disk.
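The idea behind hardlinking duplicates can be illustrated with a minimal sketch. This is not the actual hardlink.py implementation; the function names here are invented for the example, and it compares files by full md5 checksum only.

```python
import hashlib
import os


def md5_of(path):
    """Return the md5 hex digest of a file's full contents."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def hardlink_identical(paths):
    """Replace later duplicates with hard links to the first copy seen."""
    first_seen = {}  # md5 digest -> path of the surviving copy
    for path in paths:
        digest = md5_of(path)
        if digest in first_seen:
            os.remove(path)                    # drop the duplicate's directory entry
            os.link(first_seen[digest], path)  # relink the name to the surviving copy
        else:
            first_seen[digest] = path
```

After running this on two identical files, both paths still exist, but they point at the same inode, so the data is stored only once.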
Dupinator finds duplicate files and reports (or deletes) them so you can clean up the organization of your files.
Dupinator II: http://www.shearersoftware.com/personal/weblog/2005/01/14/dupinator-ii
Dupinator I: the latest version can be found at http://svn.red-bean.com/bbum/trunk/hacques/dupinator.py. In the author's words, it is a one-off that solved a problem, not an attempt to write the world's best Python script (http://www.pycs.net/bbum/2004/12/29/).
It works as follows:
- it is launched from the command line with a set of directories to scan
- it traverses all the directories and groups the files it finds by size
- within each set of same-sized files, it checksums (md5) the first 1024 bytes of each file
- for files whose first-1024-byte checksums match, it checksums the whole file and collects the true duplicates
- it deletes all but the first-encountered copy of each duplicated file
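The size → partial-checksum → full-checksum funnel above can be sketched as follows. This is a minimal illustration of the steps, not the actual dupinator.py source; the function names are invented here, and it only reports duplicate groups rather than deleting them.

```python
import hashlib
import os
from collections import defaultdict


def _md5(path, limit=None):
    """md5 of the first `limit` bytes, or of the whole file when limit is None."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        h.update(f.read(limit) if limit else f.read())
    return h.hexdigest()


def find_duplicates(roots):
    """Return groups of duplicate files found under the given directories."""
    # Step 1: traverse all directories and group files by size.
    by_size = defaultdict(list)
    for root in roots:
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                by_size[os.path.getsize(path)].append(path)

    duplicates = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        # Step 2: within a size group, checksum only the first 1024 bytes.
        by_head = defaultdict(list)
        for path in paths:
            by_head[_md5(path, 1024)].append(path)
        for candidates in by_head.values():
            if len(candidates) < 2:
                continue
            # Step 3: confirm with a full-file checksum.
            by_full = defaultdict(list)
            for path in candidates:
                by_full[_md5(path)].append(path)
            duplicates.extend(g for g in by_full.values() if len(g) > 1)
    return duplicates
```

The real script then deletes every file in each group except the first one encountered; reporting the groups instead, as above, lets you review them before anything is removed.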