Posted: 2007-11-08 11:39
Just cp them all into one directory. It will prompt you before overwriting, won't it. Heh. A dumb method.
Code: Select all
du -ab *|sort
eexpress wrote: Just cp them all into one directory. It will prompt you before overwriting, won't it. Heh. A dumb method.
The main problem is that the contents are the same but the file names are different.
BigSnake.NET wrote: First gather the file sizes, then run md5sum once on each group of files that share the same size, and sort and group the results.
Code: Select all
du -ab *|sort
Then diff each pair of files with the same md5.
If the md5 sums already match, is there really still any need to diff??
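For what it is worth, here is a rough Python sketch of that size-then-md5 grouping; the current directory, the flat non-recursive listing, and the variable names are just assumptions for illustration.
Code: Select all
#!/usr/bin/env python
# Sketch only: group by size first, md5 only the groups that share a size.
import os, hashlib

bysize = {}
for name in os.listdir("."):
    if os.path.isfile(name):
        bysize.setdefault(os.path.getsize(name), []).append(name)

for size, names in bysize.items():
    if len(names) < 2:
        continue                      # a unique size cannot have a duplicate
    byhash = {}
    for name in names:
        f = open(name, 'rb')
        byhash.setdefault(hashlib.md5(f.read()).hexdigest(), []).append(name)
        f.close()
    for group in byhash.values():
        if len(group) > 1:
            print "possible duplicates:", group
As for the question above: running diff after the md5 sums already match only guards against the very unlikely case of a hash collision.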
eexpress wrote: How about a simple but dirty method.
One for loop to append a .jpg suffix to every file in the directory. Then open gthumb / gqview and use the feature that searches only for identically named files (it does not compare contents), and you can find the files.
That would be painfully slow.
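If anyone actually wants to try that, the rename step eexpress describes might look roughly like this; the directory is a placeholder, and since it renames files in place you should test it on a copy.
Code: Select all
import os

d = "/tmp/pics"          # placeholder directory
for name in os.listdir(d):
    path = os.path.join(d, name)
    # append .jpg so gthumb / gqview will list every file
    if os.path.isfile(path) and not name.endswith(".jpg"):
        os.rename(path, path + ".jpg")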
Your requirement is an unusual one, so there is no real advantage to writing it in shell. Writing a program means building lists, comparing file names, and then comparing contents. Complicated. No end to it.
Dupseek finds and interactively removes duplicate files. It aims at maximum efficiency by keeping file reads to a minimum and is much better than other similar programs when dealing with groups of large files of the same size.
FDUPES is a program for identifying or deleting duplicate files residing within specified directories.
DupeFinder is a simple application for locating, moving, renaming, and deleting duplicate files in a directory structure. It's perfect both for users who haven't kept their hard drives very well organized and need to do some cleaning to free space, and for users who like to keep lots of backup copies of important data "just in case" something bad should happen.
weedit is a file duplicate scanner with database support. It uses CRC32, MD5, and file size to scan for duplicates. Files that are deleted are automatically removed from the database when a duplicate is found. It will only rescan files if the creation time or last write time change. It will only delete duplicated files if the parameter for deleting is used. The default setting is to report only.
whatpix is a Perl console application which finds (and optionally moves or deletes) duplicate files.
dupliFinder is a graphical tool that searches directories on your computer for duplicate files by checking and comparing the MD5 sum of each file. This means that the contents of the file are examined, not the filename. You then have the option of reviewing the duplicate files and then deleting them. It's great for finding duplicates in your MP3, image, or movie collections.
Code: Select all
#!/usr/bin/env python
#coding=utf-8
import binascii,os

filesizes={}   # size -> list of paths with that size
samefiles=[]   # groups of paths whose size and crc32 both match

def filesize(path):
    """Walk path recursively and group every file by its size."""
    if os.path.isdir(path):
        files=os.listdir(path)
        for file in files:
            filesize(path+"/"+file)
    else:
        size=os.path.getsize(path)
        if not filesizes.has_key(size):
            filesizes[size]=[]
        filesizes[size].append(path)

def filecrc(files):
    """Checksum each candidate file and record groups with equal crc32."""
    filecrcs={}
    for file in files:
        f=open(file,'rb')            # binary mode, so the checksum sees raw bytes
        crc = binascii.crc32(f.read())
        f.close()
        if not filecrcs.has_key(crc):
            filecrcs[crc]=[]
        filecrcs[crc].append(file)
    for filecrclist in filecrcs.values():
        if len(filecrclist)>1:
            samefiles.append(filecrclist)

if __name__ == "__main__":
    filesize("/home/oneleaf/test/")
    # Only files that share a size need to be checksummed at all.
    for sizesamefilelist in filesizes.values():
        if len(sizesamefilelist)>1:
            filecrc(sizesamefilelist)
    for samefile in samefiles:
        print "******* same files group **********"
        for file in samefile:
            print file
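A small caveat on the script above: crc32 is only a 32-bit checksum, so two different files can collide far more easily than with md5. If you want certainty, a final byte-for-byte comparison is cheap; here is a possible sketch using the standard filecmp module, with placeholder paths standing in for any pair the script reports.
Code: Select all
import filecmp

# Placeholder paths; substitute any two files from one reported group.
a = "/home/oneleaf/test/file1"
b = "/home/oneleaf/test/file2"
if filecmp.cmp(a, b, shallow=False):
    print a, "and", b, "are identical byte for byte"
else:
    print a, "and", b, "only collided on size and crc32"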