Deleting identical files in UNIX

Problem description

I'm dealing with a large number (30,000) of files, each about 10MB in size. Some of them (I estimate 2%) are actually duplicates, and I need to keep only one copy of every duplicated pair (or triplet). Could you suggest an efficient way to do that? I'm working on UNIX.

Thank you :-)

Solution

Find possible duplicate files:

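# uniq -w40 compares only the 40-char SHA-1 hash; --all-repeated=separate (GNU) lists every file in each group
find DIR -type f -exec sha1sum {} + | sort | uniq -w40 --all-repeated=separate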

Matching hashes make duplicates very likely but not certain, so use cmp to confirm that the files in each group are really identical before deleting any of them.
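The answer stops at the cmp check, while the question also asks to keep only one copy per group. Below is a minimal sketch that combines both steps, assuming bash, GNU coreutils (`sha1sum`, `uniq -w` and `--all-repeated`), and file names that contain no newlines or backslashes (GNU `sha1sum` escapes such names, which would break the parsing). The function name `dedup_dir` and its dry-run argument are hypothetical, introduced only for illustration. It keeps the first path (in sorted order) of each hash group and removes a later file only if `cmp` confirms it is byte-for-byte identical to the kept one.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original answer): dedup_dir DIR [DRY_RUN]
# Keeps the first file of each SHA-1 group and removes the others only after
# cmp confirms they are byte-for-byte identical to the kept file.
# DRY_RUN defaults to 1, which only prints what would be removed; pass 0 to delete.
# Assumes GNU coreutils and file names without newlines or backslashes.
dedup_dir() {
  local dir=$1 dry_run=${2:-1}
  local line hash file keep="" prev=""
  while IFS= read -r line; do
    # uniq --all-repeated=separate puts a blank line between groups
    if [[ -z $line ]]; then
      keep="" prev=""
      continue
    fi
    hash=${line:0:40}   # a SHA-1 hex digest is 40 characters
    file=${line:42}     # sha1sum prints "<hash>  <name>" (two spaces)
    if [[ $hash != "$prev" || -z $keep ]]; then
      # First file of a new group: this is the copy we keep
      keep=$file prev=$hash
      continue
    fi
    # Same hash as the kept file: confirm with cmp before touching anything
    if cmp -s -- "$keep" "$file"; then
      if [[ $dry_run == 1 ]]; then
        echo "would remove: $file (same as $keep)"
      else
        rm -- "$file"
      fi
    fi
  done < <(find "$dir" -type f -exec sha1sum {} + | LC_ALL=C sort | uniq -w40 --all-repeated=separate)
}
```

For example, `dedup_dir /path/to/files` only lists what would be removed, and `dedup_dir /path/to/files 0` actually deletes. Running the dry run first is a cheap safeguard: with 30,000 files of about 10MB each, the hashing pass alone reads roughly 300GB, and a mistake in the directory argument is costly to undo.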
