Diff folders recursively with multithreading



Problem description


I need to compare two directory structures with around one billion files each (directory depth up to 20 levels).

I found that the usual diff -r /location/one /location/two is slow.


Is there a multithreaded implementation of diff? Or can it be done by combining the shell and diff? If so, how?

Recommended answer

Your disk will become the bottleneck.


Unless you are working on tmpfs, you will probably only lose speed. That said:

find . -mindepth 1 -maxdepth 1 -type d -print0 |
    xargs -0 -P4 -I DIRNAME diff -EwburqN "DIRNAME/" "/tmp/othertree/DIRNAME/"


should do a pretty decent job of comparing the trees (in this case, the current directory . against /tmp/othertree).


It has one flaw right now: it won't detect top-level directories in /tmp/othertree that don't exist in the current directory (.). I leave that as an exercise for the reader, though you could easily repeat the comparison in reverse.
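One way to do that reverse check is sketched below. It is a minimal sketch, assuming GNU find (for -mindepth/-maxdepth) and a POSIX sh for the inner script; the function name only_in_other is hypothetical. Run from the first tree, it lists the top-level directories of the other tree that have no counterpart in the current directory, printed in the same "Only in" style that diff uses.

```bash
# only_in_other OTHER (hypothetical helper): run from the first tree; prints
# every top-level directory of OTHER that has no counterpart in "."
only_in_other() {
  find "$1" -mindepth 1 -maxdepth 1 -type d -exec sh -c '
    for d do
      name=${d##*/}   # basename of the directory found in OTHER
      [ -d "./$name" ] || printf "Only in %s: %s\n" "${d%/*}" "$name"
    done' sh {} +
}
```

For the example above this would be called as `only_in_other /tmp/othertree` from the first tree. Only the top level needs checking this way, because below it the -N option already makes diff report files that exist on one side only.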


The argument -P4 to xargs specifies that you want at most 4 concurrent processes.
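If the job count should be chosen per run instead of hard-coded, the command can be wrapped in a small bash function. This is a sketch under assumptions: the name pardiff is hypothetical, nproc comes from GNU coreutils and is only used as a default, and the second tree is turned into an absolute path because the function changes into the first tree before calling find.

```bash
# pardiff SRC DST [JOBS] (hypothetical helper): one "diff -r" per top-level
# subdirectory of SRC, at most JOBS at a time (default: number of CPUs).
pardiff() {
  local src=$1 jobs=${3:-$(nproc)} dst
  dst=$(cd "$2" && pwd) || return 1   # absolute, since we cd into SRC below
  ( cd "$src" || exit 1
    find . -mindepth 1 -maxdepth 1 -type d -print0 |
      xargs -0 -P "$jobs" -I DIRNAME diff -EwburqN "DIRNAME/" "$dst/DIRNAME/" )
}
```

Usage: `pardiff . /tmp/othertree 8`. Since the disk is usually the bottleneck, raising JOBS past what the storage can serve is unlikely to help.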


Also have a look at the xjobs utility, which does a better job of separating the output. With GNU xargs (as shown), I think you cannot drop the -q option, because the parallel diffs would intermix their output.
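Another way to drop -q without intermixed output, without relying on xjobs, is to give each parallel diff its own output file and read the files afterwards. A minimal sketch, assuming bash plus GNU find, xargs and diff; the function name pardiff_grouped and the OUTDIR/NAME.diff layout are hypothetical choices for illustration.

```bash
# pardiff_grouped SRC DST OUTDIR [JOBS] (hypothetical helper): full unified
# diffs (no -q), one file per top-level subdirectory, so parallel jobs
# never write to the same stream.
pardiff_grouped() {
  local src=$1 jobs=${4:-$(nproc)} dst out
  dst=$(cd "$2" && pwd) || return 1
  mkdir -p "$3" && out=$(cd "$3" && pwd) || return 1
  ( cd "$src" || exit 1
    find . -mindepth 1 -maxdepth 1 -type d -print0 |
      # inside sh -c: $1 = DST, $2 = OUTDIR, $3 = ./NAME appended by xargs
      xargs -0 -P "$jobs" -n 1 sh -c \
        'diff -EwburN "$3/" "$1/$3/" > "$2/${3#./}.diff"' sh "$dst" "$out" )
}
```

Usage: `pardiff_grouped . /tmp/othertree /tmp/diffs 4`, then `cat /tmp/diffs/*.diff`. Top-level names are unique within one directory, so the output files cannot collide; directories without differences leave an empty file.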


07-22 15:45