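I have two files, output1.txt and output2.txt, each with tens of thousands of lines. I want to go through both files and return the line number (and content) of every line that differs between them. The files are mostly identical, which is why I can't spot the differences by eye (filecmp.cmp returns False).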

Best answer

You can do it like this:

import difflib, sys

tl=100000    # large number of lines

# create two test files (Unix directories...)

with open('/tmp/f1.txt','w') as f:
    for x in range(tl):
        f.write('line {}\n'.format(x))

with open('/tmp/f2.txt','w') as f:
    for x in range(tl+10):   # add 10 lines
        if x in (500,505,1000,tl-2):
            continue         # skip these lines
        f.write('line {}\n'.format(x))

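with open('/tmp/f1.txt','r') as f1, open('/tmp/f2.txt','r') as f2:
    # ndiff needs both files as lists of lines (with their newlines)
    diff = difflib.ndiff(f1.readlines(),f2.readlines())
    for line in diff:
        if line.startswith('-'):      # line present only in f1
            sys.stdout.write(line)
        elif line.startswith('+'):    # line present only in f2, shown indented
            sys.stdout.write('\t\t'+line)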

Prints (in about 400 ms):
- line 500
- line 505
- line 1000
- line 99998
        + line 100000
        + line 100001
        + line 100002
        + line 100003
        + line 100004
        + line 100005
        + line 100006
        + line 100007
        + line 100008
        + line 100009
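
For reference, here is a small sketch of what ndiff actually yields. Every entry starts with a two-character code: '- ' (only in the first file), '+ ' (only in the second), two spaces (common to both), or '? ' (a hint line marking changes within a line, present in neither file). The loop above prints only the first two kinds. The sample strings and the variable names are made up for illustration.

```python
import difflib

# Two tiny made-up inputs with no identical lines but some similar ones.
a = 'one\ntwo\nthree\n'.splitlines(keepends=True)
b = 'ore\ntree\nemu\n'.splitlines(keepends=True)

entries = list(difflib.ndiff(a, b))

# Strip the two-character code to recover the original line text.
only_in_a = [e[2:] for e in entries if e.startswith('- ')]
only_in_b = [e[2:] for e in entries if e.startswith('+ ')]

# '? ' entries are intraline hints, not lines from either file.
hints = [e for e in entries if e.startswith('? ')]

print(''.join(entries), end='')
```

Because the '? ' entries do not come from either file, filtering on '-' and '+' as in the answer is enough to list the lines that differ.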

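If you want a running count, use enumerate. Note that the count is the position in the diff output, not the line number in either file: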
with open('/tmp/f1.txt','r') as f1, open('/tmp/f2.txt','r') as f2:
    diff = difflib.ndiff(f1.readlines(),f2.readlines())
    for i,line in enumerate(diff):
        if line.startswith(' '):
            continue
        sys.stdout.write('My count: {}, text: {}'.format(i,line))
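
If you need the real line numbers within each file, one possible approach is to keep a separate counter per file while walking the ndiff output. This is only a sketch: the helper name changed_lines and the sample lists are hypothetical, not part of the original answer.

```python
import difflib

def changed_lines(a_lines, b_lines):
    """Yield (tag, line_number, text) for every line that differs.

    tag is '-' for a line found only in a_lines and '+' for a line found
    only in b_lines; line_number is 1-based within that sequence.
    """
    a_no = b_no = 0
    for entry in difflib.ndiff(a_lines, b_lines):
        tag, text = entry[0], entry[2:]
        if tag == ' ':        # common line: advance both counters
            a_no += 1
            b_no += 1
        elif tag == '-':      # only in the first sequence
            a_no += 1
            yield '-', a_no, text
        elif tag == '+':      # only in the second sequence
            b_no += 1
            yield '+', b_no, text
        # '?' hint lines belong to neither file, so they are skipped

# Small made-up example: 'line 1' is dropped and 'line 4' is added.
a = ['line 0\n', 'line 1\n', 'line 2\n', 'line 3\n']
b = ['line 0\n', 'line 2\n', 'line 3\n', 'line 4\n']
result = list(changed_lines(a, b))
for tag, number, text in result:
    print('{} line {}: {}'.format(tag, number, text), end='')
```

For the files from the question you would pass f1.readlines() and f2.readlines() instead of the sample lists.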

Source: the Stack Overflow question "Return the lines that differ between two files (Python)": https://stackoverflow.com/questions/17799680/
