Comparing 2 large files using Perl

Problem description

I am comparing two large CSV files using a Perl script that is called from a batch file. I put the result in a third file.

Currently the result file also contains other information, such as headers and lines like these:

--- file1.txt   Wed Mar  7 14:57:10 2018
+++ file2.txt   Wed Mar  7 13:56:51 2018
@@ -85217,4 +85217,8 @@

How can I make the result file contain only the differences? Thank you.

Here is my Perl script:

#!/usr/bin/env perl
use strict; use warnings;
use Text::Diff;
my $diffs = diff 'file1.txt' => 'file2.txt';
print $diffs;

Here is my batch file:

perl diffperl.pl > newperl.csv

Recommended answer

In the unified diff format (the default style of Text::Diff):

The first two lines indicate the files being compared.
Lines that start with "@" indicate the location of the differences in the files.
Lines that start with "-" indicate a line that is only in the first file.
Lines that start with "+" indicate a line that is only in the second file.
Lines that start with a space indicate a line that is in both files.
The output may contain the line "\ No newline at end of file".
Every line in the output will be newline-terminated, even if the lines of the input aren't.
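To make the prefix rules above concrete, here is a minimal sketch that labels each line of a small, made-up unified diff. The helper name classify_line and the sample data are hypothetical, and the sketch assumes the diff covers a single pair of files, so only its first two lines are file headers.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: label one line of unified diff output
# according to its first character (and its position for the headers).
sub classify_line {
    my ($line, $index) = @_;
    return 'file header'       if $index < 2;        # "---" and "+++" lines
    return 'hunk header'       if $line =~ /^\@/;    # "@@ -a,b +c,d @@"
    return 'only in file1'     if $line =~ /^-/;
    return 'only in file2'     if $line =~ /^\+/;
    return 'in both files'     if $line =~ /^ /;     # context line
    return 'no-newline marker' if $line =~ /^\\/;    # "\ No newline at end of file"
    return 'unknown';
}

# A small made-up diff in the same shape as the question's output.
my @diff = (
    "--- file1.txt\n",
    "+++ file2.txt\n",
    "\@\@ -1,3 +1,3 \@\@\n",
    " a,1\n",
    "-b,2\n",
    "+b,3\n",
    " c,4\n",
);

# Print each line prefixed with its label.
for my $i (0 .. $#diff) {
    printf "%-18s %s", classify_line($diff[$i], $i), $diff[$i];
}
```

Only the lines labelled "only in file1" and "only in file2" are the actual differences the question asks for; everything else is what the solution below strips out.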

Solution:

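# Remove the first two lines (the "---" and "+++" file headers).
$diffs =~ s/^(?:[^\n]*+\n){2}//;
# Remove every line starting with "@", a space or a backslash
# (hunk headers, unchanged context lines, "\ No newline" markers).
$diffs =~ s/^[\@ \\][^\n]*+\n//mg;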
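Here is a small, self-contained check of those two substitutions, run on a made-up diff string instead of real files. The wrapper name keep_only_changes and the sample text are hypothetical; the regexes are the ones from the solution. The possessive quantifier *+ needs Perl 5.10 or later.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical wrapper around the two substitutions from the answer.
sub keep_only_changes {
    my ($diffs) = @_;
    # Remove the first two lines (the "---" and "+++" file headers).
    $diffs =~ s/^(?:[^\n]*+\n){2}//;
    # Remove every line starting with "@", a space or a backslash.
    $diffs =~ s/^[\@ \\][^\n]*+\n//mg;
    return $diffs;
}

# Made-up sample in unified diff format.
my $sample = join '',
    "--- file1.txt\n",
    "+++ file2.txt\n",
    "\@\@ -1,3 +1,3 \@\@\n",
    " a,1\n",
    "-b,2\n",
    "+b,3\n",
    " c,4\n",
    "\\ No newline at end of file\n";

# Prints the two changed lines:
# -b,2
# +b,3
print keep_only_changes($sample);
```

With real files, the string returned by diff 'file1.txt' => 'file2.txt' can be passed to the same function before printing it.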

Note that adding CONTEXT => 0 to the options hash passed to diff suppresses the unchanged context lines, so fewer lines need to be removed.

That said, there's not much point in using Text::Diff if you want your own output format. You might as well use Algorithm::Diff directly.

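#!/usr/bin/env perl
use strict;
use warnings;
use Algorithm::Diff qw( traverse_sequences );

my $qfn1 = 'file1.txt';
my $qfn2 = 'file2.txt';

# Read each file into an array of lines.
my @lines1 = do { open(my $fh, '<', $qfn1) or die("Can't open \"$qfn1\": $!\n"); <$fh> };
my @lines2 = do { open(my $fh, '<', $qfn2) or die("Can't open \"$qfn2\": $!\n"); <$fh> };

# Make sure the last line of each file ends with a newline,
# so "x" and "x\n" at end of file compare as equal.
if (@lines1) { chomp($lines1[-1]); $lines1[-1] .= "\n"; }
if (@lines2) { chomp($lines2[-1]); $lines2[-1] .= "\n"; }

# Print only the differing lines: "-" for lines only in file1,
# "+" for lines only in file2. Matching lines are skipped.
traverse_sequences(\@lines1, \@lines2, {
   DISCARD_A => sub { print("-", $lines1[$_[0]]); },
   DISCARD_B => sub { print("+", $lines2[$_[1]]); },
});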
