Comparing two large files with Perl
Question
I am comparing two large CSV files using a Perl script that is called from a batch file. I write the result to a third file.
Currently the result file contains extra information, such as headers and other lines like these:
--- file1.txt Wed Mar 7 14:57:10 2018
+++ file2.txt Wed Mar 7 13:56:51 2018
@@ -85217,4 +85217,8 @@
How can I make the result file contain only the differences? Thank you.
Here is my Perl script:
#!/usr/bin/env perl
use strict; use warnings;
use Text::Diff;
my $diffs = diff 'file1.txt' => 'file2.txt';
print $diffs;
Here is my batch file:
perl diffperl.pl > newperl.csv
Recommended answer
In the unified format:
- The first two lines indicate the files being compared.
- Lines that start with "@" indicate the location of the differences in the file.
- Lines that start with "-" indicate a line that is only in the first file.
- Lines that start with "+" indicate a line that is only in the second file.
- Lines that start with a space indicate a line that is in both files.
- The output may contain the line "\ No newline at end of file".
- Every line in the difference will be newline-terminated, even if the lines of the input aren't.
Solution:
$diffs =~ s/^(?:[^\n]*+\n){2}//;
$diffs =~ s/^[\@ \\][^\n]*+\n//mg;
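To see what these two substitutions leave behind, here is a minimal sketch that applies them to a small made-up unified diff. The helper name only_changes and the sample lines are hypothetical and used purely for illustration; they are not part of Text::Diff or of the question's files.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: takes unified diff text and returns only the
# "-" and "+" lines, using the two substitutions from the answer.
sub only_changes {
    my ($diffs) = @_;

    # Drop the two header lines that name the compared files.
    $diffs =~ s/^(?:[^\n]*+\n){2}//;

    # Drop hunk markers (@), context lines (leading space) and the
    # "\ No newline at end of file" note.
    $diffs =~ s/^[\@ \\][^\n]*+\n//mg;

    return $diffs;
}

# Made-up sample standing in for the output of diff().
my $sample = <<'END';
--- file1.txt Wed Mar 7 14:57:10 2018
+++ file2.txt Wed Mar 7 13:56:51 2018
@@ -1,3 +1,3 @@
 id,name
-1,alice
+1,Alice
 2,bob
END

print only_changes($sample);
```

For this sample, only the lines "-1,alice" and "+1,Alice" remain.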
Note that adding CONTEXT => 0 to the diff options omits the unchanged context lines, which reduces the number of lines to remove.
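As a rough sketch of how that option could be combined with the substitutions, the script below passes CONTEXT => 0 in the options hash of diff. The sub name changed_lines and the command-line handling are assumptions made for illustration, and the script needs Text::Diff installed.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Text::Diff;

# Hypothetical helper: returns only the "-" and "+" lines for two files.
sub changed_lines {
    my ($file_a, $file_b) = @_;

    # CONTEXT => 0 asks for no unchanged lines around each change.
    my $diffs = diff $file_a, $file_b, { CONTEXT => 0 };

    # Drop the two header lines, then the hunk markers and any
    # "\ No newline at end of file" note.
    $diffs =~ s/^(?:[^\n]*+\n){2}//;
    $diffs =~ s/^[\@ \\][^\n]*+\n//mg;

    return $diffs;
}

# Only runs when two file names are given on the command line.
print changed_lines(@ARGV) if @ARGV == 2;
```

It could then be called from the batch file as, for example, perl diffperl.pl file1.txt file2.txt > newperl.csv.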
That said, there's not much point in using Text::Diff if you want your own output format. You might as well use Algorithm::Diff directly.
use Algorithm::Diff qw( traverse_sequences );

my $qfn1 = 'file1.txt';
my $qfn2 = 'file2.txt';

# Read each file into an array of lines.
my @lines1 = do { open(my $fh, '<', $qfn1) or die("Can't open \"$qfn1\": $!\n"); <$fh> };
my @lines2 = do { open(my $fh, '<', $qfn2) or die("Can't open \"$qfn2\": $!\n"); <$fh> };

# Make sure the last line of each file ends with a newline.
if (@lines1) { chomp($lines1[-1]); $lines1[-1] .= "\n"; }
if (@lines2) { chomp($lines2[-1]); $lines2[-1] .= "\n"; }

# Print lines found only in the first file with "-" and only in the second with "+".
traverse_sequences(\@lines1, \@lines2, {
    DISCARD_A => sub { print("-", $lines1[$_[0]]); },
    DISCARD_B => sub { print("+", $lines2[$_[1]]); },
});