Extracting and filtering a range of lines from input using Perl

Question

I'm quite new to Perl and I'm having trouble skipping lines in a foreach loop. I want to copy some of the lines of a text file to a new file.

When a line begins with FIRST ITERATION, I want to skip two more lines and then print everything that follows until the end of the file or an empty line is encountered.

I've looked for a similar post, but couldn't find one that deals with text files.

This is what I have come up with so far:

use 5.010;
use strict;
use warnings;

open( INPUT, "xxx.txt" ) or die("Could not open log file.");
open( OUT, ">>yyy.txt" );

foreach my $line (<INPUT>) {

    if ( $line =~ m/^FIRST ITERATION/ ) {

        # print OUT
    }
}

close(OUT);
close(INPUT);

I tried using next and $line++, but my program prints only the line that begins with FIRST ITERATION.

I could try a for loop, but I don't know how many lines my file may have, nor how many lines there are between "FIRST ITERATION" and the next empty line.

Recommended answer

The simplest way is to process the file one line at a time and keep a state counter. It is set to 1 if the current line begins with FIRST ITERATION and to 0 if the line is blank; otherwise it is incremented if it is already positive, so that it holds the line number within the current block. The FIRST ITERATION line is number 1 and the two skipped lines are numbers 2 and 3, so printing starts once the counter exceeds 3.

This solution expects the path to the input file as a parameter on the command line and prints its output to STDOUT, so redirect the output to a file on the command line if you need to.

Note that the regex pattern /\S/ checks whether there is a non-whitespace character anywhere in the current line, so not /\S/ is true if the line is empty or consists entirely of whitespace.

use strict;
use warnings;

my $lines = 0;

while ( <> ) {

    if ( /^FIRST ITERATION/ ) {
        $lines = 1;
    }
    elsif ( not /\S/ ) {
        $lines = 0;
    }
    elsif ( $lines > 0 ) {
        ++$lines;
    }

    print if $lines > 3;
}
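If you would rather have the script write to a file handle itself than rely on shell redirection, the same counter logic can be wrapped in a small subroutine. This is only a sketch: the name copy_blocks and the sample data are made up for illustration, and the in-memory file handle stands in for a real input file.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original answer): copy the wanted
# lines from the input handle to the output handle.
sub copy_blocks {
    my ( $in, $out ) = @_;
    my $lines = 0;    # 0 = outside a block, otherwise line number within it

    while ( my $line = <$in> ) {
        if    ( $line =~ /^FIRST ITERATION/ ) { $lines = 1 }
        elsif ( $line !~ /\S/ )               { $lines = 0 }
        elsif ( $lines > 0 )                  { ++$lines }

        # Lines 1..3 are the marker line and the two lines to skip
        print {$out} $line if $lines > 3;
    }
    return;
}

# Made-up sample input, read through an in-memory file handle
my $sample = join '', map { "$_\n" }
    'header', 'FIRST ITERATION 1', 'skip a', 'skip b',
    'keep 1', 'keep 2', '', 'ignored';

open my $in, '<', \$sample or die "Cannot open sample data: $!";
copy_blocks( $in, \*STDOUT );
close $in;
```

To work on the files from the question, open xxx.txt for reading and yyy.txt for appending using three-argument open with lexical handles, check both calls for failure, and pass those handles to the subroutine instead.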

This can be simplified substantially by using Perl's built-in range operator, which keeps its own internal state and, in scalar context, returns the sequence number of the current line within the range (or a false value outside it). So the above may be written

use strict;
use warnings;

while ( <> ) {
    my $s = /^FIRST ITERATION/ ... not /\S/;
    print if $s and $s > 3;
}
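One difference from the counter version is worth checking against your data. According to perlop, the last sequence number in a range has the string "E0" appended, which leaves its numeric value unchanged, so the blank line that ends a block still passes the $s > 3 test and is printed. If you don't want that terminating line, you can test for the suffix. A sketch, using made-up sample data and an @selected array that is not part of the original answer:

```perl
use strict;
use warnings;

# Made-up sample input, read through an in-memory file handle
my $sample = join '', map { "$_\n" }
    'header', 'FIRST ITERATION 1', 'skip a', 'skip b',
    'keep 1', 'keep 2', '', 'ignored';

open my $in, '<', \$sample or die "Cannot open sample data: $!";

my @selected;
while ( <$in> ) {
    my $s = /^FIRST ITERATION/ ... not /\S/;

    # $s looks like "6E0" on the line that closes the range; skip that line
    push @selected, $_ if $s and $s > 3 and $s !~ /E0$/;
}
close $in;

print @selected;
```

Whether to keep or drop the closing blank line is a matter of taste; dropping it makes the output match the counter version.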

And the last version can be rewritten as a one-line command-line program like this

$ perl -ne '$s = /^FIRST ITERATION/ ... not /\S/; print if $s and $s > 3' myfile.txt
