regex - Splitting multi-paragraph documents into paragraph-numbered sentences

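I have a list of well-parsed, multi-paragraph documents (all paragraphs are separated by \n\n and sentences are separated by "."). I want to split these documents into sentences, each prefixed with a number indicating which paragraph of the document it came from. For example, a two-paragraph input is: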

First sentence of the 1st paragraph. Second sentence of the 1st paragraph. \n\n

First sentence of the 2nd paragraph. Second sentence of the 2nd paragraph. \n\n

Ideally, the output should be:

1 First sentence of the 1st paragraph.

1 Second sentence of the 1st paragraph.

2 First sentence of the 2nd paragraph.

2 Second sentence of the 2nd paragraph.

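I am familiar with the Lingua::Sentence package in Perl, which can split a document into sentences. However, it does not support paragraph numbering. Is there another way to achieve the above (the documents contain no abbreviations)? Any help is greatly appreciated. Thanks!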

Best answer

Since you mentioned Lingua::Sentence, one option is to post-process the module's raw output slightly to get what you need:

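use strict;
use warnings;
use Lingua::Sentence;

my $text     = do { local $/; <STDIN> };    # slurp the whole document
my $splitter = Lingua::Sentence->new("en");

# split() returns one sentence per line; this relies on the blank lines
# between paragraphs surviving in its output.
my @paragraphs = split /\n{2,}/, $splitter->split($text);

foreach my $index (0..$#paragraphs) {
    # Prefix every sentence with its 1-based paragraph number.
    my $paragraph = join "\n\n", map { ($index + 1) . " $_" }
        split /\n/, $paragraphs[$index];
    print "$paragraph\n\n";
}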
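
Because the question says the documents contain no abbreviations, a plain-regex version that does not depend on Lingua::Sentence may also be enough. The following is only a minimal sketch under that assumption: the helper name number_sentences is hypothetical, paragraphs are assumed to be separated by blank lines, and every "." followed by whitespace is treated as a sentence end (so decimals or abbreviations would break it).

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of "N sentence" strings, where N is the
# 1-based number of the paragraph the sentence came from.
sub number_sentences {
    my ($text) = @_;
    my @numbered;
    my $para_no = 0;

    # Paragraphs are separated by one or more blank lines.
    for my $paragraph (split /\n\s*\n/, $text) {
        next unless $paragraph =~ /\S/;    # skip whitespace-only chunks
        $para_no++;

        # Split on whitespace that follows a period.
        # Assumes no abbreviations, as stated in the question.
        for my $sentence (split /(?<=\.)\s+/, $paragraph) {
            $sentence =~ s/^\s+|\s+$//g;   # trim surrounding whitespace
            next if $sentence eq '';
            push @numbered, "$para_no $sentence";
        }
    }
    return @numbered;
}

my $text = "First sentence of the 1st paragraph. Second sentence of the 1st paragraph. \n\n"
         . "First sentence of the 2nd paragraph. Second sentence of the 2nd paragraph. \n\n";

# One numbered sentence per block, separated by blank lines.
print "$_\n\n" for number_sentences($text);
```

Run on the example input from the question, this prints the four numbered sentences in the requested layout. Splitting into paragraphs first and into sentences second keeps the paragraph counter independent of how the sentence splitter treats blank lines.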

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/18174646/