Problem description
I have a line of text, for example
This "is" a test "of very interesting" problems "that can" be solved
I'm trying to split it so that my array @goodtext contains the strings from the quoted sections, however many there are. So my array would contain the following:
$goodtext[0] is
$goodtext[1] of very interesting
$goodtext[2] that can
Unfortunately, the number of quoted sections varies from line to line...
Recommended answer
Assuming the quotes are not nested:
my @quoted = $string =~ /"([^"]+)"/g;
Or, if you need to be able to do some processing while collecting them:
my @quoted;
while ($string =~ /"([^"]+)"/g) { #" (stop faulty markup highlight)
# ...
push @quoted, $1;
}
Note that we need the closing " in the pattern, even though [^"]+ will match up to it anyway. This is so that the engine consumes it and gets past it, so that the next match of " is indeed the next opening one.
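To see what the closing quote buys us, here is a small sketch that runs the pattern with and without it on the sample line from the question. The names @with_close and @without_close are only illustrative.

```perl
use strict;
use warnings;

my $string = 'This "is" a test "of very interesting" problems "that can" be solved';

# With the closing quote: each match consumes a whole "..." pair.
my @with_close = $string =~ /"([^"]+)"/g;

# Without it: the engine stops just before the closing quote, so that
# quote is then taken as the opening one for the next match.
my @without_close = $string =~ /"([^"]+)/g;

print join('|', @with_close), "\n";
print join('|', @without_close), "\n";
```

The first line printed is is|of very interesting|that can, while the second also picks up the text between the quoted sections: is| a test |of very interesting| problems |that can| be solved.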
If the quotations "can be "nested" as well", then you'd want Text::Balanced.
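As a rough illustration of what Text::Balanced offers, the sketch below uses its extract_bracketed function. Note the assumption: nesting with identical opening and closing double quotes is ambiguous, so this example uses parentheses as distinct delimiters instead; the input string is made up for the demonstration.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Assumption: nested sections use distinct open/close delimiters
# (parentheses here), not the identical double quotes from the question.
my $text = '(can be (nested) as well) and the rest';

# In list context the first two return values are the extracted
# balanced substring and the remainder of the input.
my ($extracted, $remainder) = extract_bracketed($text, '()');

print "extracted: $extracted\n";
print "remainder: $remainder\n";
```

Here $extracted holds the whole outer group, including the inner (nested) part, which a simple [^"]+ style pattern cannot do.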
As an aside, note the difference in behavior of the /g modifier in list and scalar contexts.
In list context, imposed by the list assignment (to @quoted in the first example), with the /g modifier the match operator returns a list of all captures, or of all matches if there is no capturing in the pattern (no parens).
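A short sketch of the list-context cases described above; the strings and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $string = 'one simple string';

# No capture groups: the list holds every whole match.
my @words = $string =~ /\w+/g;

# One capture group: the list holds what the group captured in each match.
my @initials = $string =~ /\b(\w)\w*/g;

# Two capture groups: captures from all matches are flattened into one list.
my @pairs = 'a=1 b=2' =~ /(\w)=(\d)/g;

print join(',', @words), "\n";     # one,simple,string
print join(',', @initials), "\n";  # o,s,s
print join(',', @pairs), "\n";     # a,1,b,2
```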
In scalar context, for example when evaluated as the while condition, its behavior with /g is more complex. After a match, the next time the regex runs it continues searching the string from the position right after the previous match, thus iterating through the matches.
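The position that /g remembers can be inspected with pos. The following sketch records where each match ends while iterating; @log is just an illustrative name.

```perl
use strict;
use warnings;

my $string = 'one simple string';
my @log;

# Each successful /g match in scalar context leaves pos($string)
# just past the end of that match; the next match starts from there.
while ($string =~ /(\w+)/g) {
    push @log, "$1 ends at " . pos($string);
}

print "$_\n" for @log;

# Once the match fails (ending the loop), the position is reset.
print "pos is undefined again\n" unless defined pos($string);
```

This prints "one ends at 3", "simple ends at 10" and "string ends at 17", followed by the final message.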
Note that we don't need a loop for this (which is a subtle cause of subtle bugs):
use feature 'say';
my $string = q(one simple string);
$string =~ /(\w+)/g;
say $1; #--> one
$string =~ /(\w+)/g;
say $1; #--> simple
If /g is missing from either regex we don't get this behavior; instead, one is printed both times.
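One way such a bug can show up is sketched below: a /g match used only as a boolean test leaves a position behind, and a later /g match on the same string silently starts from it. The variable names are hypothetical, and resetting with pos($copy) = 0 is just one possible remedy.

```perl
use strict;
use warnings;

my $string = 'one simple string';

# A /g match in scalar (boolean) context leaves pos($string) after "one".
my $has_word = $string =~ /\w+/g ? 1 : 0;

# This later /g match starts at that position, so it never sees "one".
my $found_stale = $string =~ /one/g ? 1 : 0;

# Same sequence on a second string, but with the position reset first.
my $copy = 'one simple string';
my $has_word_again = $copy =~ /\w+/g ? 1 : 0;
pos($copy) = 0;    # start the next /g search from the beginning
my $found_reset = $copy =~ /one/g ? 1 : 0;

print "stale: $found_stale, reset: $found_reset\n";    # stale: 0, reset: 1
```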
See "Global matching" in perlretut and, for instance, the \G assertion in perlop, and pos.