Problem description
I have a string like this:
a b c a b " a b " b a " a "
How do I match every a that is not part of a string delimited by "? I want to match every a marked with brackets here (they were bold in the original post):
[a] b c [a] b " a b " b [a] " a "
I want to replace those matches (or rather remove them by replacing them with an empty string), so stripping the quoted parts before matching won't work, because I want those parts to remain in the string. I'm using Ruby.
Recommended answer
Assuming the quotes are correctly balanced and there are no escaped quotes, it's easy:
result = subject.gsub(/a(?=(?:[^"]*"[^"]*")*[^"]*\Z)/, '')
This replaces an a with the empty string if and only if an even number of quotes follows it. With balanced quotes, that is exactly the case where the a lies outside every quoted section.
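To check this against the string from the question, here is a minimal sketch. The variable names subject and result follow the answer; the sample value is the one from the question.

```ruby
# Sample input from the question.
subject = 'a b c a b " a b " b a " a "'

# Remove every "a" that is followed by an even number of quotes,
# i.e. every "a" outside a quoted section.
result = subject.gsub(/a(?=(?:[^"]*"[^"]*")*[^"]*\Z)/, '')

puts result
# Prints the string with the three unquoted a's removed; the spaces
# around them and both quoted sections are left untouched:
#  b c  b " a b " b  " a "
```

Only the a characters are removed, so the spaces that surrounded them stay in place (note the leading space and the double spaces).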
Explanation:
a # Match a
(?= # only if it's followed by...
(?: # ...the following:
[^"]*" # any number of non-quotes, followed by one quote
[^"]*" # the same again, ensuring an even number
)* # any number of times (0, 2, 4 etc. quotes)
[^"]* # followed by only non-quotes until
\Z # the end of the string.
) # End of lookahead assertion
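The commented breakdown above can also be written directly as a Ruby regex with the x (extended) flag, which ignores whitespace and # comments inside the pattern. This is only a sketch: the constant name OUTSIDE_QUOTES_A is made up for this example, and the comparison at the end simply checks that the commented form behaves like the one-line form on the sample string.

```ruby
# Free-spacing form of the same pattern. OUTSIDE_QUOTES_A is a hypothetical
# name chosen for this example, not something from the original answer.
OUTSIDE_QUOTES_A = /
  a              # match a
  (?=            # only if it is followed by...
    (?:          # ...the following group:
      [^"]*"     # any number of non-quotes, then one quote
      [^"]*"     # the same again, so quotes are consumed in pairs
    )*           # repeated any number of times
    [^"]*        # then only non-quotes until
    \Z           # the end of the string
  )              # end of lookahead assertion
/x

subject = 'a b c a b " a b " b a " a "'
compact = subject.gsub(/a(?=(?:[^"]*"[^"]*")*[^"]*\Z)/, '')
verbose = subject.gsub(OUTSIDE_QUOTES_A, '')

puts verbose == compact
```

Keeping the comments inside the pattern means the explanation cannot drift away from the regex it describes.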
If you can have escaped quotes within quotes (a "length: 2\""), it's still possible but more complicated:
result = subject.gsub(/a(?=(?:(?:\\.|[^"\\])*"(?:\\.|[^"\\])*")*(?:\\.|[^"\\])*\Z)/, '')
This is in essence the same regex as above, only with (?:\\.|[^"\\]) substituted for [^"]:
(?: # Match either...
\\. # an escaped character
| # or
[^"\\] # any character except backslash or quote
) # End of alternation
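To see why the substitution matters, the sketch below runs both patterns on an input that has an escaped quote inside a quoted section. The input string and the variable names naive and aware are invented for this illustration.

```ruby
# Made-up input: the quoted section contains an escaped quote (\").
# A single-quoted Ruby literal keeps the backslash as a literal character.
subject = 'a "a \" a" a'

# Simple pattern: counts every quote, including the escaped one.
naive = subject.gsub(/a(?=(?:[^"]*"[^"]*")*[^"]*\Z)/, '')

# Escape-aware pattern: a backslash plus the next character is consumed as
# one unit, so \" is never counted as a delimiter.
aware = subject.gsub(/a(?=(?:(?:\\.|[^"\\])*"(?:\\.|[^"\\])*")*(?:\\.|[^"\\])*\Z)/, '')

puts naive # the simple pattern miscounts the quotes on this input
puts aware # only the two a's outside the quoted section are removed
```

On this input the simple pattern sees three quotes, so it keeps the first unquoted a and removes one inside the quotes, while the escape-aware pattern removes only the first and last a.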