Problem description
I'm trying to create a regex for extracting singers and lyricists. I was wondering how to make the lyricist part of the search optional.
Example multiline string:
Fireworks Singer: Katy Perry
Vogue Singers: Madonna, Karen Lyricist: Madonna
Regex: /Singers?:(.*)\s?Lyricists?:(.*)/
This matches the second line correctly, extracting the singers (Madonna, Karen) and the lyricist (Madonna).
But it does not work on the first line, where there is no lyricist.
How do I make the lyricist search optional?
Recommended answer
You can enclose the part you want to make optional in a non-capturing group, (?:...). The group is then treated as a single unit in the regex, and you can put a ? after it to make it optional. Example:
/Singers?:(.*)\s?(?:Lyricists?:(.*))?/
Note that the \s? here is useless, since .* greedily consumes every remaining character on the line and the match succeeds without any backtracking. For the same reason, the (?:Lyricists?:(.*)) part will never match anything. The non-greedy version of .*, which is .*?, avoids this, but on its own it would stop immediately and capture nothing, because everything after it is optional. Anchoring the pattern with $ forces it to extend to the end of the line. Combining the two fixes the problem:
/Singers?:(.*?)\s*(?:Lyricists?:(.*))?$/
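The difference between the greedy and the non-greedy pattern can be checked with a short sketch. It assumes Python's re module (the question does not name a language) and applies both patterns to the second sample line on its own; the names GREEDY and LAZY are made up for this illustration.

```python
import re

# Greedy: the first (.*) swallows the rest of the line, so the optional
# Lyricists group matches zero times and its capture stays None.
GREEDY = re.compile(r"Singers?:(.*)\s?(?:Lyricists?:(.*))?")

# Lazy and anchored: (.*?) grows one character at a time until the rest of
# the pattern, including $, can match.
LAZY = re.compile(r"Singers?:(.*?)\s*(?:Lyricists?:(.*))?$")

line = "Vogue Singers: Madonna, Karen Lyricist: Madonna"

greedy_groups = GREEDY.search(line).groups()
lazy_groups = LAZY.search(line).groups()

print(greedy_groups)  # (' Madonna, Karen Lyricist: Madonna', None)
print(lazy_groups)    # (' Madonna, Karen', ' Madonna')
```

Both captures from the lazy pattern still begin with a space, because nothing consumes the whitespace that follows each colon.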
Some extra whitespace still ends up in the captures; this can be removed as well, giving the final regex:
/Singers?:\s*(.*?)\s*(?:Lyricists?:\s*(.*))?$/
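As a rough usage sketch, the final regex can be run over the whole multiline sample at once. This again assumes Python's re module; the re.MULTILINE flag is an addition not present in the original answer, needed so that $ matches at the end of every line rather than only at the end of the string. The helper name extract_credits is hypothetical.

```python
import re

# Final pattern from the answer, compiled with MULTILINE (an assumption)
# so that $ anchors at each line end.
CREDITS = re.compile(
    r"Singers?:\s*(.*?)\s*(?:Lyricists?:\s*(.*))?$",
    re.MULTILINE,
)

def extract_credits(text):
    """Return one (singers, lyricists) tuple per matching line.

    lyricists is None when the line has no Lyricist(s) part.
    """
    return [m.groups() for m in CREDITS.finditer(text)]

sample = (
    "Fireworks Singer: Katy Perry\n"
    "Vogue Singers: Madonna, Karen Lyricist: Madonna"
)

for singers, lyricists in extract_credits(sample):
    print(singers, "|", lyricists)
# Katy Perry | None
# Madonna, Karen | Madonna
```

When the strings are already split into separate lines, the flag can be dropped and the pattern applied to each line individually.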