Problem description
I'm trying to build a flexible regular expression to pick out the artist name and song title of a media file. I'd like it to be flexible and support all of the following:
01 Example Artist - Example Song.mp3
01 Example Song.mp3 (In this example, there's no artist, so that group should be null)
Example Artist - Example Song.mp3
Example Song.mp3 (Again, no artist)
I've come up with the following (in .NET syntax, particularly for named capture groups):
\d{0,2}\s*(?<artist>[^-]*)?[\s-]*(?<songname>.*)(\.mp3|\.m4a)
This works well, but fails for this input: 01 Example Song.mp3
It swallows the song name as the artist, I believe because of greedy matching. So, I tried modifying the expression so that the artist part would be lazy matching:
\d{0,2}\s*(?<artist>[^-]*)*?[\s-]*(?<songname>.*)(\.mp3|\.m4a)
That is, I changed:
(?<artist>[^-]*)?
to:
(?<artist>[^-]*)*?
This does indeed fix the above problem. But now, it fails for this input:
01 Example Artist - Example Song.mp3
Now, it's too lazy in that it captures "Example Artist - Example Song" as the songname and captures nothing for the artist name.
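Both failures can be reproduced with a small sketch. It is written in Python rather than .NET, so the named groups use `(?P<name>...)`; the names `GREEDY`, `LAZY`, and `parse` are made up for this illustration, and behavior in .NET is assumed (not verified here) to be the same for these inputs.

```python
import re

# Python ports of the two patterns above; only the named-group syntax
# differs from .NET: (?P<name>...) instead of (?<name>...).
GREEDY = re.compile(
    r"\d{0,2}\s*(?P<artist>[^-]*)?[\s-]*(?P<songname>.*)(\.mp3|\.m4a)"
)
LAZY = re.compile(
    r"\d{0,2}\s*(?P<artist>[^-]*)*?[\s-]*(?P<songname>.*)(\.mp3|\.m4a)"
)


def parse(pattern, filename):
    """Return (artist, songname) for a filename, or None if nothing matches."""
    m = pattern.match(filename)
    return (m.group("artist"), m.group("songname")) if m else None


# Greedy version: the artist group swallows the song name.
print(parse(GREEDY, "01 Example Song.mp3"))
# Lazy version: fixes the case above...
print(parse(LAZY, "01 Example Song.mp3"))
# ...but now never captures an artist at all.
print(parse(LAZY, "01 Example Artist - Example Song.mp3"))
```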
Does anyone have a suggestion for this?
Recommended answer
You can't achieve this with greediness alone; you need to be more descriptive, using groups (optional or not). An example:
(?x) # switch on comment mode
^ # start of the string
(?: (?<track>\d{1,3}) \s*[\s-]\s* )? # the track is optional ( including separators)
(?: (?<artist>.+?) \s*-\s* )? # the same with the artist name
(?<title> .+ )
(?<ext> \.m(?:p3|4a) )
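As a quick check, here is a minimal sketch of the same pattern applied to the four filenames from the question. It is a Python port, not .NET: the named groups are written `(?P<name>...)` and comment mode is enabled with `re.VERBOSE` instead of `(?x)`. The names `TRACK_PATTERN` and `parse_filename` are made up for this illustration.

```python
import re

# Python port of the pattern above. re.VERBOSE plays the role of (?x),
# and named groups use (?P<name>...) instead of .NET's (?<name>...).
TRACK_PATTERN = re.compile(
    r"""
    ^                                        # start of the string
    (?: (?P<track>\d{1,3}) \s*[\s-]\s* )?    # optional track number plus separator
    (?: (?P<artist>.+?) \s*-\s* )?           # optional artist, ended by a hyphen
    (?P<title> .+ )                          # song title
    (?P<ext> \.m(?:p3|4a) )                  # extension: .mp3 or .m4a
    """,
    re.VERBOSE,
)


def parse_filename(filename):
    """Return a dict of the named groups, or None if the name does not match."""
    m = TRACK_PATTERN.match(filename)
    return m.groupdict() if m else None


for name in (
    "01 Example Artist - Example Song.mp3",
    "01 Example Song.mp3",
    "Example Artist - Example Song.mp3",
    "Example Song.mp3",
):
    print(name, "->", parse_filename(name))
```

Groups that did not take part in the match (a missing track or artist) come back as `None` in the dictionary.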
As an aside, audio filenames can be very weird; even with the best pattern in the world, I doubt you can handle all cases.
You can be a little more flexible and more efficient if you replace .+ with something more explicit:
^(?x)
(?: (?<track>\d{1,3}) \s*[\s-]\s* )?
(?: (?<artist> \S+ (?>[ .-][^\s.-]*)*? ) \s*-\s*)?
(?<title> [^.\n]+ (?>\.[^.\n]*)*? )
(?<ext> \.m(?:p3|4a) )
(The \n characters are only here for testing purposes; you can remove them when you apply the pattern to one filename at a time.)