Problem description
My script works fine doing this:
import re
images = re.findall(r"src.\"(\S*?media.tumblr\S*?tumblr_\S*?jpg)", doc)
videos = re.findall(r"\S*?(http\S*?video_file\S*?tumblr_[a-zA-Z0-9]*)", doc)
However, I believe it is inefficient to search through the whole document twice.
Here's a sample document if it helps: http://pastebin.com/5kRZXjij
I would expect the following output from the above:
images = http://37.media.tumblr.com/tumblr_lnmh4tD3sM1qi02clo1_500.jpg
videos = http://bassrx.tumblr.com/video_file/86319903607/tumblr_lo8i76CWSP1qi02cl
Instead it would be better to do something like:
image_and_video_links = re.findall(" <match-image-links-or-video links> ", doc)
How can I combine the two re.findall lines into one?
I have tried using the | character, but I always fail to match anything, so I suspect I am misunderstanding how to use it.
Recommended answer
As mentioned in the comments, a pipe (|) should do the trick.
The regex
(src.\"(\S*?media.tumblr\S*?tumblr_\S*?jpg))|(\S*?(http\S*?video_file\S*?tumblr_[a-zA-Z0-9]*))
catches either of the two patterns.
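One thing to watch for when using this pattern with re.findall: because it contains four capture groups, each match comes back as a 4-tuple rather than a plain string, with empty strings for the groups that did not take part. The sketch below shows one way to split the results back into images and videos. The doc string is a hypothetical stand-in built from the expected output in the question, not the actual pastebin sample, and the variable names combined and matches are illustrative only.

```python
import re

# Hypothetical stand-in for the real page (assumption: the URLs from the
# question's expected output appear inside src="..." attributes).
doc = (
    '<img src="http://37.media.tumblr.com/tumblr_lnmh4tD3sM1qi02clo1_500.jpg"> '
    '<source src="http://bassrx.tumblr.com/video_file/86319903607/'
    'tumblr_lo8i76CWSP1qi02cl" type="video/mp4">'
)

# The combined pattern from the answer, unchanged, written as a raw string.
combined = (
    r'(src.\"(\S*?media.tumblr\S*?tumblr_\S*?jpg))'
    r'|(\S*?(http\S*?video_file\S*?tumblr_[a-zA-Z0-9]*))'
)

# With several groups, findall returns one tuple per match:
# (image alternative, image URL, video alternative, video URL).
# Groups belonging to the alternative that did not match are ''.
matches = re.findall(combined, doc)

# Index 1 is the inner image group, index 3 the inner video group.
images = [m[1] for m in matches if m[1]]
videos = [m[3] for m in matches if m[3]]

print("images =", images)
print("videos =", videos)
```

The document is scanned only once, at the cost of filtering the tuples afterwards. If the tuple indices feel fragile, named groups with re.finditer would be a reasonable alternative.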