python - Newbie Python/Regex: using a regular expression to extract the string between <a> tags

I need to use Python's re module to extract the text between <a> tags that carry an href attribute.

I have tried many patterns, for example:

patFinderLink = re.compile('\>"(CVE.*)"\<\/a>')

Example: I need to extract the content between the tags (in this case "CVE-2010-3718") from the following:

<pre>
<a href="https://www.redhat.com/security/data/cve/CVE-2010-3718.html">CVE-2010-3718</a>
</pre>

What am I doing wrong here? Any advice is greatly appreciated. Thanks in advance.

Sun

Best answer

I'm surprised nobody has suggested BeautifulSoup.

Here is how I would do it:

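import re

# BeautifulSoup 4 (package "beautifulsoup4"); the answer was originally
# written against BeautifulSoup 3 ("from BeautifulSoup import BeautifulSoup").
from bs4 import BeautifulSoup

hello = """
<pre>
<a href="https://www.redhat.com/security/data/cve/CVE-2010-3718.html">CVE-2010-3718</a>
<a href="https://www.redhat.com/security/data/cve/CVE-2010-3710.html">CVE-2010-3718</a>
<a href="https://www.redhat.com/security/data/cve/CVE-2010-3700.html">CVE-2010-3718</a>
</pre>
"""

# Raw string, with the dot escaped so it matches only a literal ".".
target = re.compile(r"CVE-\d+-\d+\.html")
commentSoup = BeautifulSoup(hello, "html.parser")
# Keep only the <a> tags whose href matches the pattern.
atags = commentSoup.find_all("a", href=target)
for a in atags:
    # Pull the matching file name out of the href value.
    match = target.search(a["href"]).group(0)
    print(match)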

Result:

CVE-2010-3718.html
CVE-2010-3710.html
CVE-2010-3700.html
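
As for what went wrong with the pattern in the question: it expects a literal double quote right after `>` and another one right before `</a>`, but the link text in the sample is not quoted, so nothing can match. If you would rather stay with re alone, something like the following should work for markup this simple. It is only a minimal sketch: the variable names are my own, and it assumes the link text is always a bare CVE id with no nested tags or extra whitespace.

```python
import re

# Sample input copied from the question.
html = '''<pre>
<a href="https://www.redhat.com/security/data/cve/CVE-2010-3718.html">CVE-2010-3718</a>
</pre>'''

# Match an opening <a ...> tag up to its closing '>', capture the CVE id
# that follows immediately, then require the closing </a>.
# Assumption: no quotes, whitespace, or nested tags around the link text.
link_text = re.compile(r'<a\s[^>]*>(CVE-\d+-\d+)</a>')

# findall returns the captured group for every match.
cve_ids = link_text.findall(html)
print(cve_ids)
```

For anything less regular than this (attributes in odd places, nested markup, line breaks inside the tag), an HTML parser is likely to be the more robust choice.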