python - Find every reference and prepend a path to its HTML link - Python

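I have an HTML file from Wikipedia and want to find every link on the page, such as /wiki/Absinthe, and replace it with the same link with the current directory prepended (for example /home/fergus/wikiget/wiki/Absinthe), so that: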

<a href="/wiki/Absinthe">Absinthe</a>

becomes:

<a href="/home/fergus/wikiget/wiki/Absinthe">Absinthe</a>

This should happen throughout the whole document.

Any ideas? I'm happy to use BeautifulSoup or regex!

Best answer

Here is a solution using the re module:

#!/usr/bin/env python
import re

# Prefix every href that starts with /wiki/ with the local directory.
with open('file.html') as src, open('output.html', 'w') as dst:
    dst.write(re.sub(r'href="/wiki/', 'href="/home/fergus/wikiget/wiki/', src.read()))

Here is another one that does not use re:

#!/usr/bin/env python
# Plain string replacement is enough here because the pattern is a fixed string.
with open('file.html') as src, open('output.html', 'w') as dst:
    dst.write(src.read().replace('href="/wiki/', 'href="/home/fergus/wikiget/wiki/'))
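
The question asks for the current directory to be prepended, not a hard-coded path. Below is a minimal sketch of that, assuming Python 3 and that every link of interest is written as href="/wiki/..." in single or double quotes. The function name prefix_wiki_links and its base_dir parameter are made up for illustration.

```python
import os
import re

# Matches href="/wiki/ or href='/wiki/ and captures the opening quote.
WIKI_HREF = re.compile(r'''href=(["'])/wiki/''')

def prefix_wiki_links(html, base_dir=None):
    """Return html with base_dir prepended to every href starting with /wiki/.

    prefix_wiki_links and base_dir are hypothetical names for this sketch.
    """
    if base_dir is None:
        # Assumption: "current directory" means the process working directory.
        base_dir = os.getcwd()
    base_dir = base_dir.rstrip('/')
    # A function replacement avoids backslash-escape surprises in base_dir.
    return WIKI_HREF.sub(lambda m: 'href=' + m.group(1) + base_dir + '/wiki/', html)

if __name__ == '__main__':
    sample = '<a href="/wiki/Absinthe">Absinthe</a>'
    print(prefix_wiki_links(sample, '/home/fergus/wikiget'))
```

Links that do not start with /wiki/ (external URLs, anchors) are left untouched. A regular expression does not understand HTML, so for messier markup a parser such as BeautifulSoup, which the question mentions, would likely be more robust.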

Regarding "python - Find every reference and prepend a path to its HTML link - Python", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/5217760/