Python find & replace with Beautiful Soup

Problem description

I am using Beautiful Soup to replace each occurrence of a pattern with an <a href> link inside an HTML file.

I am facing a problem with the following substitution:

modified_contents = re.sub("([^http://*/s]APP[a-z]{2}[0-9]{2})", "<a href=\"http://stack.com=\\1\">\\1</a>", str(soup))

Sample input 1:

Input File contains APPdd34

Output File contains <a href="http://stack.com=APPdd34"> APPdd34</a>

Sample input 2:

Input File contains <a href="http://stack.com=APPdd34"> APPdd34</a>

Output File contains <a href="http://stack.com=<a href="http://stack.com=APPdd34"> APPdd34</a>"> <a href="http://stack.com=APPdd34"> APPdd34</a></a>

The desired output file 2 is the same as sample input file 2, because the pattern there is already linked.

How can I rectify this problem?
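
As a small diagnostic sketch (not part of the original post), the following shows what the pattern above matches on sample input 2. The bracket expression [^http://*/s] is a negated character class, so it consumes one arbitrary character that is not h, t, p, :, /, * or s. It does not mean "not preceded by http://". The names PATTERN and linked are illustrative only.

```python
import re

# The pattern from the question. The bracket expression is a negated character
# class: it matches ONE character that is not h, t, p, :, /, * or s.
PATTERN = r"([^http://*/s]APP[a-z]{2}[0-9]{2})"

# Sample input 2: the identifier is already wrapped in a link.
linked = '<a href="http://stack.com=APPdd34"> APPdd34</a>'

# Both occurrences match, each together with the character in front of it:
# the "=" inside the href value and the space inside the link text.
matches = re.findall(PATTERN, linked)
print(matches)
```

Because both the href value and the link text match, re.sub wraps each of them again, which is consistent with the nested output shown for sample 2.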

Recommended answer

This may not entirely solve your problem, because I don't know what an entire input file could look like, but I hope this is a direction you can take.

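# Ported to BeautifulSoup 4 (bs4) and Python 3; the older BeautifulSoup 3
# package used "from BeautifulSoup import BeautifulSoup, Tag" instead.
from bs4 import BeautifulSoup

# Case 1: the identifier appears as plain text.
text = """APPdd34"""
soup = BeautifulSoup(text, "html.parser")
var1 = soup.text

# Case 2: the identifier is already wrapped in a link; read the link text.
text = """<a href="http://stack.com=APPdd34"> APPdd34</a>"""
soup = BeautifulSoup(text, "html.parser")
var2 = soup.find('a').text

# Build new <a> tags from the extracted text and add them to a document.
soup = BeautifulSoup("<p>Some new html</p>", "html.parser")
tag1 = soup.new_tag("a", href='http://stack.com=' + var1)
tag1.insert(0, var1)  # Insert text
tag2 = soup.new_tag("a", href='http://stack.com=' + var2)
tag2.insert(0, var2)
soup.insert(0, tag1)
soup.insert(3, tag2)  # position past the end, so the tag is appended
print(soup.prettify())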

So basically, just use BeautifulSoup to extract the text and then you can build Tags from there.
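
One way to apply this idea to a whole file is to walk the text nodes and skip any that already sit inside a link. The following is a minimal sketch rather than the answerer's own code. It assumes BeautifulSoup 4 (the bs4 package) is installed, and the names ID_PATTERN and link_ids are hypothetical helpers introduced here for illustration.

```python
import re
from bs4 import BeautifulSoup, NavigableString

# Identifier format taken from the question: "APP", two lowercase letters, two digits.
ID_PATTERN = re.compile(r"APP[a-z]{2}[0-9]{2}")

def link_ids(html):
    """Wrap each identifier in an <a> tag unless it already sits inside one."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a list, so modifying the tree inside the loop is safe.
    for node in soup.find_all(string=ID_PATTERN):
        if node.find_parent("a") is not None:
            continue  # already linked, leave it alone
        text = str(node)
        pieces = []
        last = 0
        for match in ID_PATTERN.finditer(text):
            if match.start() > last:
                pieces.append(NavigableString(text[last:match.start()]))
            link = soup.new_tag("a", href="http://stack.com=" + match.group(0))
            link.string = match.group(0)
            pieces.append(link)
            last = match.end()
        if last < len(text):
            pieces.append(NavigableString(text[last:]))
        # Swap the original text node for the first piece, then chain the rest.
        node.replace_with(pieces[0])
        previous = pieces[0]
        for piece in pieces[1:]:
            previous.insert_after(piece)
            previous = piece
    return str(soup)

print(link_ids("Input File contains APPdd34"))
print(link_ids('<a href="http://stack.com=APPdd34"> APPdd34</a>'))
```

Because only text nodes are searched, the identifier inside an existing href attribute is never touched, and running the function on its own output should leave that output unchanged.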
