This article shows how to replace text inside an lxml element's text with an HTML tag.
Problem description
I have an lxml element:
>>> lxml_element.text
'hello BREAK world'
I need to replace the word BREAK with an HTML break tag, <br />. I tried a simple text replacement:
lxml_element.text = lxml_element.text.replace('BREAK', '<br />')
but it inserts the tag with escaped symbols, like &lt;br /&gt;. How do I solve this problem?
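The escaping described in the question can be reproduced with a short script. This is a minimal sketch, assuming a sample element built from the string in the question; the variable names are illustrative only. It shows that a string assigned to .text is stored as character data, so lxml escapes the angle brackets on output and creates no child element.

```python
from lxml import etree

# Hypothetical element mirroring the one in the question.
element = etree.fromstring("<b>hello BREAK world</b>")

# Assigning to .text stores plain character data, never markup.
element.text = element.text.replace("BREAK", "<br />")

serialized = etree.tostring(element, encoding="unicode")
print(serialized)  # <b>hello &lt;br /&gt; world</b>

# No child element was created; the angle brackets are just text.
print(len(element))  # 0
```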
Recommended answer
Here's how you could do it. First, set up a sample lxml element from your question:
>>> import lxml.etree
>>> some_data = "<b>hello BREAK world</b>"
>>> root = lxml.etree.fromstring(some_data)
>>> root
<Element b at 0x3f35a50>
>>> root.text
'hello BREAK world'
Next, create a <br> subelement:
>>> childbr = lxml.etree.SubElement(root, "br")
>>> childbr
<Element br at 0x3f35b40>
>>> lxml.etree.tostring(root)
'<b>hello BREAK world<br/></b>'
But that's not quite what you want yet: the <br> was appended after all of the text. Put the text that should come before the <br> in the parent's .text:
>>> root.text = "hello"
>>> lxml.etree.tostring(root)
'<b>hello<br/></b>'
Then set the .tail of the child to contain the rest of the text:
>>> childbr.tail = "world"
>>> lxml.etree.tostring(root)
'<b>hello<br/>world</b>'
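The manual steps above can be wrapped in a small function. The following is a sketch under stated assumptions, not part of lxml: replace_marker_with_br is a hypothetical helper name, it only looks at the element's own .text (not the .tail of existing children), and it keeps the whitespace around the marker instead of trimming it as the walkthrough does.

```python
from lxml import etree


def replace_marker_with_br(element, marker="BREAK"):
    """Replace each `marker` in element.text with a <br/> child element.

    Hypothetical helper, not part of lxml. Only element.text is handled;
    text in the .tail of existing children is left untouched.
    """
    if element.text is None or marker not in element.text:
        return
    pieces = element.text.split(marker)
    # Text before the first marker stays in .text.
    element.text = pieces[0]
    # Each remaining piece becomes the .tail of a new <br/>, inserted
    # ahead of any existing children to preserve document order.
    for index, piece in enumerate(pieces[1:]):
        br = etree.Element("br")
        br.tail = piece
        element.insert(index, br)


root = etree.fromstring("<b>hello BREAK big BREAK world</b>")
replace_marker_with_br(root)
result = etree.tostring(root, encoding="unicode")
print(result)  # <b>hello <br/> big <br/> world</b>
```

Splitting on the marker handles any number of occurrences: the first piece stays in .text and every later piece becomes the .tail of its own <br/>. If the surrounding spaces are not wanted, strip each piece before assigning it.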