我在这件事上挣扎了一段时间。我正在尝试将字符串写入HTML,但在清理完之后,它们的格式出现了问题。下面是一个例子:

paragraphs = ['Grocery giant and household name Woolworths is battered and bruised. ',
'But behind the problems are still the makings of a formidable company']

x = str(" ")
for item in paragraphs:
    x = x + str(item)
x

输出:
"Grocery giant and household name\xc2\xa0Woolworths is battered and\xc2\xa0bruised.
But behind the problems are still the makings of a formidable\xc2\xa0company"

期望输出:
"Grocery giant and household name Woolworths is battered and bruised.
But behind the problems are still the makings of a formidable company"

我希望你能解释为什么会发生这种情况,以及我如何解决。事先谢谢!

最佳答案

\ xc2\xa0表示0xc2 0xa0是所谓的
不间断空间
它是一种不可见的UTF-8编码控制字符。有关它的更多信息,请查看维基百科:https://en.wikipedia.org/wiki/Non-breaking_space
我复制了你在问题中粘贴的内容,得到了预期的输出。

关于python - Python HTML Encoding\xc2\xa0,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/32419541/

10-15 21:40