Problem description
I have a problem using BeautifulSoup4... (I'm quite a Python/BeautifulSoup newbie, so forgive me if I'm being dumb.)
Why does the following code:
from bs4 import BeautifulSoup
soup_ko = BeautifulSoup('<select><option>foo</option><option>bar & baz</option><option>qux</option></select>')
soup_ok = BeautifulSoup('<select><option>foo</option><option>bar and baz</option><option>qux</option></select>')
# print() with a single argument behaves the same on Python 2 and Python 3
print(soup_ko.find_all('option'))
print(soup_ok.find_all('option'))
produce the following output:
[<option>foo</option>, <option>bar & baz</option>]
[<option>foo</option>, <option>bar and baz</option>, <option>qux</option>]
I was expecting the same result in both cases, a list of my 3 options, but BeautifulSoup seems to dislike the ampersand in the text. How can I get around this and get the correct list without editing my HTML (or transforming/converting it)?
Thanks,
Edit: this seems to be a bug in 4.2.0. I downloaded both the 4.2.0 and 4.2.1 versions (from http://www.crummy.com/software/BeautifulSoup/bs4/download/4.2/beautifulsoup4-4.2.0.tar.gz and http://www.crummy.com/software/BeautifulSoup/bs4/download/4.2/beautifulsoup4-4.2.1.tar.gz), unpacked them in my script folder, and changed my code to:
import sys
# Put the unpacked release named on the command line first on the import path
sys.path.insert(0, "beautifulsoup4-" + sys.argv[1])
from bs4 import BeautifulSoup, __version__
print("Beautiful Soup %s" % __version__)
soup_ko = BeautifulSoup('<select><option>foo</option><option>bar & baz</option><option>qux</option></select>')
print(soup_ko.find_all('option'))
and got these results:
15:24:38 pataluc ~ % python stack.py 4.2.0
Beautiful Soup 4.2.0
[<option>foo</option>, <option>bar & baz</option>]
15:24:41 pataluc ~ % python stack.py 4.2.1
Beautiful Soup 4.2.1
[<option>foo</option>, <option>bar & baz</option>, <option>qux</option>]
So I guess my question is closed. Thanks to the commenters who made me realize it was a version issue.
Recommended answer
As I said in the edit to my original post, this was a bug in BeautifulSoup 4.2.0. I downloaded 4.2.1 and the bug is gone.
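If you want to check whether your installed copy is the affected release, something along these lines might help. It's only a sketch: the helper names (parse_version, is_known_bad, count_options) are made up for illustration, the check flags 4.2.0 only because that is the one version this thread confirms as buggy, and passing "html.parser" explicitly is my own assumption (the original code let bs4 pick the parser).

```python
import re

# The only release this thread confirms as affected; nothing here is
# claimed about any other version.
KNOWN_BAD_VERSION = (4, 2, 0)


def parse_version(version):
    """Turn a string such as '4.2.1' into a tuple of ints, e.g. (4, 2, 1)."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def is_known_bad(version):
    """Return True only for the release reported as buggy in this thread."""
    return parse_version(version) == KNOWN_BAD_VERSION


def count_options(html):
    """Parse html with the installed bs4 and count the <option> tags."""
    # Imported lazily so the two helpers above work without bs4 installed.
    from bs4 import BeautifulSoup
    # Naming the parser is an assumption; the original post did not pass one.
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.find_all("option"))


if __name__ == "__main__":
    try:
        import bs4
    except ImportError:
        print("bs4 is not installed")
    else:
        html = ('<select><option>foo</option>'
                '<option>bar & baz</option>'
                '<option>qux</option></select>')
        print("Beautiful Soup %s" % bs4.__version__)
        print("known bad release: %s" % is_known_bad(bs4.__version__))
        print("options found: %d (expected 3)" % count_options(html))
```

If the script reports the known bad release, or finds fewer than 3 options, upgrading to 4.2.1 or later is what fixed it for me.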