BeautifulSoup not giving me Unicode

Problem description

I'm using Beautiful Soup to scrape data. The BS documentation states that BS should always return Unicode, but I can't seem to get Unicode. Here's a code snippet:

import urllib2
from libs.BeautifulSoup import BeautifulSoup

# Fetch and parse the data
url = 'http://wiki.gnhlug.org/twiki2/bin/view/Www/PastEvents2007?skin=print.pattern'

data = urllib2.urlopen(url).read()
print 'Type of fetched HTML : %s' % type(data)

soup = BeautifulSoup(data)
print 'Encoding of souped up HTML : %s' % soup.originalEncoding

table = soup.table
print type(table.renderContents())

The original data returned from the page is a string. BS shows the original encoding as ISO-8859-1. I thought that BS automatically converted everything to Unicode, so why is it that when I do this:

table = soup.table
print type(table.renderContents())

...it gives me a string object and not Unicode?

How can I get a Unicode object from BS?

I'm really, really lost with this. Any help? Thanks in advance.
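
For readers following along, the distinction the question turns on (byte string versus Unicode text) can be sketched with the standard library alone. This is only an illustration under stated assumptions: it does not call BeautifulSoup, the sample bytes are hypothetical, and it assumes that renderContents() returns an encoded byte string (UTF-8 by default), as the BeautifulSoup 3 documentation appears to describe.

```python
# Stand-in for the bytes that urllib2.urlopen(url).read() returns; the
# sample value is hypothetical: "cafe" with an acute accent on the e,
# encoded as ISO-8859-1.
data = b'caf\xe9'

# Decoding turns the byte string into Unicode text.
text = data.decode('iso-8859-1')

# Encoding goes the other way and yields a byte string again. This mirrors
# the assumption that renderContents() hands back UTF-8 encoded bytes.
rendered = text.encode('utf-8')

# Decoding the rendered bytes with the matching codec recovers Unicode text.
recovered = rendered.decode('utf-8')
```

If that assumption holds, a str result from renderContents() is not a sign that parsing failed; it is the encoded form of text that is Unicode internally.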

Recommended answer