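I am trying to get data found at a specific XPath on a page. I can reach the page with requests. I have verified that I am on the correct page by printing the source to my screen with r.text and comparing the displayed text with the text I am looking for.
r.text returns a string from which it is hard to extract the information I need. I was told that lxml is the way to search for information by XPath. Unfortunately, I am getting a TypeError.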
from lxml import html
import requests
payload = {'login_pass': 'password', 'login_user': 'username','submit':'go'}
r = requests.get("website", params=payload)
print r.encoding
tree = html.fromstring(r.text)
print tree
print tree.text_content()
This returns:
UTF-8
<Element html at 0x10dab8d08>
Traceback (most recent call last):
File "/Users/Me/Documents/PYTHON/GetImageAsPdf/ImageToPDF_requests_beta.py", line 11, in <module>
print tree.text_content()
File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/idlelib/PyShell.py", line 1343, in write
return self.shell.write(s, self.tags)
File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/idlelib/rpc.py", line 595, in __call__
value = self.sockio.remotecall(self.oid, self.name, args, kwargs)
File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/idlelib/rpc.py", line 210, in remotecall
seq = self.asynccall(oid, methodname, args, kwargs)
File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/idlelib/rpc.py", line 225, in asynccall
self.putmessage((seq, request))
File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/idlelib/rpc.py", line 324, in putmessage
s = pickle.dumps(message)
File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy_reg.py", line 70, in _reduce_ex
raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle _ElementUnicodeResult objects
I also tried inspecting the response headers:
r.headers
which returns:
{'charset': 'utf-8',
'x-powered-by': 'PHP/5.3.3',
'transfer-encoding': 'chunked',
'set-cookie': 'PHPSESSID=c6i7kph59nl9ocdlkckmjavas1; path=/, LOGIN_USER=deleted; expires=Tue, 15-Oct-2013 15:12:08 GMT; path=/',
'expires': 'Thu, 19 Nov 1981 08:52:00 GMT',
'server': 'Apache/2.2.15 (CentOS)',
'connection': 'close',
'pragma': 'no-cache',
'cache-control': 'no-store, no-cache, must-revalidate, post-check=0, pre-check=0',
'date': 'Wed, 15 Oct 2014 15:12:09 GMT',
'content-type': 'text/html; charset=UTF-8'}
My goal is to be able to search the tree by XPath, like this:
quantity = tree.xpath('/html/body/form[1]/table[3]/tbody[1]/tr/td[2]/table/tbody/tr/td[1]/table/tbody/tr/td/table[1]/tbody/tr[1]/td[2]/strong')
Can you help me figure out what is going wrong?
Best answer
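You should be able to convert the _ElementUnicodeResult object into a regular, picklable unicode string. The traceback shows that IDLE pickles whatever is printed in order to send it to its shell process, and that is the step that fails on the lxml string type.
With Python 2, simply wrap it in unicode(), e.g. print unicode(tree.text_content())
With Python 3, simply wrap it in str(), e.g. str(tree.text_content())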
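As a rough illustration of the conversion on Python 3, here is a minimal sketch. The SAMPLE_HTML markup, the plain_text helper, and the short XPath expression are made up for this example and stand in for the real page and the long XPath from the question. Depending on the lxml version, text_content() and XPath text() results may or may not already be plain strings, so the sketch converts them unconditionally.

```python
import pickle

from lxml import html

# Hypothetical stand-in markup; in the question this string would be r.text.
SAMPLE_HTML = (
    "<html><body><table><tr>"
    "<td>Qty</td><td><strong>42</strong></td>"
    "</tr></table></body></html>"
)


def plain_text(value):
    """Return a plain built-in str copy of an lxml string result (Python 3)."""
    return str(value)


tree = html.fromstring(SAMPLE_HTML)

# text_content() may return an lxml str subclass; str() yields a plain copy.
page_text = plain_text(tree.text_content())

# XPath text() results can be lxml "smart strings" too, so convert each one.
quantities = [plain_text(s) for s in tree.xpath("//td[2]/strong/text()")]

# Plain strings round-trip through pickle, the step that failed under IDLE.
restored = pickle.loads(pickle.dumps(page_text))

print(page_text)
print(quantities)
```

Converting at the point where text leaves lxml keeps the rest of the script working with ordinary strings, whichever shell or IDE ends up printing them.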
Regarding "python - Processing HTML from requests with lxml. TypeError: can't pickle _ElementUnicodeResult objects", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26386198/