I want to scrape a website whose URL contains German umlauts. Here is my code in Python 3.3; it works fine as long as the URL contains no umlauts.

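from urllib.request import Request, urlopen
from urllib.error import URLError
from bs4 import BeautifulSoup  # assumed: BeautifulSoup 4 (the import is not shown in the original)

def numResults(keyword):
    soup = None  # returned as-is if the request fails
    try:
        # The keyword is appended unencoded; this is what fails for 'ä'
        page_google = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=' + keyword
        print(page_google)
        req_google = Request(page_google)
        req_google.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20120427 Firefox/15.0a1')
        html_google = urlopen(req_google).read()
        soup = BeautifulSoup(html_google)
    except URLError as e:
        print(e)
    return soup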


But when I request something like:

print(numResults('älterer'))


I get the following error, presumably because urllib cannot handle the umlaut:

Traceback (most recent call last):
  File "C:\Users\zwieback86\Desktop\programming\scrape.py", line 137, in <module>
    print(numResults('älterer'))
  File "C:\Users\zwieback86\Desktop\programming\scrape.py", line 73, in numResults
    html_google = urlopen(req_google).read()
  File "c:\python33\lib\urllib\request.py", line 156, in urlopen
    return opener.open(url, data, timeout)
  File "c:\python33\lib\urllib\request.py", line 469, in open
    response = self._open(req, data)
  File "c:\python33\lib\urllib\request.py", line 487, in _open
    '_open', req)
  File "c:\python33\lib\urllib\request.py", line 447, in _call_chain
    result = func(*args)
  File "c:\python33\lib\urllib\request.py", line 1268, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "c:\python33\lib\urllib\request.py", line 1248, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "c:\python33\lib\http\client.py", line 1061, in request
    self._send_request(method, url, body, headers)
  File "c:\python33\lib\http\client.py", line 1089, in _send_request
    self.putrequest(method, url, **skips)
  File "c:\python33\lib\http\client.py", line 953, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 38: ordinal not in range(128)


When I enter the address
    "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=älterer"
into a browser, I get the page I want.

So I think urllib cannot handle requests with umlauts in the URL. How can I get it to accept German umlauts? Transliterating them (ä -> ae) is not an option.

Many thanks and best regards!

Best answer

Use the requests module! It is better than urllib2. http://docs.python-requests.org/en/latest/

>>> import requests
>>> r = requests.get('http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=' +'älterer')
>>> print(r.text)
{"responseData": {"results":[{"GsearchResultClass":"GwebSearch","unescapedUrl":"http://de.wikipedia.org/wiki/Alter","url":"http://de.wikipedia.org/wiki/Alter","visibleUrl":"de.wikipedia.org","cacheUrl":"http://www.google.com/search?q\u003dcache:xN4gCMgmnZ0J:de.wikipedia.org","title":"\u003cb\u003eAlter\u003c/b\u003e – Wikipedia","titleNoFormatting":"Alter – Wikipedia","content":"Unter dem \u003cb\u003eAlter\u003c/b\u003e versteht man den Lebensabschnitt rund um die mittlere   Lebenserwartung des Menschen, also das Lebensalter zwischen dem mittleren \u003cb\u003e...\u003c/b\u003e"},{"GsearchResultClass":"GwebSearch","unescapedUrl":"http://de.wikipedia.org/wiki/Blumfeld,_ein_%C3%A4lterer_Junggeselle","url":"http://de.wikipedia.org/wiki/Blumfeld,_ein_%25C3%25A4lterer_Junggeselle","visibleUrl":"de.wikipedia.org","cacheUrl":"http://www.google.com/search?q\u003dcache:VtRzZLhU-qkJ:de.wikipedia.org","title":"Blumfeld, ein \u003cb\u003eälterer\u003c/b\u003e Junggeselle – Wikipedia","titleNoFormatting":"Blumfeld, ein älterer Junggeselle – Wikipedia","content":"Blumfeld, ein \u003cb\u003eälterer\u003c/b\u003e Junggeselle ist eine Erzählung von Franz Kafka. Sie wurde   1915 verfasst und postum veröffentlicht. 
Sie behandelt die skurrilen \u003cb\u003e...\u003c/b\u003e"},{"GsearchResultClass":"GwebSearch","unescapedUrl":"http://www.arbeitsagentur.de/nn_193018/Navigation/zentral/Buerger/Hilfen/Beschaeftigung-Aelterer/Beschaeftigung-Aelterer-Nav.html","url":"http://www.arbeitsagentur.de/nn_193018/Navigation/zentral/Buerger/Hilfen/Beschaeftigung-Aelterer/Beschaeftigung-Aelterer-Nav.html","visibleUrl":"www.arbeitsagentur.de","cacheUrl":"http://www.google.com/search?q\u003dcache:SXb9a3GIufkJ:www.arbeitsagentur.de","title":"Beschäftigung \u003cb\u003eÄlterer\u003c/b\u003e - www.arbeitsagentur.de","titleNoFormatting":"Beschäftigung Älterer - www.arbeitsagentur.de","content":"\u003cb\u003eÄltere\u003c/b\u003e Arbeitnehmer/-innen, die ihre Arbeitslosigkeit durch Aufnahme einer   geringer entlohnten versicherungspflichtigen Beschäftigung beenden oder \u003cb\u003e...\u003c/b\u003e"},{"GsearchResultClass":"GwebSearch","unescapedUrl":"http://www.imdb.com/title/tt0932839/","url":"http://www.imdb.com/title/tt0932839/","visibleUrl":"www.imdb.com","cacheUrl":"http://www.google.com/search?q\u003dcache:V-wfqhR1ABUJ:www.imdb.com","title":"\u0026quot;Monaco Franze - Der ewige Stenz\u0026quot; Ein ernsthafter \u003cb\u003eälterer\u003c/b\u003e Herr - IMDb","titleNoFormatting":"\u0026quot;Monaco Franze - Der ewige Stenz\u0026quot; Ein ernsthafter älterer Herr - IMDb","content":"Directed by Helmut Dietl. 
With Helmut Fischer, Ruth-Maria Kubitschek, Karl   Obermayr, Christine Kaufmann."}],"cursor":{"resultCount":"1,940,000","pages":[{"start":"0","label":1},{"start":"4","label":2},{"start":"8","label":3},{"start":"12","label":4},{"start":"16","label":5},{"start":"20","label":6},{"start":"24","label":7},{"start":"28","label":8}],"estimatedResultCount":"1940000","currentPageIndex":0,"moreResultsUrl":"http://www.google.com/search?oe\u003dutf8\u0026ie\u003dutf8\u0026source\u003duds\u0026start\u003d0\u0026hl\u003den\u0026q\u003d%C3%A4lterer","searchResultTime":"0.08"}}, "responseDetails": null, "responseStatus": 200}
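
If adding a dependency is not an option, the same result can probably be reached with the standard library alone. The traceback shows the failure in request.encode('ascii'): the HTTP request line must be pure ASCII, so the keyword has to be percent-encoded before it goes into the URL. The following is a minimal sketch, not a definitive fix; the helper name build_search_url is hypothetical, and the URL is only built here, not fetched. Note that '%C3%A4lterer' is the same encoded form that appears in moreResultsUrl in the response above.

```python
from urllib.parse import quote

BASE_URL = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q='


def build_search_url(keyword):
    """Hypothetical helper: append the keyword percent-encoded as UTF-8."""
    # quote() encodes non-ASCII characters as UTF-8 bytes, so 'ä' becomes '%C3%A4'
    return BASE_URL + quote(keyword)


url = build_search_url('älterer')
print(url)

# The result is plain ASCII, so the encode step from the traceback no longer fails
url.encode('ascii')
```

The resulting string can be passed to Request(...) in numResults in place of the raw concatenation. For several query parameters, urllib.parse.urlencode applies the same quoting to a whole dict.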

Source: the question "python - urlopen/Request with umlauts in the URL" on Stack Overflow: https://stackoverflow.com/questions/17528612/
