Problem description
Here's the situation: I'm sending POST requests and trying to fetch the response with Python. The problem is that it distorts non-Latin letters, which doesn't happen when I fetch the same page through a direct link (with no search results), but POST requests won't generate a link.
Here's what I do:

    import urllib
    import urllib2

    url = 'http://donelaitis.vdu.lt/main_helper.php?id=4&nr=1_2_11'
    data = 'q=bus&ieskoti=true&lang1=en&lang2=en+-%3E+lt+%28+71813+lygiagre%C4%8Di%C5%B3+sakini%C5%B3+%29&lentele=vertikalus&reg=false&rodyti=dalis&rusiuoti=freq'
    req = urllib2.Request(url, data)
    response = urllib2.urlopen(req)
    the_page = response.read()
    file = open("pagesource.txt", "w")
    file.write(the_page)
    file.close()

Whenever I try

    thepage = the_page.encode('utf-8')

I get this error:

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 1008: ordinal not in range(128)

Whenever I try to change the response header to Content-Type:text/html;charset=utf-8 by doing

    response['Content-Type'] = 'text/html;charset=utf-8'

I get this error:

    AttributeError: addinfourl instance has no attribute '__setitem__'

My question: is it possible to edit or remove response or request headers? If not, is there another way to solve this problem other than copying the source to Notepad++ and fixing the encoding manually?
I'm new to Python and data mining, so I really hope you'll let me know if I'm doing something wrong.
Thanks
Solution
Why don't you try

    thepage = the_page.decode('utf-8')

instead of encode, since what you want is to move from UTF-8 encoded text to unicode (encoding-agnostic) internal strings?
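
To make the decode suggestion concrete, here is a minimal sketch that runs without hitting the network. The raw_bytes value is an assumption standing in for what response.read() returns; it is built from the percent-encoded bytes in the question's form data (%C4%8D, %C5%B3). The out_path name and the use of the temp directory are also just for illustration.

```python
import io
import os
import tempfile

# Hypothetical stand-in for the bytes returned by response.read().
# The non-ASCII bytes match the %C4%8D and %C5%B3 sequences in the form data.
raw_bytes = b'lygiagre\xc4\x8di\xc5\xb3 sakini\xc5\xb3'

# decode: UTF-8 bytes -> text (unicode in Python 2, str in Python 3).
text = raw_bytes.decode('utf-8')

# Decoding the same bytes as ASCII fails. Python 2 does this ASCII decode
# implicitly when .encode() is called on a byte string, which is where the
# UnicodeDecodeError in the question comes from.
try:
    raw_bytes.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Save the text with an explicit encoding rather than the platform default.
# The temp directory is used here only so the sketch can run anywhere.
out_path = os.path.join(tempfile.gettempdir(), 'pagesource.txt')
with io.open(out_path, 'w', encoding='utf-8') as out:
    out.write(text)
```

In the original script the same idea would apply to the_page: decode it once right after reading, work with the resulting text, and encode only when writing it back out.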