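I am trying to extract the domain using tldextract:
import tldextract

ext = tldextract.extract(editString2)
print(ext.domain)
However, I get this error at the same time. Is there any way to stop it? I do get the result and it prints, but I am trying to find a way to keep this error from showing up.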
error reading TLD cache file C:\Python33\lib\site-packages\tldextract\.tld_set: 'charmap' codec can't decode byte 0x81 in position 2350: character maps to <undefined>
Exception reading Public Suffix List url https://raw.github.com/mozilla/mozilla-central/master/netwerk/dns/effective_tld_names.dat. Consider using a mirror or constructing your TLDExtract with `fetch=False`.
Traceback (most recent call last):
  File "C:\Python33\lib\site-packages\tldextract\tldextract.py", line 247, in _PublicSuffixListSource
    page = unicode(urlopen(url).read(), 'utf-8')
  File "C:\Python33\lib\urllib\request.py", line 156, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python33\lib\urllib\request.py", line 475, in open
    response = meth(req, response)
  File "C:\Python33\lib\urllib\request.py", line 587, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python33\lib\urllib\request.py", line 513, in error
    return self._call_chain(*args)
  File "C:\Python33\lib\urllib\request.py", line 447, in _call_chain
    result = func(*args)
  File "C:\Python33\lib\urllib\request.py", line 595, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
Best answer
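The "mozilla/mozilla-central" repository on GitHub was renamed to "mozilla/gecko-dev" without a redirect, hence the 404. The URL is fixed in tldextract 1.3.1, the latest release at the time of writing.
If it is not yet fixed for you, you can manually provide the PSL URL through the suffix_list_url kwarg of your own TLDExtract callable. See the docs.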
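As a rough sketch of that workaround, the snippet below builds a private extractor pointed at a PSL URL of your choosing. The helper name make_extractor is made up for illustration, and the publicsuffix.org address is an assumption on my part (it is the list's canonical home, not a URL taken from the question). Later tldextract releases, as far as I know, renamed the kwarg to suffix_list_urls and expect a sequence, so the sketch tries that spelling first and falls back to the suffix_list_url form named in the answer.

```python
# Assumed PSL location: the canonical publicsuffix.org copy.
# Swap in a mirror if that host is not reachable from your machine.
PSL_URL = "https://publicsuffix.org/list/public_suffix_list.dat"


def make_extractor(psl_url=PSL_URL):
    """Return a TLDExtract callable that reads the PSL from psl_url.

    Hypothetical helper, not part of tldextract itself.
    """
    import tldextract  # third-party package: pip install tldextract

    try:
        # Spelling assumed for newer releases: a sequence of URLs.
        return tldextract.TLDExtract(suffix_list_urls=(psl_url,))
    except TypeError:
        # Older releases (such as the 1.x series discussed here)
        # take a single URL under suffix_list_url instead.
        return tldextract.TLDExtract(suffix_list_url=psl_url)
```

With this in place, `extract = make_extractor()` followed by `extract(editString2).domain` would stand in for the module-level tldextract.extract call from the question, so the stale default URL should no longer be requested.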
Source: Stack Overflow, "Python tldextract error reading TLD cache file": https://stackoverflow.com/questions/20822555/