Problem description
I have a list containing URLs with escaped characters in them. Those characters were set by urllib2.urlopen when it retrieved the HTML page:
http://www.sample1webpage.com/index.php?title=%E9%A6%96%E9%A1%B5&action=edit
http://www.sample1webpage.com/index.php?title=%E9%A6%96%E9%A1%B5&action=history
http://www.sample1webpage.com/index.php?title=%E9%A6%96%E9%A1%B5&variant=zh
Is there a way to transform them back to their unescaped form in Python?
P.S.: The URLs are encoded in UTF-8.
Recommended answer
Use unquote (urllib.unquote in Python 2, urllib.parse.unquote in Python 3). Its documentation reads:
Replace %xx escapes by their single-character equivalent.
Example: unquote('/%7Econnolly/') yields '/~connolly/'.
Then decode the result, here as UTF-8.
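The two steps above (unquote, then decode) can be sketched as follows. This is a minimal illustration that assumes Python 3, where the function lives in urllib.parse; the question itself uses urllib2, which implies Python 2, where the equivalent would be urllib.unquote(url).decode('utf-8'). The names urls, decoded, and decoded_explicit are illustrative and do not come from the original post.

```python
from urllib.parse import unquote, unquote_to_bytes

# The sample URLs from the question.
urls = [
    "http://www.sample1webpage.com/index.php?title=%E9%A6%96%E9%A1%B5&action=edit",
    "http://www.sample1webpage.com/index.php?title=%E9%A6%96%E9%A1%B5&action=history",
    "http://www.sample1webpage.com/index.php?title=%E9%A6%96%E9%A1%B5&variant=zh",
]

# In Python 3, unquote() replaces the %xx escapes and decodes the
# resulting bytes as UTF-8 by default, so one call covers both steps.
decoded = [unquote(url) for url in urls]

# The same result with the two steps written out explicitly:
# first turn %xx escapes into raw bytes, then decode them as UTF-8.
decoded_explicit = [unquote_to_bytes(url).decode("utf-8") for url in urls]

# decoded[0] == "http://www.sample1webpage.com/index.php?title=首页&action=edit"
```

If the pages used an encoding other than UTF-8, the encoding argument of unquote (or the argument to decode) would need to change accordingly.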