JavaScript unescape() vs. Python urllib.unquote()

From reading various articles, it seems that JavaScript's unescape() is equivalent to Python's urllib.unquote(), but when I test the two side by side I get different results:

In the browser console:

unescape('%u003c%u0062%u0072%u003e');

Output: <br>
In the Python interpreter:

import urllib
urllib.unquote('%u003c%u0062%u0072%u003e')

Output: %u003c%u0062%u0072%u003e
I expected Python to return <br> as well. Any ideas what I'm missing here?

Thanks!

Best answer

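%uxxxx is a non-standard URL encoding scheme that is not supported by urllib.parse.unquote() (Py 3) / urllib.unquote() (Py 2).

It was only ever part of ECMAScript (ECMA-262) 3rd edition; the format was rejected by the W3C and was never part of an RFC.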
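To see the difference concretely in Python 3, the short check below runs urllib.parse.unquote() on a standard %XX sequence, on the %uXXXX string from the question, and on a UTF-8 percent-encoded character. Standard percent-encoding is decoded, while the %u sequences are left untouched because "u0" is not a valid pair of hex digits.

```python
from urllib.parse import unquote

# Standard %XX escapes are decoded.
standard = unquote('%3Cbr%3E')

# The non-standard %uXXXX form is passed through unchanged.
ecmascript = unquote('%u003c%u0062%u0072%u003e')

# Non-ASCII characters are expected as UTF-8 bytes, one %XX per byte.
euro = unquote('%E2%82%AC')

print(standard)    # <br>
print(ecmascript)  # %u003c%u0062%u0072%u003e
print(euro)        # €
```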

You can convert such code points with a regular expression:

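import re

try:
    unichr  # only in Python 2
except NameError:
    unichr = chr  # Python 3

# `quoted` is the %u-escaped input string; each %uXXXX (or %uXX) escape
# is replaced by the character with that code point.
re.sub(r'%u([a-fA-F0-9]{4}|[a-fA-F0-9]{2})', lambda m: unichr(int(m.group(1), 16)), quoted)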

This decodes both the %uxxxx and the %uxx forms that ECMAScript 3rd edition can decode.

Demo:

>>> import re
>>> quoted = '%u003c%u0062%u0072%u003e'
>>> re.sub(r'%u([a-fA-F0-9]{4}|[a-fA-F0-9]{2})', lambda m: chr(int(m.group(1), 16)), quoted)
'<br>'
>>> altquoted = '%u3c%u0062%u0072%u3e'
>>> re.sub(r'%u([a-fA-F0-9]{4}|[a-fA-F0-9]{2})', lambda m: chr(int(m.group(1), 16)), altquoted)
'<br>'
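The regex above only handles %u escapes. JavaScript's unescape() also decodes plain %XX escapes, each into a single code unit (so %E9 becomes é rather than being treated as one byte of UTF-8), and it works on UTF-16 code units, so characters outside the BMP arrive as two %u escapes (a surrogate pair). Below is a minimal sketch of a closer emulation, assuming Python 3; js_unescape is a hypothetical helper name, not a library function, and for simplicity it only accepts the four-digit %uXXXX form.

```python
import re

# Hypothetical helper, not part of urllib: a rough emulation of JavaScript's
# unescape(). Matches either %uXXXX (group 1) or %XX (group 2).
_ESCAPE = re.compile(r'%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})')

def js_unescape(s):
    """Decode %uXXXX and %XX escapes roughly the way JavaScript unescape() does."""
    # Each escape becomes one UTF-16 code unit, as in JavaScript.
    units = _ESCAPE.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), s)
    # Round-trip through UTF-16 so surrogate pairs such as %uD83D%uDE00 are
    # joined into one character; 'surrogatepass' keeps lone surrogates as-is.
    return units.encode('utf-16', 'surrogatepass').decode('utf-16', 'surrogatepass')

print(js_unescape('%u003c%u0062%u0072%u003e'))  # <br>
print(js_unescape('%3Cbr%3E'))                  # <br>
```

Malformed sequences such as a trailing "%" or "%zz" are left unchanged, since neither alternative of the pattern matches them.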

But you should avoid using this encoding altogether if at all possible.
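If you control the JavaScript side as well, one way to avoid it is to use standard UTF-8 percent-encoding throughout: encodeURIComponent()/decodeURIComponent() in the browser, and urllib.parse.quote()/unquote() in Python 3. A small sketch of the Python half, assuming Python 3 (safe='' is passed so that "/" would also be escaped, matching encodeURIComponent more closely):

```python
from urllib.parse import quote, unquote

text = '<br> €'

# Percent-encode as UTF-8; "€" becomes three %XX escapes.
encoded = quote(text, safe='')
print(encoded)  # %3Cbr%3E%20%E2%82%AC

# Decoding restores the original string.
decoded = unquote(encoded)
print(decoded)  # <br> €
```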

A similar question about JavaScript unescape() vs. Python urllib.unquote() can be found on Stack Overflow: https://stackoverflow.com/questions/23158822/