Question
After some frantic Googling, I can't seem to find a conclusive answer to a simple question. I apologize if this question is answered somewhere, but if so I couldn't find it.
While writing an encryption method in Javascript, I got to wondering what character encoding my strings were using, and why.
So: what determines character encoding in Javascript? Is it a standard? By the browser? Determined by the header of the HTTP request? In the <META> tag of the HTML that encompasses it? The server that feeds the page?
By my empirical testing (changing different settings, then using charCodeAt on a sufficiently strange character and seeing which encoding the value matches up with) it appears to always be UTF-8 or UTF-16, but I'm not sure why.
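A minimal sketch of the kind of test described above (the sample characters are illustrative, not from the original post):

// Inspect the 16-bit code units of non-ASCII characters and compare
// them against known encodings.
var s = "é"; // U+00E9 LATIN SMALL LETTER E WITH ACUTE
console.log(s.length);        // 1, a single 16-bit code unit
console.log(s.charCodeAt(0)); // 233 (0x00E9): matches the UTF-16 code unit,
                              // not the UTF-8 byte sequence 0xC3 0xA9

var emoji = "😀"; // U+1F600, outside the Basic Multilingual Plane
console.log(emoji.length);                     // 2, a UTF-16 surrogate pair
console.log(emoji.charCodeAt(0).toString(16)); // "d83d" (high surrogate)
console.log(emoji.charCodeAt(1).toString(16)); // "de00" (low surrogate)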
Thanks for your help!
Recommended Answer
Section 8.4 of E262:
When a String contains actual textual data, each element is considered to be a single UTF-16 code unit. Whether or not this is the actual storage format of a String, the characters within a String are numbered by their initial code unit element position as though they were represented using UTF-16. All operations on Strings (except as otherwise stated) treat them as sequences of undifferentiated 16-bit unsigned integers; they do not ensure the resulting String is in normalised form, nor do they ensure language-sensitive results.
That wording is kind-of weaselly; it seems to mean that everything that counts treats strings as if each element is a UTF-16 code unit, but at the same time nothing ensures that it'll all be valid.
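In practice (my own illustration, not part of the original answer), that means every string operation counts and slices 16-bit code units, and nothing stops you from splitting a surrogate pair in half:

var pair = "\uD83D\uDE00"; // the surrogate pair encoding U+1F600 (😀)
console.log(pair.length);                      // 2: length counts code units
console.log(pair.charCodeAt(0).toString(16));  // "d83d", one half of the pair
console.log(pair.slice(0, 1));                 // a lone high surrogate; slice
                                               // happily breaks the pair apart
console.log(pair.codePointAt(0).toString(16)); // "1f600" (codePointAt, added
                                               // in ES2015, reads the whole pair)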
edit — to be clear, the intention is that strings consist of UTF-16 codepoints. In ES2015, the definition of "string value" includes this note:
A String value is a member of the String type. Each integer value in the sequence usually represents a single 16-bit unit of UTF-16 text. However, ECMAScript does not place any restrictions or requirements on the values except that they must be 16-bit unsigned integers.
So a string is still a string even when it contains values that don't work as correct unicode characters.
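As an illustration (my own sketch; the isWellFormed and toWellFormed methods are a much later addition, from ES2024), a string holding an unpaired surrogate is not valid UTF-16 yet remains a perfectly legal String value:

var broken = "abc\uD800"; // ends with an unpaired high surrogate
console.log(broken.length);         // 4: still an ordinary String value
console.log(broken.isWellFormed()); // false: not valid UTF-16 (ES2024)
console.log(broken.toWellFormed()); // "abc\uFFFD": the lone surrogate is
                                    // replaced with U+FFFD (ES2024)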