Default JavaScript character encoding

Question

After some frantic Googling, I can't seem to find a conclusive answer to a simple question. I apologize if this question is answered somewhere, but if so I couldn't find it.

While writing an encryption method in JavaScript, I started wondering what character encoding my strings were using, and why.

So: what determines the character encoding of strings in JavaScript? Is it fixed by a standard? Chosen by the browser? Determined by the HTTP response headers? By the <META> tag of the HTML page that contains the script? By the server that serves the page?

By my empirical testing (changing different settings, then using charCodeAt on a sufficiently strange character and seeing which encoding the value matches up with) it appears to always be UTF-8 or UTF-16, but I'm not sure why.

Thanks for your help!

Recommended answer

Section 8.4 of ECMA-262 (the ECMAScript specification):

When a String contains actual textual data, each element is considered to be a single UTF-16 code unit. Whether or not this is the actual storage format of a String, the characters within a String are numbered by their initial code unit element position as though they were represented using UTF-16. All operations on Strings (except as otherwise stated) treat them as sequences of undifferentiated 16-bit unsigned integers; they do not ensure the resulting String is in normalised form, nor do they ensure language-sensitive results.
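To make that paragraph concrete, here is a small sketch. The helper name `codeUnits` is made up for illustration; `charCodeAt`, `codePointAt`, and `length` are standard String members. The strings are written with escapes so the result does not depend on how the source file itself was decoded.

```javascript
// Hypothetical helper: list the 16-bit code units of a string as hex.
function codeUnits(str) {
  const units = [];
  for (let i = 0; i < str.length; i++) {
    units.push(str.charCodeAt(i).toString(16).padStart(4, "0"));
  }
  return units;
}

const euro = "\u20AC";    // U+20AC EURO SIGN, inside the Basic Multilingual Plane
const grin = "\u{1F600}"; // U+1F600, outside the BMP, so it needs a surrogate pair

console.log(codeUnits(euro)); // [ '20ac' ]  (UTF-8 would be the bytes e2 82 ac)
console.log(codeUnits(grin)); // [ 'd83d', 'de00' ]
console.log(grin.length);     // 2: length counts code units, not characters
console.log(grin.codePointAt(0).toString(16)); // 1f600
```

The values seen through `charCodeAt` are UTF-16 code units, which is why the empirical test in the question lines up with UTF-16 for a character such as the euro sign; for plain ASCII characters the UTF-8 and UTF-16 values coincide, which can make the test look ambiguous.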

That wording is kind of weaselly; it seems to mean that everything that counts treats strings as if each element is a UTF-16 code unit, but at the same time nothing ensures that the sequence as a whole is valid UTF-16.

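Edit: to be clear, the intention is that strings consist of UTF-16 code units. In ES2015, the definition of "string value" includes a note saying that each value in the sequence usually represents a single 16-bit unit of UTF-16 text, but that ECMAScript places no restrictions or requirements on the values except that they must be 16-bit unsigned integers.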

So a string is still a string even when it contains values that don't work as correct Unicode characters.
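A minimal sketch of that last point, assuming an environment where `TextEncoder` is available as a global (modern browsers and recent Node.js versions). The variable names are only illustrative.

```javascript
// A lone high surrogate: not well-formed UTF-16 text, but a legal string value.
const lone = "\uD83D";

console.log(typeof lone); // string
console.log(lone.length); // 1
console.log(lone.charCodeAt(0).toString(16)); // d83d

// UTF-8 cannot represent an unpaired surrogate, so TextEncoder
// substitutes U+FFFD (REPLACEMENT CHARACTER) when encoding.
const bytes = Array.from(new TextEncoder().encode(lone));
console.log(bytes.map((b) => b.toString(16))); // [ 'ef', 'bf', 'bd' ]
```

For something like the encryption routine in the question, this suggests converting strings to bytes explicitly (for example to UTF-8 as above) and deciding how to handle unpaired surrogates, rather than relying on the string's internal code units being well-formed text.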
