Problem description
Are there JavaScript polyfill implementations of String.toLowerCase() and String.toUpperCase(), or other methods in JavaScript that can work with Unicode characters and are consistent across browsers?
Running the following gives different results across browsers, or even between versions of the same browser (e.g. Firefox 54 vs. 55):
document.write(String.fromCodePoint(223).normalize("NFKC").toLowerCase().toUpperCase().toLowerCase())
In Firefox 55 it gives you ss; in Firefox 54 it gives you ß.
Generally this is fine, and mechanisms such as locales handle many of the cases you'd want. However, when you need consistent behavior across platforms, for example when talking to BaaS systems like google-cloud-firestore, consistent case conversion can greatly simplify interactions where you're essentially processing internal data on the client.
Recommended answer
Note that this issue only seems to affect outdated versions of Firefox, so unless you explicitly need to support those old versions, you could choose to just not bother at all. The behavior for your example is the same in all modern browsers (since the change in Firefox). This can be verified using jsvu + eshost:
$ jsvu # Update installed JavaScript engine binaries to the latest version.
$ eshost -e '"\xDF".normalize("NFKC").toLowerCase().toUpperCase().toLowerCase()'
#### Chakra
ss
#### V8 --harmony
ss
#### JavaScriptCore
ss
#### V8
ss
#### SpiderMonkey
ss
#### xs
ss
But you asked how to solve this problem, so let’s continue.
The specification states:
[...] the Unicode Default Case Conversion algorithm [...].
[...]
The following rules specify the default case conversion operations for Unicode strings. These rules use the full case conversion operations, Uppercase_Mapping(C), Lowercase_Mapping(C), and Titlecase_Mapping(C), as well as the context-dependent mappings based on the casing context, as specified in Table 3-17.
For a string X:
- R1 toUppercase(X): Map each character C in X to Uppercase_Mapping(C).
- R2 toLowercase(X): Map each character C in X to Lowercase_Mapping(C).
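As a rough illustration of R1 and R2, the sketch below applies a per-character mapping table and falls back to the character itself when no entry exists. The tables `upperMap` and `lowerMap` and the helper `mapEach` are hypothetical names with a few hand-picked entries only; a real implementation would need the full data from the Unicode Character Database, and the context-dependent mappings are ignored here.

```javascript
// Hypothetical, tiny mapping tables for illustration only.
// Real data would come from the Unicode Character Database.
const upperMap = new Map([
  ['a', 'A'],
  ['\xDF', 'SS'], // U+00DF uppercases to two characters
]);
const lowerMap = new Map([
  ['A', 'a'],
  ['S', 's'],
]);

// Map each code point C in the string to mapping(C),
// or to C itself if the table has no entry for it.
const mapEach = (string, mapping) =>
  Array.from(string, (char) => (mapping.has(char) ? mapping.get(char) : char)).join('');

// R1 and R2, without the context-dependent mappings.
const toUppercase = (string) => mapEach(string, upperMap);
const toLowercase = (string) => mapEach(string, lowerMap);

console.log(toUppercase('a\xDF')); // 'ASS'
console.log(toLowercase(toUppercase('\xDF'))); // 'ss'
```

`Array.from` iterates by code point rather than by UTF-16 code unit, which matters once the tables contain characters outside the Basic Multilingual Plane.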
Here’s an example from SpecialCasing.txt, with my annotation added below:
00DF ; 00DF ; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S
<code>; <lower>; <title> ; <upper> ; (<condition_list>;)? # <comment>
This line says that U+00DF ('ß') lowercases to U+00DF ('ß') and uppercases to U+0053 U+0053 ('SS').
Here’s an example from UnicodeData.txt, with my annotation added below:
0041 ; LATIN CAPITAL LETTER A; Lu;0;L;;;;;N;;;; 0061 ;
<code>; <name> ; <ignore> ; <upper>; <lower>; <title>
This line says that U+0041 ('A') lowercases to U+0061 ('a'). It doesn’t have an explicit uppercase mapping, meaning it uppercases to itself.
Here’s another example from UnicodeData.txt:
0061 ; LATIN SMALL LETTER A; Ll;0;L;;;;;N;; ;0041; ; 0041
<code>; <name> ; <ignore> ; <upper>; <lower>; <title>
This line says that U+0061 ('a') uppercases to U+0041 ('A'). It doesn’t have an explicit lowercase mapping, meaning it lowercases to itself.
You could write a script that parses these two files, reads each line following these examples, and builds lowercase/uppercase mappings. You could then turn those mappings into a small JavaScript library that provides spec-compliant toLowerCase/toUpperCase functionality.
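A minimal sketch of such a parser might look like the following. It is not the script used to generate the library further down. The helper names (`fromHex`, `addUnicodeDataLine`, `addSpecialCasingLine`) are made up for illustration, the input is limited to the three sample lines quoted above instead of the real files, and conditional SpecialCasing.txt entries are simply skipped.

```javascript
// Turn a space-separated list of hex code points into a string.
const fromHex = (field) =>
  field.trim().split(/\s+/).filter(Boolean)
    .map((hex) => String.fromCodePoint(parseInt(hex, 16)))
    .join('');

const lower = new Map();
const upper = new Map();

// UnicodeData.txt: field 0 is the code point, field 12 the simple
// uppercase mapping, field 13 the simple lowercase mapping.
function addUnicodeDataLine(line) {
  const fields = line.split(';');
  const char = fromHex(fields[0]);
  if (fields[12].trim()) upper.set(char, fromHex(fields[12]));
  if (fields[13].trim()) lower.set(char, fromHex(fields[13]));
}

// SpecialCasing.txt:
// <code>; <lower>; <title>; <upper>; (<condition_list>;)? # <comment>
function addSpecialCasingLine(line) {
  const data = line.split('#')[0];
  if (!data.trim()) return; // comment-only or empty line
  const fields = data.split(';');
  // Conditional (context- or locale-dependent) entries are skipped here.
  if (fields.length > 4 && fields[4].trim()) return;
  const char = fromHex(fields[0]);
  const lowerMapping = fromHex(fields[1]);
  const upperMapping = fromHex(fields[3]);
  if (lowerMapping !== char) lower.set(char, lowerMapping);
  if (upperMapping !== char) upper.set(char, upperMapping);
}

// Simple mappings first, so that special casings override them.
[
  '0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;',
  '0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041',
].forEach(addUnicodeDataLine);
addSpecialCasingLine('00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S');

console.log(upper.get('a'));    // 'A'
console.log(lower.get('A'));    // 'a'
console.log(upper.get('\xDF')); // 'SS'
```

Characters without an entry in a map are meant to map to themselves, so the maps stay small.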
This seems like a lot of work. Depending on the old behavior in Firefox and what exactly changed (?), you could probably limit the work to just the special mappings in SpecialCasing.txt. (I’m assuming that only the special casings changed in Firefox 55, based on the example you provided.)
// Instead of…
function normalize(string) {
const normalized = string.normalize('NFKC');
const lowercased = normalized.toLowerCase();
return lowercased;
}
// …one could do something like:
function lowerCaseSpecialCases(string) {
// TODO: replace all SpecialCasing.txt characters with their lowercase
// mapping.
return string.replace(/TODO/g, fn);
}
function normalize(string) {
const normalized = string.normalize('NFKC');
const fixed = lowerCaseSpecialCases(normalized); // Workaround for old Firefox 54 behavior.
const lowercased = fixed.toLowerCase();
return lowercased;
}
I wrote a script that parses SpecialCasing.txt and generates a JS library that implements the lowerCaseSpecialCases functionality mentioned above (as toLower) as well as toUpper. Here it is: https://gist.github.com/mathiasbynens/a37e3f3138069729aa434ea90eea4a3c Depending on your exact use case, you might not need toUpper and its corresponding regex and map at all. Here’s the full generated library:
const reToLower = /[\u0130\u1F88-\u1F8F\u1F98-\u1F9F\u1FA8-\u1FAF\u1FBC\u1FCC\u1FFC]/g;
const toLowerMap = new Map([
['\u0130', 'i\u0307'],
['\u1F88', '\u1F80'],
['\u1F89', '\u1F81'],
['\u1F8A', '\u1F82'],
['\u1F8B', '\u1F83'],
['\u1F8C', '\u1F84'],
['\u1F8D', '\u1F85'],
['\u1F8E', '\u1F86'],
['\u1F8F', '\u1F87'],
['\u1F98', '\u1F90'],
['\u1F99', '\u1F91'],
['\u1F9A', '\u1F92'],
['\u1F9B', '\u1F93'],
['\u1F9C', '\u1F94'],
['\u1F9D', '\u1F95'],
['\u1F9E', '\u1F96'],
['\u1F9F', '\u1F97'],
['\u1FA8', '\u1FA0'],
['\u1FA9', '\u1FA1'],
['\u1FAA', '\u1FA2'],
['\u1FAB', '\u1FA3'],
['\u1FAC', '\u1FA4'],
['\u1FAD', '\u1FA5'],
['\u1FAE', '\u1FA6'],
['\u1FAF', '\u1FA7'],
['\u1FBC', '\u1FB3'],
['\u1FCC', '\u1FC3'],
['\u1FFC', '\u1FF3']
]);
const toLower = (string) => string.replace(reToLower, (match) => toLowerMap.get(match));
const reToUpper = /[\xDF\u0149\u01F0\u0390\u03B0\u0587\u1E96-\u1E9A\u1F50\u1F52\u1F54\u1F56\u1F80-\u1FAF\u1FB2-\u1FB4\u1FB6\u1FB7\u1FBC\u1FC2-\u1FC4\u1FC6\u1FC7\u1FCC\u1FD2\u1FD3\u1FD6\u1FD7\u1FE2-\u1FE4\u1FE6\u1FE7\u1FF2-\u1FF4\u1FF6\u1FF7\u1FFC\uFB00-\uFB06\uFB13-\uFB17]/g;
const toUpperMap = new Map([
['\xDF', 'SS'],
['\uFB00', 'FF'],
['\uFB01', 'FI'],
['\uFB02', 'FL'],
['\uFB03', 'FFI'],
['\uFB04', 'FFL'],
['\uFB05', 'ST'],
['\uFB06', 'ST'],
['\u0587', '\u0535\u0552'],
['\uFB13', '\u0544\u0546'],
['\uFB14', '\u0544\u0535'],
['\uFB15', '\u0544\u053B'],
['\uFB16', '\u054E\u0546'],
['\uFB17', '\u0544\u053D'],
['\u0149', '\u02BCN'],
['\u0390', '\u0399\u0308\u0301'],
['\u03B0', '\u03A5\u0308\u0301'],
['\u01F0', 'J\u030C'],
['\u1E96', 'H\u0331'],
['\u1E97', 'T\u0308'],
['\u1E98', 'W\u030A'],
['\u1E99', 'Y\u030A'],
['\u1E9A', 'A\u02BE'],
['\u1F50', '\u03A5\u0313'],
['\u1F52', '\u03A5\u0313\u0300'],
['\u1F54', '\u03A5\u0313\u0301'],
['\u1F56', '\u03A5\u0313\u0342'],
['\u1FB6', '\u0391\u0342'],
['\u1FC6', '\u0397\u0342'],
['\u1FD2', '\u0399\u0308\u0300'],
['\u1FD3', '\u0399\u0308\u0301'],
['\u1FD6', '\u0399\u0342'],
['\u1FD7', '\u0399\u0308\u0342'],
['\u1FE2', '\u03A5\u0308\u0300'],
['\u1FE3', '\u03A5\u0308\u0301'],
['\u1FE4', '\u03A1\u0313'],
['\u1FE6', '\u03A5\u0342'],
['\u1FE7', '\u03A5\u0308\u0342'],
['\u1FF6', '\u03A9\u0342'],
['\u1F80', '\u1F08\u0399'],
['\u1F81', '\u1F09\u0399'],
['\u1F82', '\u1F0A\u0399'],
['\u1F83', '\u1F0B\u0399'],
['\u1F84', '\u1F0C\u0399'],
['\u1F85', '\u1F0D\u0399'],
['\u1F86', '\u1F0E\u0399'],
['\u1F87', '\u1F0F\u0399'],
['\u1F88', '\u1F08\u0399'],
['\u1F89', '\u1F09\u0399'],
['\u1F8A', '\u1F0A\u0399'],
['\u1F8B', '\u1F0B\u0399'],
['\u1F8C', '\u1F0C\u0399'],
['\u1F8D', '\u1F0D\u0399'],
['\u1F8E', '\u1F0E\u0399'],
['\u1F8F', '\u1F0F\u0399'],
['\u1F90', '\u1F28\u0399'],
['\u1F91', '\u1F29\u0399'],
['\u1F92', '\u1F2A\u0399'],
['\u1F93', '\u1F2B\u0399'],
['\u1F94', '\u1F2C\u0399'],
['\u1F95', '\u1F2D\u0399'],
['\u1F96', '\u1F2E\u0399'],
['\u1F97', '\u1F2F\u0399'],
['\u1F98', '\u1F28\u0399'],
['\u1F99', '\u1F29\u0399'],
['\u1F9A', '\u1F2A\u0399'],
['\u1F9B', '\u1F2B\u0399'],
['\u1F9C', '\u1F2C\u0399'],
['\u1F9D', '\u1F2D\u0399'],
['\u1F9E', '\u1F2E\u0399'],
['\u1F9F', '\u1F2F\u0399'],
['\u1FA0', '\u1F68\u0399'],
['\u1FA1', '\u1F69\u0399'],
['\u1FA2', '\u1F6A\u0399'],
['\u1FA3', '\u1F6B\u0399'],
['\u1FA4', '\u1F6C\u0399'],
['\u1FA5', '\u1F6D\u0399'],
['\u1FA6', '\u1F6E\u0399'],
['\u1FA7', '\u1F6F\u0399'],
['\u1FA8', '\u1F68\u0399'],
['\u1FA9', '\u1F69\u0399'],
['\u1FAA', '\u1F6A\u0399'],
['\u1FAB', '\u1F6B\u0399'],
['\u1FAC', '\u1F6C\u0399'],
['\u1FAD', '\u1F6D\u0399'],
['\u1FAE', '\u1F6E\u0399'],
['\u1FAF', '\u1F6F\u0399'],
['\u1FB3', '\u0391\u0399'],
['\u1FBC', '\u0391\u0399'],
['\u1FC3', '\u0397\u0399'],
['\u1FCC', '\u0397\u0399'],
['\u1FF3', '\u03A9\u0399'],
['\u1FFC', '\u03A9\u0399'],
['\u1FB2', '\u1FBA\u0399'],
['\u1FB4', '\u0386\u0399'],
['\u1FC2', '\u1FCA\u0399'],
['\u1FC4', '\u0389\u0399'],
['\u1FF2', '\u1FFA\u0399'],
['\u1FF4', '\u038F\u0399'],
['\u1FB7', '\u0391\u0342\u0399'],
['\u1FC7', '\u0397\u0342\u0399'],
['\u1FF7', '\u03A9\u0342\u0399']
]);
const toUpper = (string) => string.replace(reToUpper, (match) => toUpperMap.get(match));
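One possible way to wire such a helper in is to run the special-case replacement first and then call the built-in method, mirroring the normalize workaround shown earlier. The sketch below is self-contained and therefore uses a hypothetical two-entry subset (`reToUpperSubset`, `toUpperSubsetMap`) in place of the full generated map; `upperCaseConsistently` is likewise a made-up name.

```javascript
// Hypothetical two-entry subset of the generated toUpper data.
const reToUpperSubset = /[\xDF\uFB01]/g;
const toUpperSubsetMap = new Map([
  ['\xDF', 'SS'],   // LATIN SMALL LETTER SHARP S
  ['\uFB01', 'FI'], // LATIN SMALL LIGATURE FI
]);
const toUpperSubset = (string) =>
  string.replace(reToUpperSubset, (match) => toUpperSubsetMap.get(match));

// Apply the special casings explicitly, then let the engine handle
// the remaining simple one-to-one mappings.
const upperCaseConsistently = (string) => toUpperSubset(string).toUpperCase();

console.log(upperCaseConsistently('stra\xDFe')); // 'STRASSE'
console.log(upperCaseConsistently('\uFB01sh'));  // 'FISH'
```

Because the special characters are replaced before the built-in call, the result for them no longer depends on whether the engine implements the full case mappings.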