I have a regular expression /\s*,\s*/ that matches a comma together with any whitespace on either side of it.
Example:
var str = "john,walker james , paul";
var arr = str.split(/\s*,\s*/);
Values in arr = ["john", "walker james", "paul"] // Size: 3
Example with Chinese characters:
var str = "继续,取消 继续 ,取消";
var arr = str.split(/\s*,\s*/);
Values in arr = ["继续,取消 继续 ,取消"] // Size: 1; the whole string is at index 0, no splitting happened
I also tried writing the space and comma as Unicode escapes:
var str = "john,walker james , paul";
var arr = str.split(/\u0020*\u002C\u0020*/);
Values in arr = ["john", "walker james", "paul"] // Size: 3
var str = "继续,取消 继续 ,取消";
var arr = str.split(/\u0020*\u002C\u0020*/);
Values in arr = ["继续,取消 继续 ,取消"] // Size: 1; the whole string is at index 0, no splitting happened
I went through this link, but it did not have much that I could apply to my case. Is it really impossible to write a regex that splits strings containing Chinese characters?
An ASCII comma won't match the comma you have in Chinese text. Either replace the ASCII comma (\x2C) with the Chinese one (\uFF0C), or use a character class [,,] to match both:
var str = "继续,取消 继续 ,取消";
console.log(str.split(/\s*[,,]\s*/));
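When it is unclear which comma a string actually contains, printing the code points makes the difference visible. The following is a minimal sketch, not part of the original answer: the helper name `describeCommas` and the variable names are made up for illustration, and the fullwidth comma is written as the escape \uFF0C so it cannot be confused with the ASCII one.

```javascript
// Hypothetical helper: report the code point of every ASCII or fullwidth comma.
function describeCommas(text) {
  var found = [];
  for (var i = 0; i < text.length; i++) {
    var ch = text.charAt(i);
    if (ch === "," || ch === "\uFF0C") {
      var hex = ch.charCodeAt(0).toString(16).toUpperCase();
      found.push("U+" + ("0000" + hex).slice(-4)); // pad to four hex digits
    }
  }
  return found;
}

var ascii = "john,walker james , paul";
var chinese = "继续\uFF0C取消 继续 \uFF0C取消";

// One character class covers both the ASCII and the fullwidth comma.
var commaPattern = /\s*[,\uFF0C]\s*/;

console.log(describeCommas(ascii));       // ["U+002C", "U+002C"]
console.log(describeCommas(chinese));     // ["U+FF0C", "U+FF0C"]
console.log(ascii.split(commaPattern));   // ["john", "walker james", "paul"]
console.log(chinese.split(commaPattern)); // ["继续", "取消 继续", "取消"]
```

The same pattern handles both inputs, so there is no need to know in advance which comma the text uses.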
Here is a regex that will match all the commas mentioned on the Comma Wikipedia page:
/\s*(?:\uD805\uDC4D|\uD836\uDE87|[\u002C\u02BB\u060C\u2E32\u2E34\u2E41\u2E49\u3001\uFE10\uFE11\uFE50\uFE51\uFF0C\uFF64\u00B7\u055D\u07F8\u1363\u1802\u1808\uA4FE\uA60D\uA6F5\u02BD\u0312\u0313\u0314\u0315\u0326\u201A])\s*/
Note that U+1144D (NEWA COMMA) and U+1DA87 (SIGNWRITING COMMA) lie outside the Basic Multilingual Plane, so they have to be written as the surrogate pairs \uD805\uDC4D and \uD836\uDE87 in order to be compatible with the ES5 regex standard.
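Where ES5 compatibility is not required, the `u` flag introduced in ES2015 is one alternative to hand-written surrogate pairs, because it allows `\u{...}` code point escapes directly inside a character class. A minimal sketch, assuming an ES2015+ engine; the variable names are illustrative and only a handful of the commas from the full pattern are included:

```javascript
// Assumes ES2015+: with the `u` flag, astral code points can be class members.
var commaClass = /\s*[\u002C\u3001\uFF0C\u{1144D}\u{1DA87}]\s*/u;

// The \u{...} escape and the surrogate pair denote the same string.
console.log("\u{1144D}" === "\uD805\uDC4D"); // true
console.log("\u{1DA87}" === "\uD836\uDE87"); // true

// Sample mixing NEWA COMMA, the fullwidth comma, and the ASCII comma.
var sample = "a \u{1144D} b\uFF0Cc , d";
var parts = sample.split(commaClass);
console.log(parts); // ["a", "b", "c", "d"]
```

The surrogate-pair form in the answer remains the safer choice when the code has to run on engines that predate the `u` flag.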
The following commas are handled: