Problem description
I need a way to check whether a string contains Japanese or Chinese text.
Currently I am using this:
string.match(/[\u3400-\u9FBF]/);
but it does not work for strings such as ディアボリックラヴァーズ or バッテリー.
Can you help me?
Thanks
Recommended answer
The ranges of Unicode characters that are routinely used for Chinese and Japanese text are:
- U+3040 - U+30FF: hiragana and katakana (Japanese only)
- U+3400 - U+4DBF: CJK unified ideographs extension A (Chinese, Japanese, and Korean)
- U+4E00 - U+9FFF: CJK unified ideographs (Chinese, Japanese, and Korean)
- U+F900 - U+FAFF: CJK compatibility ideographs (Chinese, Japanese, and Korean)
- U+FF66 - U+FF9F: half-width katakana (Japanese only)
As a regular expression, this would be expressed as:
/[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff\uff66-\uff9f]/
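The expression above could be wrapped in a small helper, as in the following minimal sketch. The names `CJK_PATTERN` and `containsChineseOrJapanese` are assumptions made for illustration and are not part of the original answer.

```javascript
// Hypothetical names; the character class is the one given in the answer.
const CJK_PATTERN = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff\uff66-\uff9f]/;

function containsChineseOrJapanese(text) {
  // test() returns a boolean; without the g flag the regex keeps no state between calls
  return CJK_PATTERN.test(text);
}

console.log(containsChineseOrJapanese("ディアボリックラヴァーズ")); // true
console.log(containsChineseOrJapanese("バッテリー")); // true
console.log(containsChineseOrJapanese("hello")); // false
```

Using `test()` rather than `match()` avoids building a match array when only a yes/no answer is needed.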
This does not cover every character that can appear in Chinese and Japanese text, but any substantial piece of typical Chinese or Japanese text will consist mostly of characters from these ranges.
Note that this regular expression will also match Korean text that contains hanja. This is an unavoidable result of Han unification.
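To illustrate the point about Han unification, here is a rough sketch that splits the ranges listed above into a kana-only class and an ideograph class. The names `KANA`, `HAN`, and `classify` are hypothetical, and the heuristic is only approximate: Japanese text written entirely in kanji would be reported as "han", not "japanese".

```javascript
// Hypothetical names; the ranges are taken from the list in the answer.
const KANA = /[\u3040-\u30ff\uff66-\uff9f]/; // hiragana, katakana, half-width katakana
const HAN = /[\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff]/; // ideographs shared by Chinese, Japanese, Korean

function classify(text) {
  if (KANA.test(text)) return "japanese"; // kana is used only in Japanese
  if (HAN.test(text)) return "han"; // Chinese, kanji-only Japanese, or Korean hanja
  return "other";
}

console.log(classify("バッテリー")); // "japanese"
console.log(classify("韓國")); // "han" (ideographs alone do not identify the language)
console.log(classify("한국")); // "other" (hangul is outside all listed ranges)
```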