Problem description
This is my current code:
return str.matches("^[A-Za-z\\-'. ]+");
I want it to include international letters. How do I do that in Java?
Thanks.
Recommended answer
It seems you want to match all alphabetic characters. Typically you would do that with the POSIX \p{Alpha} character class, extended with the punctuation you also want to permit. As the Java regular expressions documentation says, by default it matches ASCII only.
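To illustrate the ASCII-only default, here is a minimal sketch. The class name AsciiAlphaDemo and the method matchesAsciiOnly are made up for this example, and the Polish word Żółć is written with Unicode escapes so that the result does not depend on the source file encoding.

```java
final class AsciiAlphaDemo {
    // Hypothetical helper: the question's pattern with [A-Za-z] swapped for \p{Alpha}.
    // No flag is set, so \p{Alpha} is the POSIX class and covers ASCII letters only.
    static boolean matchesAsciiOnly(String s) {
        return s.matches("^[\\p{Alpha}\\-'. ]+");
    }

    public static void main(String[] args) {
        // Plain ASCII name: accepted.
        System.out.println(matchesAsciiOnly("Jean-Marie Le'Blanc"));
        // Polish word made of non-ASCII letters: rejected without the Unicode flag.
        System.out.println(matchesAsciiOnly("\u017B\u00F3\u0142\u0107"));
    }
}
```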
However, what the documentation does not say clearly is that you can make this class work with Unicode characters. To do that, you need to turn on Unicode character class matching.
You can do this in one of two ways:
- By creating a Pattern object and passing the Pattern.UNICODE_CHARACTER_CLASS constant:
Pattern p = Pattern.compile("^[\\p{Alpha}\\-'. ]+", Pattern.UNICODE_CHARACTER_CLASS);
- By using the (?U) embedded pattern flag:
str.matches("^(?U)[\\p{Alpha}\\-'. ]+");
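The two options can be put side by side in a small sketch. The class name UnicodeNameMatcher and the methods viaFlag and viaEmbeddedFlag are hypothetical names chosen for this example, and it assumes Java 7 or later, where UNICODE_CHARACTER_CLASS is available. The test word is Żółć, written with Unicode escapes.

```java
import java.util.regex.Pattern;

final class UnicodeNameMatcher {
    // Option 1: compile once with the flag and reuse the Pattern object.
    private static final Pattern NAME =
            Pattern.compile("^[\\p{Alpha}\\-'. ]+", Pattern.UNICODE_CHARACTER_CLASS);

    static boolean viaFlag(String s) {
        return NAME.matcher(s).matches();
    }

    // Option 2: embed (?U) in the expression, so String.matches can be used directly.
    static boolean viaEmbeddedFlag(String s) {
        return s.matches("^(?U)[\\p{Alpha}\\-'. ]+");
    }

    public static void main(String[] args) {
        String polish = "\u017B\u00F3\u0142\u0107";
        // Both options should agree on every input.
        System.out.println(viaFlag(polish) + " " + viaEmbeddedFlag(polish));
        System.out.println(viaFlag("R2-D2") + " " + viaEmbeddedFlag("R2-D2"));
    }
}
```

If the check runs often, the precompiled Pattern is probably the better fit, since String.matches compiles the expression on every call.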
Proof of concept:
String[] test = {"Jean-Marie Le'Blanc", "Żółć", "Ὀδυσσεύς", "原田雅彦"};
for (String str : test) {
System.out.print(str.matches("^(?U)[\\p{Alpha}\\-'. ]+") + " ");
}
The obvious result is:
true true true true
If you think everything is correct, I have two additional points to make:
- 原田雅彦 (Masahiko Harada) is composed of ideographic characters. Strictly speaking, these are not alphabetic characters, even though they match here.
- You want to match the dot (.) symbol. That's OK, but please consider matching ideographic full stops as well.
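As a rough sketch of the second point, the character class can be extended with the ideographic full stop (U+3002). The class name FullStopDemo and both method names are hypothetical, and the name 原田雅彦 is written with Unicode escapes so that the source file encoding does not matter.

```java
final class FullStopDemo {
    // The pattern from the answer: only the ASCII dot is permitted as a full stop.
    static boolean withAsciiDotOnly(String s) {
        return s.matches("^(?U)[\\p{Alpha}\\-'. ]+");
    }

    // Same pattern, with the ideographic full stop U+3002 added to the class.
    static boolean withIdeographicFullStop(String s) {
        return s.matches("^(?U)[\\p{Alpha}\\-'. \\u3002]+");
    }

    public static void main(String[] args) {
        // The four ideographs of the name, followed by an ideographic full stop.
        String name = "\u539F\u7530\u96C5\u5F66\u3002";
        System.out.println(withAsciiDotOnly(name));        // rejected: U+3002 is punctuation
        System.out.println(withIdeographicFullStop(name)); // accepted
    }
}
```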