Question
Apparently Java's regex flavor counts umlauts and other special characters as non-"word characters".
"TESTÜTEST".replaceAll( "\\W", "" )
returns "TESTTEST" for me. What I want is for only all truly non-"word characters" to be removed. Any way to do this without having something along the lines of
"[^A-Za-z0-9äöüÄÖÜßéèáàúùóò]"
only to realize that I forgot ô?
Recommended answer
Use [^\p{L}\p{Nd}]+ - this matches all (Unicode) characters that are neither letters nor (decimal) digits.
In Java:
String resultString = subjectString.replaceAll("[^\\p{L}\\p{Nd}]+", "");
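To make the difference concrete, here is a minimal sketch that runs both patterns on the same input. The class name UnicodeWordFilter, the helper stripNonWord, and the sample string are made up for illustration and are not part of the original answer.

```java
import java.util.regex.Pattern;

class UnicodeWordFilter {
    // Anything that is neither a Unicode letter (\p{L}) nor a decimal digit (\p{Nd}).
    private static final Pattern NON_WORD = Pattern.compile("[^\\p{L}\\p{Nd}]+");

    // Hypothetical helper: removes every run of non-letter, non-digit characters.
    static String stripNonWord(String input) {
        return NON_WORD.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // U+00DC is U-umlaut, U+00E9 is e-acute (both precomposed letters).
        String sample = "TEST\u00DCTEST, caf\u00E9 42!";

        // Default \W is ASCII-based, so the accented letters are removed too.
        System.out.println(sample.replaceAll("\\W", ""));

        // The Unicode-aware class keeps the accented letters and the digits.
        System.out.println(stripNonWord(sample));
    }
}
```

Two things to keep in mind: unlike \W, this character class also removes the underscore, and it only keeps accented letters that are stored as single precomposed code points. If the input may contain decomposed text (base letter plus combining mark), normalizing it to NFC first or adding \p{M} to the class would likely be needed.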
Edit:
I changed \p{N} to \p{Nd} because the former also matches some number symbols like ¼; the latter doesn't. See it on regex101.com.
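The effect of that change can be checked with a short sketch like the one below. The class name NumberCategoryDemo, the two method names, and the sample string are assumptions chosen for illustration.

```java
class NumberCategoryDemo {
    // Keeps letters plus every character in the general category N (Nd, Nl, No).
    static String keepLettersAndAnyNumber(String s) {
        return s.replaceAll("[^\\p{L}\\p{N}]+", "");
    }

    // Keeps letters plus decimal digits only (category Nd).
    static String keepLettersAndDecimalDigits(String s) {
        return s.replaceAll("[^\\p{L}\\p{Nd}]+", "");
    }

    public static void main(String[] args) {
        // U+00BC is the vulgar fraction one quarter (category No),
        // U+0663 is the Arabic-Indic digit three (category Nd).
        String sample = "a\u00BC b2 \u0663";

        System.out.println(keepLettersAndAnyNumber(sample));      // fraction survives
        System.out.println(keepLettersAndDecimalDigits(sample));  // fraction is removed
    }
}
```

Note that \p{Nd} is not limited to ASCII 0-9: decimal digits from other scripts, such as the Arabic-Indic digit in the sample, are kept as well.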