Question
Is there a better way to get rid of accents and turn those letters into regular ones, apart from using String.replaceAll() and replacing the letters one by one? Example:
Input: orčpžsíáýd
Output: orcpzsiayd
It doesn't need to handle every letter with accents, such as those in the Russian or Chinese alphabets.
Recommended answer
Use java.text.Normalizer to handle this for you.
string = Normalizer.normalize(string, Normalizer.Form.NFD);
// or Normalizer.Form.NFKD for a more "compatible" deconstruction
This separates the accent marks from the characters: each accented letter becomes its base letter followed by one or more combining marks. Then you just need to throw out the separated marks. For Latin text, the simplest way is to remove everything outside the ASCII range:
string = string.replaceAll("[^\\p{ASCII}]", "");
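A minimal, self-contained sketch that combines the two steps above and runs them on the example from the question. The class and method names (StripAccents, stripToAscii) are hypothetical, and the input is written as Unicode escapes so it compiles regardless of the source file encoding.

```java
import java.text.Normalizer;

public class StripAccents {
    // Decompose accented letters (NFD), then drop every non-ASCII char,
    // which removes the separated combining marks.
    static String stripToAscii(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute) is one char, but NFD turns it into 'e' + U+0301
        System.out.println(Normalizer.normalize("\u00e9", Normalizer.Form.NFD).length()); // 2
        // The example word from the question, written as Unicode escapes
        System.out.println(stripToAscii("or\u010dp\u017es\u00ed\u00e1\u00fdd")); // orcpzsiayd
    }
}
```

Note that this also deletes any other non-ASCII character, not just accents, which is why it only suits Latin input.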
If your text may contain other Unicode characters that you want to keep, use this instead, which removes only the combining marks:
string = string.replaceAll("\\p{M}", "");
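A minimal sketch, assuming the same NFD step as above, showing that the \\p{M} version keeps letters from other scripts while the ASCII version would delete them. The names StripMarks and stripMarks are hypothetical; strings are written as Unicode escapes.

```java
import java.text.Normalizer;

public class StripMarks {
    // Decompose, then remove only combining marks (Unicode category M),
    // so letters from other scripts survive.
    static String stripMarks(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        // Cyrillic U+0401 decomposes to U+0415 + U+0308 (combining diaeresis)
        String cyrillic = "\u0401\u0436";
        System.out.println(stripMarks(cyrillic).equals("\u0415\u0436")); // true
        // Chinese ideographs have no decomposition, so they are unchanged
        System.out.println(stripMarks("\u4e2d\u6587").equals("\u4e2d\u6587")); // true
        // The ASCII-only pattern would delete the Cyrillic letters entirely
        String asciiOnly = Normalizer.normalize(cyrillic, Normalizer.Form.NFD)
                .replaceAll("[^\\p{ASCII}]", "");
        System.out.println(asciiOnly.isEmpty()); // true
        // Latin input still gives the expected result
        System.out.println(stripMarks("or\u010dp\u017es\u00ed\u00e1\u00fdd")); // orcpzsiayd
    }
}
```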
For Unicode, \\p{M} (lowercase p) matches any combining mark, i.e. each separated accent, while \\P{M} (uppercase P) matches any character that is not a mark, such as the base glyph.
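To illustrate the two classes, here is a small sketch that counts their matches in the NFD form of the example word: 10 base glyphs and 5 combining marks, one per accented letter. The class MarkClasses and its count helper are hypothetical names used only for this illustration.

```java
import java.text.Normalizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MarkClasses {
    // Counts how many times the regex matches in s
    static int count(String regex, String s) {
        Matcher m = Pattern.compile(regex).matcher(s);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // The example word from the question (as Unicode escapes), decomposed with NFD
        String nfd = Normalizer.normalize("or\u010dp\u017es\u00ed\u00e1\u00fdd", Normalizer.Form.NFD);
        System.out.println(nfd.length());          // 15
        System.out.println(count("\\P{M}", nfd)); // 10 (base glyphs)
        System.out.println(count("\\p{M}", nfd)); // 5 (combining marks)
    }
}
```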
Thanks to GarretWilson for the pointer and regular-expressions.info for the great Unicode guide.