Question
I want to remove special characters like:
- + ^ . : ,
from a String using Java.
Recommended answer
That depends on what you define as special characters, but try replaceAll(...):
String result = yourString.replaceAll("[-+.^:,]","");
Note that the ^ character must not be the first one in the list, since you would then either have to escape it or it would mean "any but these characters".
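The effect of the caret position can be seen in a minimal sketch like the one below. The class and method names (RemoveSpecialChars, strip, stripNegated) are made up for illustration and are not part of the original answer.

```java
public class RemoveSpecialChars {
    // Caret in the middle of the class: it is a literal character,
    // so exactly - + . ^ : , are removed.
    static String strip(String input) {
        return input.replaceAll("[-+.^:,]", "");
    }

    // Caret first: the class is negated, so everything EXCEPT - + . : , is removed.
    static String stripNegated(String input) {
        return input.replaceAll("[^-+.:,]", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("a-b+c^d.e:f,g"));        // abcdefg
        System.out.println(stripNegated("a-b+c^d.e:f,g")); // -+.:,
    }
}
```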
Another note: the - character needs to be the first or last one in the list; otherwise you would have to escape it, or it would define a range (e.g. :-, would mean "all characters in the range : to ,").
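The following sketch illustrates the hyphen rule; the class and method names are hypothetical. It also shows one detail worth knowing: in Java a reversed range such as :-, (where : sorts after ,) is not silently accepted but rejected with a PatternSyntaxException when the pattern is compiled.

```java
import java.util.regex.PatternSyntaxException;

public class HyphenInCharClass {
    // Hyphen last: it is matched literally, so only + : and - are removed.
    static String hyphenLiteral(String input) {
        return input.replaceAll("[+:-]", "");
    }

    // Hyphen in the middle: "+-:" is the range U+002B..U+003A,
    // which also covers , - . / and the digits 0-9.
    static String hyphenRange(String input) {
        return input.replaceAll("[+-:]", "");
    }

    // A reversed range such as ":-," fails at pattern compilation time.
    static boolean reversedRangeIsRejected() {
        try {
            "x".replaceAll("[:-,]", "");
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(hyphenLiteral("a1-b2+c3:d")); // a1b2c3d
        System.out.println(hyphenRange("a1-b2+c3:d"));   // abcd
        System.out.println(reversedRangeIsRejected());   // true
    }
}
```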
So, to stay consistent and not depend on character positioning, you might want to escape all characters that have a special meaning in regular expressions (the following list is not complete, so be aware of other characters like (, {, $ etc.):
String result = yourString.replaceAll("[\\-\\+\\.\\^:,]","");
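If the set of characters to remove is not fixed, the escaping can be done mechanically. The sketch below is one possible approach, not something from the original answer: toCharClass and remove are hypothetical helpers, and they assume a non-empty list of characters from the Basic Multilingual Plane. Letters and digits are left unescaped because a backslash in front of them either has a special meaning or is an error in Java regular expressions.

```java
public class EscapedCharClass {
    // Builds a character class from the given characters, escaping every
    // character that is not a letter or digit so its position no longer matters.
    // Assumption: chars is non-empty and contains no surrogate pairs.
    static String toCharClass(String chars) {
        StringBuilder sb = new StringBuilder("[");
        for (char c : chars.toCharArray()) {
            if (!Character.isLetterOrDigit(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append(']').toString();
    }

    // Removes every occurrence of the listed characters from the input.
    static String remove(String input, String chars) {
        return input.replaceAll(toCharClass(chars), "");
    }

    public static void main(String[] args) {
        // The caret comes first and the hyphen sits in the middle,
        // yet both are treated as literals because they are escaped.
        System.out.println(toCharClass("^-+"));               // [\^\-\+]
        System.out.println(remove("a-b+c^d.e:f,g", "^-+.:,")); // abcdefg
    }
}
```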
If you want to get rid of all punctuation and symbols, try this regex: [\p{P}\p{S}] (keep in mind that in Java strings you have to escape backslashes: "[\\p{P}\\p{S}]").
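A short sketch of the Unicode-category approach follows; the class and method names are illustrative only. It also shows why the two properties belong inside a character class: written one after the other without brackets, they match only a punctuation character immediately followed by a symbol.

```java
public class PunctuationAndSymbols {
    // Removes any single character that is punctuation (P) or a symbol (S).
    static String strip(String input) {
        return input.replaceAll("[\\p{P}\\p{S}]", "");
    }

    // Without brackets the pattern is a two-character sequence:
    // one punctuation character followed by one symbol.
    static String stripSequenceOnly(String input) {
        return input.replaceAll("\\p{P}\\p{S}", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("Hello, world! 1+1=2 (ok)")); // Hello world 112 ok
        System.out.println(strip("a,+b,c"));                   // abc
        System.out.println(stripSequenceOnly("a,+b,c"));       // ab,c
    }
}
```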
A third way could be something like this, if you can define exactly what should be left in your string:
String result = yourString.replaceAll("[^\\w\\s]","");
This means: replace everything that is not a word character (a-z in any case, 0-9 or _) or whitespace.
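As a small illustration (names are hypothetical), the sketch below applies the whitelist pattern. Note that by default \w in Java covers only ASCII letters, digits and the underscore, so an accented letter such as é is removed as well.

```java
public class KeepWordAndSpace {
    // Keeps ASCII word characters (a-z, A-Z, 0-9, _) and whitespace,
    // removes everything else.
    static String keepWordAndSpace(String input) {
        return input.replaceAll("[^\\w\\s]", "");
    }

    public static void main(String[] args) {
        // \u00e9 is the letter é; it is not an ASCII word character.
        System.out.println(keepWordAndSpace("Hello, world_1! caf\u00e9")); // Hello world_1 caf
    }
}
```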
Please note that there are a couple of other patterns that might prove helpful. However, I can't explain them all, so have a look at the reference section of regular-expressions.info.
Here's a less restrictive alternative to the "define allowed characters" approach, as suggested by Ray:
String result = yourString.replaceAll("[^\\p{L}\\p{Z}]","");
The regex matches everything that is not a letter in any language and not a separator (whitespace, linebreak etc.). Note that you can't use [\P{L}\P{Z}] (upper case P means not having that property), since that would mean "everything that is not a letter or not whitespace", which matches almost everything, since letters are not whitespace and vice versa.
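The difference between the two spellings can be checked with a sketch like this one (class and method names are made up for the example):

```java
public class LettersAndSeparators {
    // Negated class: removes everything that is neither a letter nor a separator.
    static String keep(String input) {
        return input.replaceAll("[^\\p{L}\\p{Z}]", "");
    }

    // Union of two negated properties: every character is "not a letter"
    // or "not a separator", so the whole string is removed.
    static String wrongNegation(String input) {
        return input.replaceAll("[\\P{L}\\P{Z}]", "");
    }

    public static void main(String[] args) {
        String s = "caf\u00e9, 123 na\u00efve!";
        System.out.println(keep(s));                     // café  naïve (two spaces)
        System.out.println(wrongNegation(s).isEmpty());  // true
    }
}
```

One caveat when relying on \p{Z}: it covers the Unicode separator categories (such as the ordinary space), whereas tab and \n are control characters, so this pattern removes them too.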
Additional information on Unicode
Some Unicode characters seem to cause problems because they can be encoded in different ways (as a single code point or as a combination of code points). Please refer to regular-expressions.info for more information.
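To make the issue concrete, the sketch below shows the letter é written as e plus a combining accent (U+0301). The combining mark is not in category L, so the letter-based pattern strips it. Normalizing to NFC with java.text.Normalizer first is one possible workaround; it is an assumption of this example, not something the original answer prescribes, and the class and method names are hypothetical.

```java
import java.text.Normalizer;

public class UnicodeNormalizeDemo {
    // Removes everything that is neither a letter nor a separator.
    static String stripNonLetters(String input) {
        return input.replaceAll("[^\\p{L}\\p{Z}]", "");
    }

    // Composes base letter + combining mark into a single code point
    // where possible (NFC) before stripping.
    static String stripAfterNfc(String input) {
        String composed = Normalizer.normalize(input, Normalizer.Form.NFC);
        return stripNonLetters(composed);
    }

    public static void main(String[] args) {
        String decomposed = "cafe\u0301"; // "café" as e + combining acute accent
        System.out.println(stripNonLetters(decomposed)); // cafe (accent lost)
        System.out.println(stripAfterNfc(decomposed));   // café (single code point U+00E9)
    }
}
```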