How to remove special characters from a string?

Question

I want to remove special characters such as:

- + ^ . : ,

from a String using Java.

Recommended answer

That depends on what you define as special characters, but try replaceAll(...):

String result = yourString.replaceAll("[-+.^:,]","");

Note that the ^ character must not be the first one in the list, since you would then either have to escape it or it would mean "any character except these".

Another note: the - character needs to be the first or last one in the list; otherwise you have to escape it, or it will be read as defining a range (e.g. :-, would be read as "the range from : to ,", which Java rejects as an illegal range because : comes after , in character order).
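As a small illustration of the two positioning rules above, here is a minimal sketch. The class and method names are made up for this example; only String.replaceAll comes from the answer itself.

```java
class CharClassPositions {
    // Hyphen first, caret not first: every listed character is taken literally.
    static String strip(String input) {
        return input.replaceAll("[-+.^:,]", "");
    }

    // Caret first: the class is negated, so everything EXCEPT - + . : , is removed.
    static String keepOnly(String input) {
        return input.replaceAll("[^-+.:,]", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("a-b+c^d.e:f,g"));    // abcdefg
        System.out.println(keepOnly("a-b+c^d.e:f,g")); // -+.:,
    }
}
```

Moving the caret to the front flips the meaning of the whole class, which is why the position matters.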

So, to stay consistent and not depend on character positioning, you might want to escape all characters that have a special meaning in regular expressions (the following list is not complete, so be aware of other characters such as (, {, $ etc.):

String result = yourString.replaceAll("[\\-\\+\\.\\^:,]","");
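If the characters to remove are not known in advance, the escaping can be done programmatically. The following is only a sketch: removeChars is a hypothetical helper, not part of the JDK, and it assumes the characters to remove are all in the Basic Multilingual Plane. It relies on the documented rule that a backslash before any non-alphabetic character is always allowed in a Java regex.

```java
class EscapedCharClass {
    // Hypothetical helper: builds a character class such as [\^\-\+] from the
    // given characters, escaping everything that is not a letter or digit so
    // that the order of the characters no longer matters.
    static String removeChars(String input, String charsToRemove) {
        if (charsToRemove.isEmpty()) {
            return input; // "[]" would not be a valid character class
        }
        StringBuilder cls = new StringBuilder("[");
        for (char c : charsToRemove.toCharArray()) {
            if (!Character.isLetterOrDigit(c)) {
                cls.append('\\'); // escape anything that could be a metacharacter
            }
            cls.append(c);
        }
        cls.append(']');
        return input.replaceAll(cls.toString(), "");
    }

    public static void main(String[] args) {
        // "^-+" would be a negated class with a dangling range if left unescaped.
        System.out.println(removeChars("1+1=2 (a^b)-c", "^-+")); // 11=2 (ab)c
    }
}
```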


If you want to get rid of all punctuation and symbols, use the Unicode categories \p{P} and \p{S} in a character class: [\p{P}\p{S}] (keep in mind that in Java strings you have to escape the backslashes: "[\\p{P}\\p{S}]").
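A minimal sketch of the punctuation-and-symbol approach, with the two categories placed inside one character class so that each matching character is removed on its own. The class and method names are invented for illustration.

```java
class StripPunctuation {
    // Removes every character in Unicode category P (punctuation) or S (symbol).
    static String stripPunctuationAndSymbols(String input) {
        return input.replaceAll("[\\p{P}\\p{S}]", "");
    }

    public static void main(String[] args) {
        System.out.println(stripPunctuationAndSymbols("Hello, World! 1+1=2 (50% off) $5"));
        // Hello World 112 50 off 5
    }
}
```

Note that +, = and $ are symbols rather than punctuation, so \p{P} alone would leave them in place.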

A third way could be something like this, if you can define exactly what should be left in your string:

String result = yourString.replaceAll("[^\\w\\s]","");

This means: replace everything that is not a word character (a-z in either case, 0-9 or _) or whitespace.
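The sketch below shows this "keep only word characters and whitespace" pattern in use; the names are illustrative. It also shows one consequence worth knowing: by default \w in Java covers only ASCII letters, digits and the underscore, so accented letters are removed as well.

```java
class KeepWordChars {
    // Removes everything that is neither an ASCII word character nor whitespace.
    static String stripNonWord(String input) {
        return input.replaceAll("[^\\w\\s]", "");
    }

    public static void main(String[] args) {
        System.out.println(stripNonWord("Hello, World_1!")); // Hello World_1
        // \u00e9 is the precomposed letter e with acute accent; it is not in \w by default.
        System.out.println(stripNonWord("caf\u00e9!"));      // caf
    }
}
```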

Please note that there are a couple of other patterns that might prove helpful. I can't explain them all here, so have a look at the reference section of regular-expressions.info.

Here is a less restrictive alternative to the "define allowed characters" approach, as suggested by Ray:

String result = yourString.replaceAll("[^\\p{L}\\p{Z}]","");

The regex matches everything that is neither a letter in any language nor a separator (whitespace, line break etc.). Note that you can't use [\P{L}\P{Z}] instead (an upper-case P means "not having that property"), since that would mean "everything that is not a letter or not a separator", which matches almost everything, because letters are not separators and vice versa.
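To make the difference between the negated class and the upper-case \P form concrete, here is a short sketch with illustrative names. One detail to keep in mind: in Unicode terms, tab and \n are control characters (category Cc), not separators (category Z), so this pattern removes them too.

```java
class LettersAndSeparators {
    // Keeps letters of any script plus Unicode separators (category Z).
    static String keepLettersAndSeparators(String input) {
        return input.replaceAll("[^\\p{L}\\p{Z}]", "");
    }

    // The mistaken variant: every character is either "not a letter" or
    // "not a separator", so this removes everything.
    static String wrongPattern(String input) {
        return input.replaceAll("[\\P{L}\\P{Z}]", "");
    }

    public static void main(String[] args) {
        String s = "caf\u00e9 & t\u00e9 2024!";
        System.out.println("[" + keepLettersAndSeparators(s) + "]");
        System.out.println("[" + wrongPattern(s) + "]"); // []
    }
}
```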

Additional information on Unicode

Some Unicode characters can cause problems because they can be encoded in more than one way (as a single code point or as a combination of code points). Please refer to regular-expressions.info for more information.
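One possible way to reduce this problem, offered here as an assumption rather than as part of the original answer, is to normalize the string to composed form (NFC) with java.text.Normalizer before filtering. In the sketch below, the letter e followed by the combining acute accent U+0301 loses its accent when filtered directly, because the combining mark is not a letter, but survives once it has been composed into a single code point. The class and method names are illustrative.

```java
import java.text.Normalizer;

class NormalizeBeforeStrip {
    // Keeps letters and separators only, as in the pattern from the answer.
    static String strip(String input) {
        return input.replaceAll("[^\\p{L}\\p{Z}]", "");
    }

    // Composes base letter + combining mark into one code point first (NFC).
    static String stripAfterNfc(String input) {
        String composed = Normalizer.normalize(input, Normalizer.Form.NFC);
        return strip(composed);
    }

    public static void main(String[] args) {
        String decomposed = "cafe\u0301!"; // 'e' followed by a combining acute accent
        System.out.println(strip(decomposed));         // cafe
        System.out.println(stripAfterNfc(decomposed)); // caf followed by U+00E9
    }
}
```

Not every combination has a precomposed form, so normalization narrows the problem without eliminating it.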

