Problem description
I found a few references to regex filtering out non-English but none of them is in Java, aside from the fact that they are all referring to somewhat different problems than what I am trying to solve:
- Replace all non-English characters with a space.
- Create a method that returns true if the string contains any non-English character.
By "English text" I mean not only actual letters and numbers but also punctuation.
So far, what I have been able to come up with for goal #1 is quite simple:
String.replaceAll("\\W", " ")
In fact, so simple that I suspect that I am missing something... Do you spot any caveats in the above?
As for goal #2, I could simply trim() the string after the above replaceAll(), then check if it's empty. But... is there a more efficient way to do this?
Recommended answer
\W is equivalent to [^\w], and \w is equivalent to [a-zA-Z_0-9]. Using \W will replace everything that isn't a letter, a digit, or an underscore, including tabs, newline characters, and punctuation. Whether or not that's a problem is really up to you.
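To make that caveat concrete, here is a minimal sketch of what \W catches with Java's default settings. The class name, method name, and sample strings are made up for illustration and are not part of the original question.

```java
public class NonWordDemo {
    // Replaces every character outside [a-zA-Z_0-9] with a single space.
    static String replaceNonWord(String input) {
        return input.replaceAll("\\W", " ");
    }

    public static void main(String[] args) {
        // The comma, the exclamation mark, the space and the tab are all replaced;
        // the underscore is kept because it is part of \w.
        System.out.println(replaceNonWord("Hello, world!\tfoo_bar"));

        // Without the UNICODE_CHARACTER_CLASS flag, \w is ASCII-only,
        // so an accented letter such as U+00E9 is replaced as well.
        System.out.println(replaceNonWord("caf\u00e9"));
    }
}
```

Note that punctuation is replaced too, which conflicts with the question's definition of "English text".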
If you want to keep punctuation, you might want to use a negated character class that also lists the punctuation marks you want to leave alone; something like
[^\w.,;:'"]
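As a rough sketch, the character class above could be plugged into replaceAll() as follows. The class and method names are hypothetical. In a Java string literal both the backslash and the double quote need escaping.

```java
public class KeepPunctuationDemo {
    // The character class from the answer, escaped for a Java string literal.
    static final String NOT_ALLOWED = "[^\\w.,;:'\"]";

    // Replaces everything except word characters and the listed punctuation with a space.
    static String clean(String input) {
        return input.replaceAll(NOT_ALLOWED, " ");
    }

    public static void main(String[] args) {
        // Listed punctuation survives; the accented letter and '?' do not.
        System.out.println(clean("It's 5:00, \"really\"; caf\u00e9?"));
    }
}
```

Characters that are not listed, such as ? and !, are still replaced, so the class would need extending to cover whatever punctuation you consider English. Whitespace also matches, which means tabs and newlines turn into plain spaces.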
For goal #2, use Pattern and Matcher.
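import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Compiled once and reused; matches any character outside [a-zA-Z_0-9].
Pattern p = Pattern.compile("\\W");

// Returns true if the string contains at least one character matched by p.
boolean containsSpecialChars(String string)
{
    Matcher m = p.matcher(string);
    return m.find();
}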
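The method above treats punctuation and whitespace as "special", because it uses \W. If punctuation should count as English text, one possible variation is sketched below. The allow-list, class name, and method name are assumptions chosen for illustration, so adjust the listed characters to your needs.

```java
import java.util.regex.Pattern;

public class EnglishTextCheck {
    // Assumed allow-list: ASCII word characters, whitespace, and some common punctuation.
    // Anything outside this set counts as "non-English" in this sketch.
    private static final Pattern DISALLOWED = Pattern.compile("[^\\w\\s.,;:'\"!?-]");

    // find() stops at the first disallowed character, so no modified copy
    // of the string is built just to answer a yes/no question.
    static boolean containsNonEnglish(String text) {
        return DISALLOWED.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsNonEnglish("Hello, world!"));
        System.out.println(containsNonEnglish("caf\u00e9"));
    }
}
```

Compared with the replaceAll()-then-trim() idea from the question, this avoids allocating a new string and returns as soon as a single offending character is found.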