Problem description
In my case the word length is 2, and I am using this regex:
text = text.replace(/\b[a-zA-ZΆ-ώἀ-ῼ]{2}\b/g, '');
but I cannot make it work with Greek characters. For your convenience, here is a demo:
text = 'English: the on in to of \n Greek: πως θα το πω';
text = text.replace(/\b[0-9a-zA-ZΆ-ώἀ-ῼ]{2}\b/g, '');
console.log(text);
As far as the Greek characters are concerned, I am trying to use a range covering two Unicode blocks: "Greek and Coptic" and "Greek Extended" (as listed on unicode-table.com).
Recommended answer
The problem with Greek characters is caused by \b. See "Javascript - regex - word boundary (\b) issue", where @Casimir et Hippolyte proposes the following solution:
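Since Javascript doesn't have a lookbehind feature, and since word boundaries work only with members of the \w character class, the only way is to use groups (and capturing groups if you want to make a replacement):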
//example to remove 2 letter words:
txt = txt.replace(/(^|[^a-zA-ZΆΈ-ώἀ-ῼ\n])([a-zA-ZΆΈ-ώἀ-ῼ]{2})(?![a-zA-ZΆΈ-ώἀ-ῼ])/gm, '$1');
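As a quick check of the word-boundary claim, the short sketch below compares \b on a Latin word and on a Greek word. The variable names are only illustrative. Without the u and i flags together, \w covers just [A-Za-z0-9_], so a string made only of Greek letters and spaces contains no \b position at all.

```javascript
// \b matches only between a \w character and a non-\w character.
// Latin "to" sits between spaces, so both boundaries exist.
const latinHit = /\bto\b/.test('go to it');

// Greek letters are not \w, so there is no boundary around "το".
const greekHit = /\bτο\b/.test('θα το πω');

console.log(latinHit, greekHit); // true false
```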
I also added 0-9 inside the first and the third part of that pattern (the leading group and the lookahead), because without it the regex was removing letters from tokens like "2TB" or "mp3".
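The quoted code does not show the 0-9 variant, so here is a minimal sketch of what it might look like. The helper name removeWordsOfLength and the constant LETTERS are hypothetical, not part of the original answer. The \u escapes are assumed to correspond to the ranges used in the question: Ά (U+0386), Έ-ώ (U+0388 to U+03CE) and ἀ-ῼ (U+1F00 to U+1FFC).

```javascript
// Letters that count as part of a word: Latin plus the Greek ranges from the question.
// Written as code point escapes so the ranges survive copy and paste.
const LETTERS = 'a-zA-Z\u0386\u0388-\u03CE\u1F00-\u1FFC';

// Hypothetical helper: removes standalone words made of exactly `length` letters.
function removeWordsOfLength(text, length) {
  const pattern = new RegExp(
    // Group 1: start of a line, or one character that is not a letter, digit or newline.
    `(^|[^0-9${LETTERS}\\n])` +
    // The word itself: exactly `length` letters (digits are not allowed here).
    `[${LETTERS}]{${length}}` +
    // Lookahead: the word must not be followed by another letter or a digit.
    `(?![0-9${LETTERS}])`,
    'gm'
  );
  // '$1' puts back the character captured before the word, so only the word is dropped.
  return text.replace(pattern, '$1');
}

console.log(JSON.stringify(removeWordsOfLength('2TB mp3 ok', 2))); // "2TB mp3 "
```

The separators around a removed word are kept, so runs of spaces remain in the output. In engines that support ES2018, lookbehind and \p{L} with the u flag are also available and may allow a shorter pattern.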