Javascript RegExp + Word boundary + unicode characters

Question

I am building a search feature and I am going to use JavaScript autocomplete with it. I am from Finland (Finnish language), so I have to deal with some special characters like ä, ö and å.

When the user types text into the search input field, I try to match the text against the data.

Here is a simple example that does not work correctly if the user types, for example, "ää". The same thing happens with "äl".

var title = "this is simple string with finnish word tämä on ääkköstesti älkää ihmetelkö";
// Does not work
var searchterm = "äl";

// does not work
//var searchterm = "ää";

// Works
//var searchterm = "wi";

if ( new RegExp("\\b"+searchterm, "gi").test(title) ) {
    $("#result").html("Match: ("+searchterm+"): "+title);
} else {
    $("#result").html("nothing found with term: "+searchterm);
}

http://jsfiddle.net/7TsxB/

So how can I get those ä, ö and å characters to work with JavaScript regex?

I think I should use unicode codes, but how should I do that? Codes for those characters are: [\u00C4,\u00E4,\u00C5,\u00E5,\u00D6,\u00F6]

=> äÄåÅöÖ

Recommended answer

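The problem is with the word boundary \b. In JavaScript, \b is defined in terms of \w, which only covers the ASCII characters [A-Za-z0-9_]. A character such as ä therefore counts as a non-word character, so there is no word boundary between a space (or the start of the string) and a following ä, and \bäl does not match at the start of "älkää".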
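The behaviour described above can be checked directly. This is a minimal sketch that runs in Node or a browser console; the variable names are only for illustration.

```javascript
// \w is ASCII-only in JavaScript, so "ä" counts as a non-word character.
const asciiIsWord = /\w/.test("w");
const umlautIsWord = /\w/.test("ä");

const sample = "tämä on ääkköstesti älkää ihmetelkö";

// \b needs a word character on exactly one side of the position.
// " " followed by "o": non-word then word, so a boundary exists.
const boundaryBeforeAscii = /\bon/.test(sample);

// " " followed by "ä": both sides are non-word, so no boundary exists.
const boundaryBeforeUmlaut = /\bäl/.test(sample);

// "t" followed by "ä" inside "tämä": word then non-word, so \b reports
// a boundary in the middle of the word (a false positive).
const boundaryInsideWord = /\bäm/.test(sample);

console.log({ asciiIsWord, umlautIsWord, boundaryBeforeAscii,
              boundaryBeforeUmlaut, boundaryInsideWord });
```

So \b not only misses words that start with ä, it can also report a "word start" in the middle of a Finnish word.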

Instead of using \b, try using (?:^|\s)

var title = "this is simple string with finnish word tämä on ääkköstesti älkää ihmetelkö";
// Matches now
var searchterm = "äl";

// Matches now
//var searchterm = "ää";

// Still works
//var searchterm = "wi";

// "\\s" in the string literal becomes \s in the regex
if ( new RegExp("(?:^|\\s)"+searchterm, "gi").test(title) ) {
    $("#result").html("Match: ("+searchterm+"): "+title);
} else {
    $("#result").html("nothing found with term: "+searchterm);
}
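The snippet above depends on jQuery and a #result element. As a minimal sketch, the same check can be wrapped in a plain function that runs anywhere. The names escapeRegExp and matchesWordStart are hypothetical, and escaping the search term is an added assumption (typed input may contain characters such as + or ?), not something the original answer does.

```javascript
// Escape regex metacharacters so the typed text is matched literally.
// (Hypothetical helper, not part of the original answer.)
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// True if searchterm occurs at the start of the string or right after
// whitespace. The "g" flag is dropped because test() is called once.
function matchesWordStart(title, searchterm) {
  var pattern = new RegExp("(?:^|\\s)" + escapeRegExp(searchterm), "i");
  return pattern.test(title);
}

var title = "this is simple string with finnish word tämä on ääkköstesti älkää ihmetelkö";

console.log(matchesWordStart(title, "äl")); // start of "älkää"
console.log(matchesWordStart(title, "ää")); // start of "ääkköstesti"
console.log(matchesWordStart(title, "wi")); // start of "with"
console.log(matchesWordStart(title, "kk")); // only occurs mid-word
```

The first three calls find a match and the last one does not, because "kk" only appears inside a word.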

Breakdown:

(?: Parentheses () form a capturing group in a regex. Parentheses that start with a question mark and colon, (?: ... ), form a non-capturing group. They just group the terms together

^ the caret symbol matches the beginning of a string

| the pipe is the "or" operator

\s matches whitespace (appears as \\s in the string because we have to escape the backslash)

) closes the group

So instead of using \b, which matches word boundaries and doesn't work for unicode characters, we use a non-capturing group which matches the beginning of a string OR whitespace.
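One limitation of (?:^|\s) is that it only recognises whitespace in front of a word, so a term preceded by punctuation such as "(" is missed. A possible alternative, which is not from the original answer, is a negative lookbehind with Unicode property escapes. This sketch assumes an engine with ES2018 support (the u flag, \p{...} and lookbehind); the function names are hypothetical.

```javascript
// Escape regex metacharacters so the typed text is matched literally.
// Only syntax characters are escaped, which keeps the pattern valid
// under the "u" flag. (Hypothetical helper.)
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// True if searchterm starts a word, where "word character" means any
// Unicode letter (\p{L}), any Unicode digit (\p{N}) or an underscore.
// The lookbehind asserts that no such character precedes the match.
function startsWordUnicode(title, searchterm) {
  var pattern = new RegExp(
    "(?<![\\p{L}\\p{N}_])" + escapeRegExp(searchterm),
    "iu"
  );
  return pattern.test(title);
}

console.log(startsWordUnicode("sano (älkää ihmetelkö)", "äl")); // after "("
console.log(startsWordUnicode("tämä on ääkköstesti", "ää"));    // after " "
console.log(startsWordUnicode("tämä", "äm"));                    // mid-word
```

The first two calls report a match and the third does not, since "äm" is preceded by the letter "t". Because the lookbehind consumes nothing, the match also covers only the search term itself, not the preceding whitespace.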
