Problem description
I'm trying to do a simple regex match using NSRegularExpression, but I'm having problems matching the string when the source contains multibyte characters:
let string = "D 9"
// The following matches (any characters)(SPACE)(numbers)(any characters)
let pattern = "([\\s\\S]*) ([0-9]*)(.*)"
let slen : Int = string.lengthOfBytesUsingEncoding(NSUTF8StringEncoding)
var error: NSError? = nil
var regex = NSRegularExpression(pattern: pattern, options: NSRegularExpressionOptions.DotMatchesLineSeparators, error: &error)
var result = regex?.stringByReplacingMatchesInString(string, options: nil, range: NSRange(location:0,
length:slen), withTemplate: "First \"$1\" Second: \"$2\"")
The code above returns "D" and "9" as expected.
If I now change the first line to include a UK pound currency symbol, as follows:
let string = "£ 9"
Then the match doesn't work, even though the ([\\s\\S]*) part of the expression should still match any leading characters.
I understand that the £ symbol takes two bytes, but the leading wildcard match should ignore that, shouldn't it?
Can anyone explain what is going on here, please?
Recommended answer
It can be confusing. The first parameter of stringByReplacingMatchesInString() is mapped from NSString in Objective-C to String in Swift, but the range: parameter is still an NSRange. The code in the question passes the UTF-8 byte count as the range length, which is larger than the NSString length as soon as the string contains a character such as £. Therefore you have to specify the range in the units used by NSString (which is the number of UTF-16 code units):
var result = regex?.stringByReplacingMatchesInString(string,
options: nil,
range: NSRange(location:0, length:(string as NSString).length),
withTemplate: "First \"$1\" Second: \"$2\"")
Alternatively, you can use count(string.utf16) instead of (string as NSString).length. (In current Swift versions this is written string.utf16.count.)
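To make the difference between the two lengths concrete, here is a minimal sketch in current Swift syntax (an assumption, since the code in this thread uses the older Swift 1.x API names). It only compares the UTF-8 byte count used in the question with the UTF-16 code unit count that NSRange expects; the variable names are illustrative.

```swift
import Foundation

let string = "£ 9"

// UTF-8 byte count: "£" (U+00A3) encodes as two bytes, plus the space and "9".
let utf8Length = string.utf8.count

// UTF-16 code unit count, the unit that NSString.length and NSRange use.
let utf16Length = string.utf16.count
let nsStringLength = (string as NSString).length

print(utf8Length, utf16Length, nsStringLength)
// prints "4 3 3"
```

A range of length 4 therefore extends past the end of a string that NSString sees as 3 units long, which is why the original range was invalid for this input.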
Full example:
let string = "£ 9"
let pattern = "([\\s\\S]*) ([0-9]*)(.*)"
var error: NSError? = nil
let regex = NSRegularExpression(pattern: pattern,
options: NSRegularExpressionOptions.DotMatchesLineSeparators,
error: &error)!
let result = regex.stringByReplacingMatchesInString(string,
options: nil,
range: NSRange(location:0, length:(string as NSString).length),
withTemplate: "First \"$1\" Second: \"$2\"")
println(result)
// First "£" Second: "9"
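For readers on a recent toolchain, the same example might look roughly like the following. This is a sketch assuming Swift 4 or later, where the NSRegularExpression initializer throws instead of taking an error pointer and NSRange(_:in:) can build the UTF-16 based range directly from String indices. The helper name describeParts(of:) is hypothetical and not part of the original answer.

```swift
import Foundation

// Hypothetical helper: applies the pattern from the question and returns
// the string with each match replaced by the template.
func describeParts(of string: String) throws -> String {
    let pattern = "([\\s\\S]*) ([0-9]*)(.*)"
    let regex = try NSRegularExpression(pattern: pattern,
                                        options: [.dotMatchesLineSeparators])
    // NSRange(_:in:) converts a range of String indices to UTF-16 offsets,
    // so no manual length calculation is needed.
    let fullRange = NSRange(string.startIndex..<string.endIndex, in: string)
    return regex.stringByReplacingMatches(in: string,
                                          options: [],
                                          range: fullRange,
                                          withTemplate: "First \"$1\" Second: \"$2\"")
}

do {
    print(try describeParts(of: "£ 9"))
    // First "£" Second: "9"
} catch {
    print("Invalid pattern: \(error)")
}
```

Building the range from the string's own indices avoids mixing encodings, which was the source of the problem in the question.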