JS: RegExp (Regular Expressions)

RegExp syntax (including the ES2018 standard)

Note: all code in this post was tested only in Chrome 70.

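What is a regular expression?
Simply put, a regular expression is a pattern used to match characters so that text can be extracted and captured.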

Creating:

Literal: let regex = /pattern/flags
```
let regex1 = /foo/i;
```

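Constructor: let regex = new RegExp(pattern, flags);

let regex2 = new RegExp('bar', 'ig'); // ES5
let regex3 = new RegExp(/bat/im); // ES5
let regex4 = new RegExp(/cat/ig, 'g'); // ES6
/* In ES5, creating regex4 this way throws a TypeError, because the first argument is already a regular expression and ES5 does not allow a second argument to add flags in that case. ES6 allows it, and the flags in the second argument replace those of the first argument. */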
console.log(regex4); // /cat/g
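
The constructor form is mainly useful when the pattern is only known at runtime. A minimal sketch, assuming a hypothetical helper named wholeWord and an input word that contains no regex metacharacters:

```javascript
// Hypothetical helper: build a case-insensitive whole-word matcher at runtime.
function wholeWord(word) {
  // Inside a string literal the backslash must itself be escaped,
  // so '\\b' ends up as \b in the pattern.
  return new RegExp('\\b' + word + '\\b', 'i');
}

const catPattern = wholeWord('cat');
console.log(catPattern.source);          // \bcat\b
console.log(catPattern.test('A Cat.'));  // true
console.log(catPattern.test('concat'));  // false
```

With a literal the same pattern would be written /\bcat\b/i; the doubled backslashes are only needed in the string form.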

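Instance properties:
Every regular expression instance has the following properties, which expose information about its pattern.
- global: Boolean, whether the g (global matching) flag is set.
- ignoreCase: Boolean, whether the i (ignore case) flag is set.
- multiline: Boolean, whether the m (multiline) flag is set.
- unicode: Boolean, whether the u flag (Unicode mode, which recognizes characters above \uFFFF) is set.
- sticky: Boolean, whether the y (sticky) flag is set.
- lastIndex: the index at which the next match starts. It is updated after each successful match and is only used in global or sticky mode.
- source: the pattern text as a string, without the surrounding slashes or flags. This differs from toString(), which returns the whole literal, and from valueOf(), which returns the RegExp object itself.
- flags: the flags of the regular expression as a string.
- dotAll: Boolean, whether the s (dotAll) flag is set.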
```
let str2 = 'batfoocat';
let pattern2 = /at/g;

pattern2.global;  // true
pattern2.sticky;  // false
pattern2.source; // at
pattern2.flags; // g
pattern2.toString(); // /at/g
pattern2.valueOf(); // /at/g
pattern2.lastIndex; // 0

let matches = pattern2.exec(str2); // first call
matches[0]; // at
matches.index; // 1
pattern2.lastIndex; // 3

matches = pattern2.exec(str2); // second call
matches[0]; // at
matches.index; // 7
pattern2.lastIndex; // 9

/* The third call finds no further match, so exec() returns null, and reading matches[0] or matches.index at that point would throw a TypeError. lastIndex is reset to 0, so a fourth call starts over and returns the first match again. */
matches = pattern2.exec(str2); // third call
matches; // null
pattern2.lastIndex; // 0
```
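  Supplement: deprecated properties (https://developer.mozilla.org/zh-CN/docs/Web/JavaScript/Reference/Deprecated_and_obsolete_features)
Methods:
- exec(): searches the given string for a match and returns information about only one match per call; it returns null when nothing matches.
  The returned value is an Array instance with two extra properties: index (the position of the match in the string) and input (the string the regular expression was matched against). The first element (index 0) holds the matched text.
  Note: with the global flag (g), calling exec() again returns the next match; without it, every call to exec() returns the first match, no matter how many times it is called.
  Supplement: ES2018 added a groups property to the returned array (information about named capture groups).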
```
let str1 = 'batfoocat';
let pattern1 = /at/g;
pattern1.exec(str1); // first call
// ["at", index: 1, input: "batfoocat", groups: undefined]
pattern1.exec(str1); // second call
// ["at", index: 7, input: "batfoocat", groups: undefined]
pattern1.exec(str1); // third call
// null
// A fourth call starts over and returns the first match again
```
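
Because exec() returns null once the matches run out, the usual way to walk through every match is a loop that stops on null. A small sketch of that pattern (the variable names are only illustrative):

```javascript
const text = 'batfoocat';
const pattern = /at/g; // the g flag is required, otherwise this loop never ends
const found = [];
let match;

// exec() returns null when no match remains and resets lastIndex to 0
while ((match = pattern.exec(text)) !== null) {
  found.push({ text: match[0], index: match.index });
}

console.log(found);
// [{ text: "at", index: 1 }, { text: "at", index: 7 }]
console.log(pattern.lastIndex); // 0
```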
- test(): tests whether the regular expression matches the target string and returns a Boolean.
```
let str3 = 'batfoocat';
let str4 = 'abcde';
let pattern3 = /at/g;
pattern3.test(str3); // true
pattern3.test(str4); // false
```
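
One thing worth keeping in mind: with the g (or y) flag, test() also reads and updates lastIndex, just like exec(), so repeated calls on the same regex object can give alternating results. A short sketch of that behaviour:

```javascript
const globalPattern = /at/g;
const firstCall = globalPattern.test('bat');  // true; lastIndex is now 3
const secondCall = globalPattern.test('bat'); // false; the search started at index 3
const thirdCall = globalPattern.test('bat');  // true; lastIndex was reset to 0 after the failure
console.log(firstCall, secondCall, thirdCall); // true false true

// Without the g flag, lastIndex is ignored and the result is stable
const plainPattern = /at/;
const plainResults = [plainPattern.test('bat'), plainPattern.test('bat')];
console.log(plainResults); // [true, true]
```

If only a yes/no answer is needed, leaving out the g flag avoids the surprise.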
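- String.prototype.search(): searches for a substring that matches the regular expression. It returns the index of the first match in the string, or -1 if there is no match.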
```
let str5 = 'abcdea';
str5.search(/a/g); // 0
str5.search(/f/g); // -1
```
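- String.prototype.match(): searches for substrings that match the regular expression. On success it returns an array holding all matches, otherwise null. If the regular expression has no g (global) flag, match() performs only one match.
  Note: in global mode, match() provides neither the text matched by subexpressions nor the position of each match. If you need that information for a global search, use RegExp.prototype.exec().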
```
let str6 = 'abcdea';
str6.match(/a/g);
// ["a", "a"]
str6.match(/a/);
// ["a", index: 0, input: "abcdea", groups: undefined]
str6.match(/f/g);
// null
```
- String.prototype.replace(regexp, replacement): replaces the substrings matched by the regular expression; without the g flag only the first match is replaced.
```
let str7 = 'batfoocat';
let a = str7.replace(/at/g, 'oo');
// "boofoocoo"

let b = str7.replace(/at/, 'oo');
// "boofoocat"

let c = str7.replace(/at/g, (value)=> {
    return  '!' + value;
});
// "b!atfooc!at"
```
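
The replacement string can also refer back to what was matched. A brief sketch of the most common special patterns ($1, $2 and $&) and of a replacement function that receives capture groups; the sample strings are made up for illustration:

```javascript
// $1 and $2 insert the text of the first and second capture groups
const swapped = 'Smith, John'.replace(/(\w+), (\w+)/, '$2 $1');
console.log(swapped); // "John Smith"

// $& inserts the whole match
const bracketed = 'batfoocat'.replace(/at/g, '[$&]');
console.log(bracketed); // "b[at]fooc[at]"

// A replacement function receives the whole match first, then each capture group
const doubled = 'a1b22'.replace(/(\d+)/g, (whole, digits) => String(Number(digits) * 2));
console.log(doubled); // "a2b44"
```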
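- String.prototype.split(separator [, howmany]): splits a string into an array of substrings. The optional second argument caps the length of the returned array; if it is omitted, all pieces are returned.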
```
let str8 = 'batfoocat';
let a = str8.split(/at/g); // ["b", "fooc", ""]
let b = str8.split(/at/); // ["b", "fooc", ""]
let c = str8.split(/at/, 2); // ["b", "fooc"]
```
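
Two details of split() with a regular expression that the examples above do not show: a capture group in the separator keeps the separators in the result, and a regex separator can absorb surrounding whitespace. A small sketch with made-up input:

```javascript
// A capturing group in the separator is included in the output
const withSeparators = 'batfoocat'.split(/(at)/);
console.log(withSeparators); // ["b", "at", "fooc", "at", ""]

// Split on commas or semicolons, ignoring whitespace around them
const fields = 'a, b ;c'.split(/\s*[,;]\s*/);
console.log(fields); // ["a", "b", "c"]
```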

Flags:

g: global matching; finds all matches instead of stopping right after the first one.

let str9 = 'batfoocat';
str9.match(/at/);
// ["at", index: 1, input: "batfoocat", groups: undefined]
str9.match(/at/g);
// ["at", "at"]

i: ignore case.

let str10 = 'AabbccDD';
str10.match(/a/gi); // ["A", "a"]
str10.match(/a/g); // ["a"]
str10.match(/A/g); // ["A"]

m: multiline matching; used together with ^ and $, which then also match at the start and end of each line.

`
abc
def
`.match(/def/);
// ["def", index: 5, input: "↵abc↵def↵", groups: undefined]

`
abc
def
`.match(/def/m);
// ["def", index: 5, input: "↵abc↵def↵", groups: undefined]

`
abc
def
`.match(/^def$/);
// null

`
abc
def
`.match(/^def$/m);
// ["def", index: 5, input: "↵abc↵def↵", groups: undefined]
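
The m flag is often combined with g to pick out whole lines. A minimal sketch, using a made-up log string as sample data:

```javascript
// Made-up sample data: four lines separated by \n
const log = 'INFO start\nERROR disk full\nINFO done\nERROR timeout';

// With m, ^ and $ match at every line boundary
const errorLines = log.match(/^ERROR .*$/gm);
console.log(errorLines); // ["ERROR disk full", "ERROR timeout"]

// Without m, ^ only matches at the very start of the string
const withoutMultiline = log.match(/^ERROR .*$/g);
console.log(withoutMultiline); // null
```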

u: Unicode mode; correctly handles Unicode characters whose code point is above \uFFFF.

/\u{20BB7}/.test('𠮷'); // false
/\u{20BB7}/u.test('𠮷'); // true
'𠮷'.match(/./);
// ["�", index: 0, input: "𠮷", groups: undefined]
'𠮷'.match(/./u);
// ["𠮷", index: 0, input: "𠮷", groups: undefined]
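
The reason behind these results is that a character above \uFFFF is stored as two UTF-16 code units, and without the u flag the regex engine works on code units. A short sketch that makes this visible (the character is written as a code point escape to avoid encoding problems):

```javascript
const astral = '\u{20BB7}'; // the character 𠮷, written as a code point escape

const codeUnits = astral.length;              // 2 UTF-16 code units
const oneDotWithoutU = /^.$/.test(astral);    // false: . sees two separate units
const oneDotWithU = /^.$/u.test(astral);      // true: . sees one code point
const codePoints = Array.from(astral).length; // 1

console.log(codeUnits, oneDotWithoutU, oneDotWithU, codePoints); // 2 false true 1
```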

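y: sticky matching. Like g, each new match starts from lastIndex, but with y the match must begin exactly at lastIndex instead of anywhere after it.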

let str11 = 'batcatdat';
str11.match(/at/g);
// ["at", "at", "at"]

str11.match(/at/y);
// null
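/* lastIndex starts at 0, so the sticky flag forces the match to begin at index 0 of str11, which is b. That does not match the at in the pattern, so the match fails and null is returned. */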

str11.match(/at/gy);
// null
str11.match(/\wat/y);
// ["bat", index: 0, input: "batcatdat", groups: undefined]
str11.match(/\wat/gy);
// ["bat", "cat", "dat"]
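
A typical use for the sticky flag is a tokenizer, where each token must start exactly where the previous one ended. The following is only a sketch: tokenize is a hypothetical function, and the token grammar (unsigned integers and the four basic operators) is an assumption chosen for brevity.

```javascript
// Hypothetical tokenizer for input such as "12+34*5"
function tokenize(input) {
  const tokenPattern = /\d+|[+\-*\/]/y; // a number or a single operator
  const tokens = [];
  while (tokenPattern.lastIndex < input.length) {
    const start = tokenPattern.lastIndex; // remember it: a failed exec() resets lastIndex to 0
    const match = tokenPattern.exec(input);
    if (match === null) {
      throw new SyntaxError('Unexpected character at index ' + start);
    }
    tokens.push(match[0]);
  }
  return tokens;
}

console.log(tokenize('12+34*5')); // ["12", "+", "34", "*", "5"]
```

With g instead of y the loop would silently skip characters it does not recognise; y turns them into an error at the exact position.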

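s: dotAll mode, used together with .; new in ES2018.
In a regular expression, . stands for any single character, with two exceptions: four-byte UTF-16 characters (addressed by the u flag introduced in ES6) and line terminators (characters that end a line, such as carriage return \r and line feed \n). ES2018 introduced the s flag to cover the second case.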
```
'bat\ncat'.match(/bat\ncat/);
// ["bat↵cat"]
'bat\ncat'.match(/bat.cat/);
// null
'bat\ncat'.match(/bat.cat/s);
// ["bat↵cat", index: 0, input: "bat↵cat", groups: undefined]
```
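
In engines that do not know the s flag, a common workaround is the character class [\s\S], which matches any character including line terminators. A hedged sketch (matchAcrossLines is a hypothetical helper) that tries the s flag through the constructor and falls back to the workaround:

```javascript
// Hypothetical helper: does the text contain "bat", any one character, then "cat"?
function matchAcrossLines(text) {
  let pattern;
  try {
    // The constructor throws a SyntaxError where the s flag is unknown;
    // a literal such as /bat.cat/s would already fail when the script is parsed.
    pattern = new RegExp('bat.cat', 's');
  } catch (e) {
    pattern = /bat[\s\S]cat/; // fallback with the same meaning
  }
  return pattern.test(text);
}

console.log(matchAcrossLines('bat\ncat')); // true
console.log(matchAcrossLines('batcat'));   // false
```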

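Escaping

If the pattern needs to match one of the metacharacters ( [ { ^ $ | ? * + . } ] ) literally, it must be escaped with a backslash \ for the match to work as intended.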

/.*?/.exec('question?');
// ["", index: 0, input: "question?", groups: undefined]

/.*\?/.exec('question?');
// ["question?", index: 0, input: "question?", groups: undefined]
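
When the text to match comes from a variable, it can be escaped programmatically before it is handed to the RegExp constructor. A commonly used sketch; escapeRegExp is not a built-in, it is a helper defined here:

```javascript
// Helper defined here (not a built-in): put a backslash before every metacharacter
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // $& is the matched character
}

const userInput = 'question?';
const escaped = escapeRegExp(userInput);
console.log(escaped); // question\?

const literalPattern = new RegExp(escaped);
console.log(literalPattern.exec('any question?')[0]); // "question?"
```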

Metacharacters

Boundaries

Note: a boundary matches a position, not a character.

'abcde'.match(/^abc/);
// ["abc", index: 0, input: "abcde", groups: undefined]
'fabcde'.match(/^abc/);
// null

'abcde'.match(/e$/);
// ["e", index: 4, input: "abcde", groups: undefined]
'abcdef'.match(/e$/);
// null

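Common metacharacters with a backslash \
Each uppercase form means the opposite of its lowercase form: \d matches a digit and \D a non-digit, \w a word character and \W a non-word character, \s whitespace and \S non-whitespace, \b a word boundary and \B a non-boundary.
Note 1: apart from \b and \B, putting the uppercase and lowercase forms of the other three together in a character class matches any character.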
```
'a b'.match(/[\s\S]/g);
// ["a", " ", "b"]
'a b'.match(/[\W\w]/g);
// ["a", " ", "b"]
'a b'.match(/[\D\d]/g);
// ["a", " ", "b"]
'a b'.match(/[\B\b]/g);
// null
```
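Note 2: \b does not work for Chinese text, because \w only covers ASCII letters, digits and the underscore, so Chinese characters never count as word characters.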
```
'The future is in our own hands'.match(/\bfuture\b/);
// ["future", index: 4, input: "The future is in our own hands", groups: undefined]
'你好 我好 大家好'.match(/\b好\b/g);
// null
```
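
One possible workaround, assuming that "words" in the text are separated by whitespace, is to replace \b with lookarounds that check for whitespace or the string edge. This is only a sketch and relies on lookbehind, which is an ES2018 feature:

```javascript
const sentence = '你好 好 大家好';

// \b never matches between two non-word characters, so this finds nothing
const withWordBoundary = sentence.match(/\b好\b/g);
console.log(withWordBoundary); // null

// (?<!\S) means "not preceded by a non-space", (?!\S) means "not followed by a non-space"
const standalone = sentence.match(/(?<!\S)好(?!\S)/g);
console.log(standalone); // ["好"]
```

Only the free-standing 好 in the middle is matched; the ones inside 你好 and 大家好 are skipped.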
Note 3: \s matches whitespace, which includes all of the characters below. These whitespace characters are themselves metacharacters and can be used in a regular expression.
- ' ' space character (a literal space)
- \t horizontal tab character
- \r carriage return character
- \n new line character
- \v vertical tab character
- \f form feed character
```
'a b'.match(/\w\s\w/);
// ["a b", index: 0, input: "a b", groups: undefined]

'a b'.match(/\w \w/);
// ["a b", index: 0, input: "a b", groups: undefined]

`
a
b
`.match(/\w\n\w/);
// ["a↵b", index: 1, input: "↵a↵b↵", groups: undefined]
```
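Dot (.)
. matches any single character, with two exceptions: four-byte UTF-16 characters (addressed by the u flag introduced in ES6) and line terminators (addressed by the s flag introduced in ES2018). Inside a character class, . loses its special meaning and matches a literal . character.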
```
'$@hhhh'.match(/.*/);
// ["$@hhhh", index: 0, input: "$@hhhh", groups: undefined]
```

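Quantifiers

Note 1: a quantifier matches as many characters as possible, that is, regular expressions are greedy by default. To match as few characters as possible, add ? after the quantifier to turn off greedy mode.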

'$@hhhh'.match(/.+/); // greedy mode
// ["$@hhhh", index: 0, input: "$@hhhh", groups: undefined]

'$@hhhh'.match(/.+?/); // lazy mode
// ["$", index: 0, input: "$@hhhh", groups: undefined]
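
The difference matters most when a delimiter can occur more than once. A small sketch with a made-up HTML fragment:

```javascript
const html = '<b>bold</b> and <i>italic</i>';

// Greedy: .+ runs to the last > in the string
const greedy = html.match(/<.+>/)[0];
console.log(greedy); // "<b>bold</b> and <i>italic</i>"

// Lazy: .+? stops at the first > it can
const lazy = html.match(/<.+?>/)[0];
console.log(lazy); // "<b>"

const allTags = html.match(/<.+?>/g);
console.log(allTags); // ["<b>", "</b>", "<i>", "</i>"]
```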

Note 2: in {n,m} and the other brace quantifiers, the braces must not contain spaces.

'$@hhhh'.match(/.{1,3}/); // no space
// ["$@h", index: 0, input: "$@hhhh", groups: undefined]

'$@hhhh'.match(/.{1, 3}/); // with a space
// null

Note 3: the only thing that may follow a quantifier is ? (to turn off greedy mode); no other quantifier is allowed after it.

'$@hhhh'.match(/.{1,3}+/);
// Uncaught SyntaxError

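Character classes (sets) and groups

/* In [abc], a, b and c are only alternatives for a single position; [abc] matches exactly one character per match. With the g flag, match() returns every such single-character match. */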
'abc'.match(/[abc]/);
// ["a", index: 0, input: "abc", groups: undefined]
'abc'.match(/[abc]/g);
// ["a", "b", "c"]
'abc'.match(/[ae]/);
// ["a", index: 0, input: "abc", groups: undefined]

'abc'.match(/[^ae]/);
// ["b", index: 1, input: "abc", groups: undefined]

/* The characters inside (abc) form one unit; (abc) matches abc and captures the match. */
'abc'.match(/(abc)/);
// ["abc", "abc", index: 0, input: "abc", groups: undefined]
'abc'.match(/(abf)/);
// null
'ab c ab ab'.match(/(ab)+/g);
// ["ab", "ab", "ab"]

Inside a character class, a hyphen - can be used to express a range.

'abc123'.match(/[0-9]/);
// ["1", index: 3, input: "abc123", groups: undefined]

'abc123'.match(/[a-z]/);
// ["a", index: 0, input: "abc123", groups: undefined]

'abc123'.match(/[0-z]*/);
// ["abc123", index: 0, input: "abc123", groups: undefined]
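/* A range can also run from a digit to a letter; it then covers every character between the two in code order, including uppercase letters and some punctuation. */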

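Capturing and non-capturing groups
- Capturing
  As the (abc) example above shows, () captures what it matches. JavaScript treats () as a capturing group by default: the content matched by the expression inside () is captured and stored in a numbered group (ES2018 added named captures). The stored content can be referenced later, which is called a backreference.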
```
/* Inside the regular expression, refer to a capture with \number. */
'<a>example.com</a>'.match(/<(a)>.*<\/\1>/);
// ["<a>example.com</a>", "a", index: 0, input: "<a>example.com</a>", groups: undefined]
```
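  When there are several capturing groups, they are numbered by the position of their opening parenthesis, that is, from left to right and from outer to inner: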
```
'abc_d_e_d_abc'.match(/((a)(b(c))).*(d)/);
/* ["abc_d_e_d", "abc", "a", "bc", "c", "d", index: 0, input: "abc_d_e_d_abc", groups: undefined]

\1 = abc
\2 = a
\3 = bc
\4 = c
\5 = d

Captures can also be referenced outside the regular expression.
*/

RegExp.$1;
// "abc"
RegExp.$2;
// "a"
RegExp.$3;
// "bc"
RegExp.$4;
// "c"
RegExp.$5;
// "d"
```
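
Backreferences are handy for spotting repeated text, and the captured values can be used in replace() through $1 without touching the RegExp.$n properties. A short sketch with a made-up sentence:

```javascript
const draft = 'This is is a test of the the parser';

// (\w+) captures a word, \1 requires the same word again after some whitespace
const repeatedWord = /\b(\w+)\s+\1\b/g;

const duplicates = draft.match(repeatedWord);
console.log(duplicates); // ["is is", "the the"]

// $1 in the replacement string inserts the captured word once
const cleaned = draft.replace(repeatedWord, '$1');
console.log(cleaned); // "This is a test of the parser"
```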
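  Note: references outside the regular expression are read from properties of the RegExp constructor, and those constructor properties are deprecated.
  Supplement: deprecated properties (https://developer.mozilla.org/zh-CN/docs/Web/JavaScript/Reference/Deprecated_and_obsolete_features)
- Non-capturing
  In many cases the captured text is never referenced, so ?: can be added inside () to turn off capturing and avoid wasting memory.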
```
'batfoocat'.match(/(bat).*(?:cat)/);
// ["batfoocat", "bat", index: 0, input: "batfoocat", groups: undefined]
/* The returned array contains no capture for cat */
```
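- Named captures
  ES2018 introduced named captures: adding ?<name> at the start of () gives the capturing group a name, and its value can be read from the groups property of the returned array.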
```
'batfoocat'.match(/(?<name_at>bat)/);
// ["bat", "bat", index: 0, input: "batfoocat", groups: {name_at: "bat"}]

/* The name cannot be placed after the characters to match */
'batfoocat'.match(/(bat?<name_at>)/);
// null
```
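
Named groups can also be used in replacement strings with $<name> and inside the pattern itself with \k<name>. A minimal sketch; the date format and group names are assumptions chosen for illustration:

```javascript
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

// Read the named groups through destructuring
const { year, month, day } = datePattern.exec('2018-11-05').groups;
console.log(year, month, day); // 2018 11 05

// $<name> refers to a named group in the replacement string
const reordered = '2018-11-05'.replace(datePattern, '$<day>/$<month>/$<year>');
console.log(reordered); // "05/11/2018"

// \k<name> is a backreference to a named group inside the pattern
const hasRepeat = /(?<word>\w+) \k<word>/.test('the the');
console.log(hasRepeat); // true
```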

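Zero-width assertions

Zero-width: only a position is matched; nothing is returned as part of the result.

Assertion: a check, which can be thought of as a Boolean that is either true or false.

ES2018 introduced zero-width lookbehind assertions.

// Zero-width positive lookahead assertion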
'1% 20'.match(/\d+(?=%)/);
// ["1", index: 0, input: "1% 20", groups: undefined]

// Zero-width negative lookahead assertion
'1% 20'.match(/\d+(?!%)/);
// ["20", index: 3, input: "1% 20", groups: undefined]

// Zero-width positive lookbehind assertion
'price: $1 ￥6'.match(/(?<=\$)\d+/);
// ["1", index: 8, input: "price: $1 ￥6", groups: undefined]

// Zero-width negative lookbehind assertion
'price: $1 ￥6'.match(/(?<!\$)\d+/);
// ["6", index: 11, input: "price: $1 ￥6", groups: undefined]

Note: the content inside the parentheses of a zero-width assertion is not returned as part of the result.
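
Because assertions consume no characters, they can be used to insert text at a position. A well-known sketch is adding thousands separators; addThousandsSeparators is a hypothetical helper and assumes its input is a string of digits only:

```javascript
// Hypothetical helper: assumes the input consists of digits only
function addThousandsSeparators(digits) {
  // \B: not at the start of the number
  // (?=(\d{3})+(?!\d)): followed by a multiple of three digits up to the end
  return digits.replace(/\B(?=(\d{3})+(?!\d))/g, ',');
}

console.log(addThousandsSeparators('1234567')); // "1,234,567"
console.log(addThousandsSeparators('123'));     // "123"
```

The match itself is empty at every position, so replace() inserts the comma without removing any digit.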

Aside: the names of these assertions are quite a mouthful; I suppose that is just the official terminology.

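Operator precedence

Notes

The syntax does not look hard, but regular expressions still feel quite difficult in real use. They are very powerful, though.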

郭佬
