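JS: RegExp (Regular Expressions)

RegExp syntax (including the ES2018 standard)

Note: all code in this article was tested only in Chrome 70.

What is a regular expression?

Regular expressions are patterns used to match character combinations in strings. (MDN)

Put simply, a regular expression is used to extract and capture text (to match characters).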

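Creating a regular expression:

Literal: let regex = /pattern/flags

let regex1 = /foo/i;

Constructor: let regex = new RegExp(pattern, flags);

let regex2 = new RegExp('bar', 'ig'); // ES5
let regex3 = new RegExp(/bat/im); // ES5
let regex4 = new RegExp(/cat/ig, 'g'); // ES6
/* In ES5, creating regex4 this way throws a TypeError: the first argument is already a regular expression, and ES5 does not allow a second argument to add flags in that case. ES6 allows it, and the flags in the second argument replace the flags of the first argument. */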
console.log(regex4); // /cat/g
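
A small sketch of one thing to watch for with the constructor form: when the pattern is passed as a string, every backslash has to be doubled, because the string literal consumes one level of escaping before RegExp sees the pattern. The variable names below are made up for this illustration.

```javascript
// The same pattern written as a literal and as a string.
const fromLiteral = /\d+\.\d+/;
const fromString = new RegExp('\\d+\\.\\d+');

console.log(fromLiteral.source === fromString.source); // true
console.log(fromString.test('3.14')); // true

// With single backslashes, '\d' is just 'd' and '\.' is just '.',
// so the pattern becomes /d+.d+/ and no longer matches digits.
const wrong = new RegExp('\d+\.\d+');
console.log(wrong.source); // d+.d+
console.log(wrong.test('3.14')); // false
console.log(wrong.test('dd-dd')); // true
```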

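Instance properties:

Every regular expression instance has the following properties, which expose information about its pattern.

global: Boolean, whether the g (global match) flag is set.

ignoreCase: Boolean, whether the i (ignore case) flag is set.

multiline: Boolean, whether the m (multiline) flag is set.

unicode: Boolean, whether the u flag (correct handling of Unicode characters with code points above \uFFFF) is set.

sticky: Boolean, whether the y (sticky) flag is set.

lastIndex: the index at which the next match starts. After a successful match it is set to the position just past that match; it is only used in global or sticky mode.

source: the string form of the pattern, without the surrounding slashes and flags; this differs from the result of toString() or valueOf().

flags: the string form of the regular expression's flags.

dotAll: Boolean, whether the s (dotAll) flag is set.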

let str2 = 'batfoocat';
let pattern2 = /at/g;
pattern2.global;  // true
pattern2.sticky;  // false
pattern2.source; // at
pattern2.flags; // g
pattern2.toString(); // /at/g
pattern2.valueOf(); // /at/g
pattern2.lastIndex; // 0
let matches = pattern2.exec(str2); // first call
matches[0]; // at
matches.index; // 1
pattern2.lastIndex; // 3
matches = pattern2.exec(str2); // second call
matches[0]; // at
matches.index; // 7
pattern2.lastIndex; // 9
/* The third call finds no more matches, so exec() returns null and lastIndex is reset to 0. Reading matches[0] or matches.index at that point would throw a TypeError. A fourth call starts over and returns the first match again. */
matches = pattern2.exec(str2); // third call
matches; // null
pattern2.lastIndex; // 0

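Supplement: deprecated properties (https://developer.mozilla.org/zh-CN/docs/web/JavaScript/Reference/Deprecated_and_obsolete_features)

These deprecated features can still be used, but use them with caution, because they may well be removed at some point in the future. (MDN)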

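Methods:

exec(): searches the given string for a match; each call returns information about one match only.

On success it returns an array and updates the properties of the regular expression instance; otherwise it returns null.

The returned array is an Array instance with two extra properties: index (the position of the match in the string) and input (the string that was searched). The first element (index 0) holds the matched text.

Note: with the global flag (g), calling exec() again returns the next match; without it, exec() returns the first match no matter how many times it is called.

Supplement: ES2018 added a groups property (information about named capture groups) to the returned array.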

let str1 = 'batfoocat';
let pattern1 = /at/g;
pattern1.exec(str1); // first call
// ["at", index: 1, input: "batfoocat", groups: undefined]
pattern1.exec(str1); // second call
// ["at", index: 7, input: "batfoocat", groups: undefined]
pattern1.exec(str1); // third call
// null
// A fourth call starts over and returns the first match again
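
The step-by-step calls above can be folded into a loop. The following is a minimal sketch, assuming a helper named findAll (a hypothetical name, not a built-in) that collects every match and its position by calling exec() until it returns null.

```javascript
// Collect every match together with its position by calling exec() in a loop.
// The g flag is required: without it lastIndex never advances and the loop
// would never end.
function findAll(pattern, text) {
  if (!pattern.global) {
    throw new TypeError('findAll expects a pattern with the g flag');
  }
  pattern.lastIndex = 0; // start from the beginning, whatever happened before
  const results = [];
  let m;
  while ((m = pattern.exec(text)) !== null) {
    results.push({ text: m[0], index: m.index });
    // Guard against zero-length matches, which would not advance lastIndex.
    if (m[0] === '') pattern.lastIndex++;
  }
  return results;
}

console.log(findAll(/at/g, 'batfoocat'));
// [ { text: 'at', index: 1 }, { text: 'at', index: 7 } ]
```

Resetting lastIndex at the start is a design choice for this sketch: it makes the helper independent of any earlier use of the same regex instance.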

test(): tests whether the regular expression matches the target string; returns a Boolean.

let str3 = 'batfoocat';
let str4 = 'abcde';
let pattern3 = /at/g;
pattern3.test(str3); // true
pattern3.test(str4); // false
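
One pitfall that may be worth sketching here: test() on a regex with the g flag also updates lastIndex, so repeated calls on the same instance can alternate between true and false for the same input. The names withG and withoutG are just for this illustration.

```javascript
const withG = /at/g;
console.log(withG.test('bat')); // true
console.log(withG.lastIndex); // 3
console.log(withG.test('bat')); // false: the search started at index 3
console.log(withG.lastIndex); // 0, reset after the failed match

// Without the g flag, lastIndex is ignored and the result is stable.
const withoutG = /at/;
console.log(withoutG.test('bat')); // true
console.log(withoutG.test('bat')); // true
```

If only a yes/no answer is needed, leaving out the g flag avoids this statefulness.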

String.prototype.search(): searches for a substring that matches the regular expression. It returns the index of the first match in the string, or -1 if there is no match.

let str5 = 'abcdea';
str5.search(/a/g); // 0
str5.search(/f/g); // -1

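String.prototype.match(): searches for substrings that match the regular expression. With the g (global) flag it returns an array of all matches; without it, match() performs a single match. If nothing matches, it returns null.

Note: in global mode, match() provides neither the text matched by subexpressions (capture groups) nor the position of each match. If you need that information for every match, use RegExp.prototype.exec().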

let str6 = 'abcdea';
str6.match(/a/g);
// ["a", "a"]
str6.match(/a/);
// ["a", index: 0, input: "abcdea", groups: undefined]
str6.match(/f/g);
// null

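String.prototype.replace(regexp, replacement): replaces substrings that match the regular expression. Without the g flag only the first match is replaced. The replacement can be a string or a function.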

let str7 = 'batfoocat';
let a = str7.replace(/at/g, 'oo');
// "boofoocoo"
let b = str7.replace(/at/, 'oo');
// "boofoocat"
let c = str7.replace(/at/g, (value)=> {
    return  '!' + value;
});
// "b!atfooc!at"
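
As a further sketch of replace(): the replacement string can refer to capture groups with $1, $2 and so on, and a replacer function receives the full match, each capture group, and the match offset. The sample strings are invented for this example.

```javascript
// '$1' and '$2' in the replacement string refer to the capture groups.
const swapped = 'Doe, John'.replace(/(\w+), (\w+)/, '$2 $1');
console.log(swapped); // "John Doe"

// A replacer function receives the full match, then each capture group,
// then the offset of the match in the input string.
const doubled = 'a1b22'.replace(/(\d+)/g, (match, digits, offset) => {
  return '[' + Number(digits) * 2 + '@' + offset + ']';
});
console.log(doubled); // "a[2@1]b[44@3]"
```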

String.prototype.split(separator [, limit]): splits a string into an array of substrings. The optional second argument limits the length of the returned array; if it is omitted, all parts are returned.

let str8 = 'batfoocat';
let d = str8.split(/at/g); // ["b", "fooc", ""]
let e = str8.split(/at/); // ["b", "fooc", ""]
let f = str8.split(/at/, 2); // ["b", "fooc"]
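
Two more split() behaviours, shown as a brief sketch with made-up inputs: a capturing group in the separator keeps the separators in the result, and a separator pattern can absorb surrounding whitespace.

```javascript
// Without a capturing group the separators are dropped.
const plain = 'a1b2c'.split(/\d/);
console.log(plain); // ["a", "b", "c"]

// With a capturing group the captured separators are kept in the result.
const withGroup = 'a1b2c'.split(/(\d)/);
console.log(withGroup); // ["a", "1", "b", "2", "c"]

// The separator can swallow optional whitespace around a comma.
const trimmed = ' a , b,c '.trim().split(/\s*,\s*/);
console.log(trimmed); // ["a", "b", "c"]
```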

Flags (modifiers):

g: global match; finds all matches instead of stopping right after the first one.

let str9 = 'batfoocat';
str9.match(/at/);
// ["at", index: 1, input: "batfoocat", groups: undefined]
str9.match(/at/g);
// ["at", "at"]

i: ignore case.

let str10 = 'AabbccDD';
str10.match(/a/gi); // ["A", "a"]
str10.match(/a/g); // ["a"]
str10.match(/A/g); // ["A"]

m: multiline matching, used together with ^ and $.

Multiline: treats the beginning and end characters (^ and $) as working over multiple lines, that is, matching the beginning or end of each line (delimited by \n or \r), not only the very beginning or end of the whole input string. (MDN)

`
abc
def
`.match(/def/);
// ["def", index: 5, input: "\nabc\ndef\n", groups: undefined]
`
abc
def
`.match(/def/m);
// ["def", index: 5, input: "\nabc\ndef\n", groups: undefined]
`
abc
def
`.match(/^def$/);
// null
`
abc
def
`.match(/^def$/m);
// ["def", index: 5, input: "\nabc\ndef\n", groups: undefined]
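
A typical use of the m flag, sketched here with an invented input string, is picking out whole lines that start with a given prefix by combining it with g.

```javascript
const text = 'title\n# first\nbody\n# second';

// With m, ^ and $ match at every line start and end.
const headings = text.match(/^# .*$/gm);
console.log(headings); // ["# first", "# second"]

// Without m, ^ only matches at index 0, where the text starts with "title".
const none = text.match(/^# .*$/g);
console.log(none); // null
```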

u: Unicode mode; correctly handles Unicode characters with code points above \uFFFF.

/\u{20BB7}/.test('𠮷'); // false
/\u{20BB7}/u.test('𠮷'); // true
'𠮷'.match(/./);
// ["\ud842", index: 0, input: "𠮷", groups: undefined]
'𠮷'.match(/./u);
// ["𠮷", index: 0, input: "𠮷", groups: undefined]

Supplement: with the u flag, quantifiers also treat a character with a code point above 0xFFFF as a single character.

/𠮷{2}/.test('𠮷𠮷') // false
/𠮷{2}/u.test('𠮷𠮷') // true
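
To make the difference concrete, here is a short sketch that counts what the dot sees with and without the u flag. It uses the escape '\u{1F600}' (an emoji chosen only for illustration) so the example does not depend on how the source file is encoded.

```javascript
const face = '\u{1F600}'; // one code point, stored as two UTF-16 code units
console.log(face.length); // 2

// Without u, the dot matches each code unit separately.
console.log(face.match(/./g).length); // 2
console.log(/^.$/.test(face)); // false

// With u, the dot matches the whole code point.
console.log(face.match(/./gu).length); // 1
console.log(/^.$/u.test(face)); // true
```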

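y: sticky matching. Like g, it uses lastIndex to find successive matches, but each match must begin exactly at lastIndex rather than anywhere after it.

Sticky: matches only from the index indicated by the lastIndex property of this regular expression in the target string (and does not attempt to match from any later index). (MDN)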

let str11 = 'batcatdat';
str11.match(/at/g);
// ["at", "at", "at"]
str11.match(/at/y);
// null
/* lastIndex starts at 0, so the sticky flag forces the match to begin at index 0 of str11, the character b. That does not fit the pattern at, so the match fails and null is returned. */
str11.match(/at/gy);
// null
str11.match(/\wat/y);
// ["bat", index: 0, input: "batcatdat", groups: undefined]
str11.match(/\wat/gy);
// ["bat", "cat", "dat"]
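
The sticky flag is handy for scanning input piece by piece. Below is a minimal tokenizer sketch under stated assumptions: the rule names, the tiny arithmetic token set and the function tokenize are all hypothetical, chosen only to show how setting lastIndex pins each match to the current position.

```javascript
// Hypothetical token rules for a tiny arithmetic language.
const rules = [
  ['number', /\d+/y],
  ['op', /[+\-*\/]/y],
  ['space', /\s+/y],
];

function tokenize(input) {
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    let matched = false;
    for (const [type, re] of rules) {
      re.lastIndex = pos; // sticky: the match must start exactly here
      const m = re.exec(input);
      if (m) {
        if (type !== 'space') tokens.push({ type, value: m[0] });
        pos = re.lastIndex; // continue right after the token
        matched = true;
        break;
      }
    }
    if (!matched) throw new SyntaxError('Unexpected character at ' + pos);
  }
  return tokens;
}

console.log(tokenize('12 + 3*4').map((t) => t.value));
// ["12", "+", "3", "*", "4"]
```

With g instead of y, a rule could silently skip over unknown characters; y makes the unknown character an immediate failure.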

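s: dotAll mode, used together with the dot (.); added in ES2018.

In a regular expression, . stands for any single character, with two exceptions: characters that take four bytes in UTF-16 (code points above \uFFFF, solved in ES6 by the u flag), and line terminators (characters that end a line, such as the carriage return \r and the line feed \n). ES2018 introduced the s flag to solve the latter.

'bat\ncat'.match(/bat\ncat/);
// ["bat\ncat", index: 0, input: "bat\ncat", groups: undefined]
'bat\ncat'.match(/bat.cat/);
// null
'bat\ncat'.match(/bat.cat/s);
// ["bat\ncat", index: 0, input: "bat\ncat", groups: undefined]

Line terminators: \n, \r, \u2028 or \u2029. (MDN)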

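Escaping

If the pattern contains metacharacters ( [ { \ ^ $ | ? * + . } ] ) that should be matched literally, they must be escaped with a backslash \ for the match to work as intended.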

/.*?/.exec('question?');
// ["", index: 0, input:"question?", groups: undefined]
/.*\?/.exec('question?');
// ["question?", index: 0, input: "question?", groups: undefined]
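
When the text to match comes from a variable, the escaping can be done in code. The sketch below assumes a helper called escapeRegExp (a hypothetical name; it is not a built-in in the environment this article targets) that prefixes every metacharacter with a backslash before the string is handed to the RegExp constructor.

```javascript
// Hypothetical helper: escape every regex metacharacter in a plain string.
// '$&' in the replacement stands for the matched character itself.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const userInput = 'a.b?';
console.log(escapeRegExp(userInput)); // a\.b\?

// Escaped: matches only the literal text.
const safe = new RegExp(escapeRegExp(userInput));
console.log(safe.test('a.b?')); // true
console.log(safe.test('axb')); // false

// Unescaped: . matches any character and ? makes b optional.
console.log(new RegExp(userInput).test('axb')); // true
```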

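Metacharacters

Boundaries

Boundary	Meaning
^	Matches the start of the input. When ^ is the first symbol of the pattern, the character after ^ must be the first character of the text being matched (the original string the pattern is applied to)
$	Matches the end of the input. When $ is the last symbol of the pattern, the character before $ must be the last character of the text being matched
\b	See the table below
\B	See the table below

Note: a boundary matches a position, not a character.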

'abcde'.match(/^abc/);
// ["abc", index: 0, input: "abcde", groups: undefined]
'fabcde'.match(/^abc/);
// null
'abcde'.match(/e$/);
// ["e", index: 4, input: "abcde", groups: undefined]
'abcdef'.match(/e$/);
// null

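Common metacharacters with a backslash \

Metacharacter	Meaning
\b	Matches a word boundary
\B	Matches a non-word boundary
\d	Matches an Arabic digit; equivalent to [0-9]
\D	Matches a character that is not an Arabic digit; equivalent to [^0-9]
\s	Matches a whitespace character
\S	Matches a non-whitespace character
\w	Matches a letter, digit or underscore; equivalent to [A-Za-z0-9_]
\W	Matches any character other than a letter, digit or underscore; equivalent to [^A-Za-z0-9_]

As the table shows, the uppercase form means the opposite of the lowercase form.

Note 1: apart from \b and \B, putting the uppercase and lowercase forms of the other three metacharacters together in a character class matches any character.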

'a b'.match(/[\s\S]/g);
// ["a", " ", "b"]
'a b'.match(/[\W\w]/g);
// ["a", " ", "b"]
'a b'.match(/[\D\d]/g);
// ["a", " ", "b"]
'a b'.match(/[\B\b]/g);
// null (inside a character class, \b stands for the backspace character, not a word boundary)

Note 2: \b does not work with Chinese text, because Chinese characters are not word characters (\w).

'The future is in our own hands'.match(/\bfuture\b/);
// ["future", index: 4, input: "The future is in our own hands", groups: undefined]
'你好 我好 大家好'.match(/\b好\b/g);
// null

Note 3: \s matches whitespace characters, which include all of the characters below. These whitespace characters are metacharacters themselves and can be used in a regular expression.

' ' space character (a single space)

\t horizontal tab character

\r carriage return character

\n new line character

\v vertical tab character

\f form feed character

'a b'.match(/\w\s\w/);
// ["a b", index: 0, input: "a b", groups: undefined]
'a b'.match(/\w \w/);
// ["a b", index: 0, input: "a b", groups: undefined]
`
a
b
`.match(/\w\n\w/);
// ["a\nb", index: 1, input: "\na\nb\n", groups: undefined]

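Dot (.)

. matches any single character, with two exceptions: characters that take four bytes in UTF-16 (solved in ES6 by the u flag) and line terminators (solved in ES2018 by the s flag). Inside a character class, . loses its special meaning and matches a literal . character.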

'$@hhhh'.match(/.*/);
// ["$@hhhh", index: 0, input: "$@hhhh", groups: undefined]

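Quantifiers

Quantifier	Meaning
?	Matches zero or one time
+	Matches one or more times
*	Matches zero or more times
{n}	Matches exactly n times
{n,}	Matches at least n times, i.e. n or more times
{n,m}	Matches between n and m times inclusive, i.e. x times where n <= x && x <= m
x|y	Matches x or y

Note 1: a quantifier matches as many characters as possible, i.e. regular expressions are greedy by default. To match as few characters as possible, append ? to the quantifier to turn greedy mode off.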

'$@hhhh'.match(/.+/); // greedy mode
// ["$@hhhh", index: 0, input: "$@hhhh", groups: undefined]
'$@hhhh'.match(/.+?/); // lazy mode
// ["$", index: 0, input: "$@hhhh", groups: undefined]
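
The difference matters most when a delimiter can occur more than once. A short sketch with an invented HTML-like string:

```javascript
const html = '<b>one</b> and <b>two</b>';

// Greedy: .* runs to the last </b> in the string.
const greedy = html.match(/<b>.*<\/b>/)[0];
console.log(greedy); // "<b>one</b> and <b>two</b>"

// Lazy: .*? stops at the first </b> it can reach.
const lazy = html.match(/<b>.*?<\/b>/)[0];
console.log(lazy); // "<b>one</b>"

// Lazy plus g yields each element separately.
const each = html.match(/<b>.*?<\/b>/g);
console.log(each); // ["<b>one</b>", "<b>two</b>"]
```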

Note 2: in {n,m} and the other brace quantifiers, there must be no spaces inside the braces.

'$@hhhh'.match(/.{1,3}/); // no space
// ["$@h", index: 0, input: "$@hhhh", groups: undefined]
'$@hhhh'.match(/.{1, 3}/); // with a space
// null

Note 3: apart from ?, which turns greedy mode off, no other quantifier may follow a quantifier.

'$@hhhh'.match(/.{1,3}+/);
// Uncaught SyntaxError

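Character classes (sets) and groups

Syntax	Meaning
[xyz]	A character class: matches any one of the enclosed characters
[^xyz]	A negated character class: matches any one character that is not enclosed
(xyz)	A group: the characters inside the parentheses are treated as a unit

/* The a, b and c in [abc] are only alternatives for a single character; [abc] matches just one character per match, and the g flag is needed to find every occurrence. */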
'abc'.match(/[abc]/);
// ["a", index: 0, input: "abc", groups: undefined]
'abc'.match(/[abc]/g);
// ["a", "b", "c"]
'abc'.match(/[ae]/);
// ["a", index: 0, input: "abc", groups: undefined]
'abc'.match(/[^ae]/);
// ["b", index: 1, input: "abc", groups: undefined]
/* The characters in (abc), by contrast, form a unit; (abc) matches abc and captures the match. */
'abc'.match(/(abc)/);
// ["abc", "abc", index: 0, input: "abc", groups: undefined]
'abc'.match(/(abf)/);
// null
'ab c ab ab'.match(/(ab)+/g);
// ["ab", "ab", "ab"]

A hyphen - can be used inside a character class to specify a range.

'abc123'.match(/[0-9]/);
// ["1", index: 3, input: "abc123", groups: undefined]
'abc123'.match(/[a-z]/);
// ["a", index: 0, input: "abc123", groups: undefined]
'abc123'.match(/[0-z]*/);
// ["abc123", index: 0, input: "abc123", groups: undefined]
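/* A range may also run from a digit to a letter. Ranges follow character-code order, so [0-z] also includes the uppercase letters and the punctuation between 9 and a, such as : and _. */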

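Capturing and non-capturing groups

Capturing

The (xyz) entry above says that () captures the match. Whenever () is used, JavaScript treats it as a capturing group by default: the text matched by the expression inside () is captured and stored in memory in a numbered group (ES2018 added named groups). The stored text can be referenced later, which is called a backreference.

/* To reference a captured group inside the regular expression, use \ followed by its number. */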
'<a>example.com</a>'.match(/<(a)>.*<\/\1>/);
// ["<a>example.com</a>", "a", index: 0, input: "<a>example.com</a>", groups: undefined]
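
As another sketch of a backreference inside the pattern, the following finds accidentally repeated words. The pattern name and sample sentence are invented for this example.

```javascript
// (\w+) captures a word; \1 requires the same text again after whitespace.
const repeated = /\b(\w+)\s+\1\b/g;

const doubledWords = 'this is is a test test'.match(repeated);
console.log(doubledWords); // ["is is", "test test"]

// No repeated word, so match() returns null.
console.log('this is a test'.match(repeated)); // null
```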

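When there are several capturing groups, they are numbered in the order of their opening parentheses, from left to right and from outer to inner:

'abc_d_e_d_abc'.match(/((a)(b(c))).*(d)/);
/* ["abc_d_e_d", "abc", "a", "bc", "c", "d", index: 0, input: "abc_d_e_d_abc", groups: undefined]
\1 = abc
\2 = a
\3 = bc
\4 = c
\5 = d
 Captured groups can also be referenced outside the regular expression.
*/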
RegExp.$1;
// "abc"
RegExp.$2;
// "a"
RegExp.$3;
// "bc"
RegExp.$4;
// "c"
RegExp.$5;
// "d"

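Note: references outside the regular expression are read from properties of the RegExp constructor, but these constructor properties are deprecated.

These deprecated features can still be used, but use them with caution, because they may well be removed at some point in the future. (MDN)

Supplement: deprecated properties (https://developer.mozilla.org/zh-CN/docs/Web/JavaScript/Reference/Deprecated_and_obsolete_features)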

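Non-capturing

Often the captured text is never referenced, so you can put ?: at the start of () to disable capturing and avoid wasting memory.

'batfoocat'.match(/(bat).*(?:cat)/);
// ["batfoocat", "bat", index: 0, input: "batfoocat", groups: undefined]
/* The returned array contains no captured entry for cat */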

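Named capture groups

ES2018 introduced named capture groups: put ?<name> at the start of () to give the group a name. The captured text is available through the groups property of the returned array.

'batfoocat'.match(/(?<name_at>bat)/);
// ["bat", "bat", index: 0, input: "batfoocat", groups: {name_at: "bat"}]
/* The name cannot be placed after the characters to match */
'batfoocat'.match(/(bat?<name_at>)/);
// null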
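
A slightly larger sketch of named groups, assuming an ES2018 engine: the names year, month, day and tag are chosen for this example only. It shows reading groups by destructuring, using $<name> in a replacement string, and using \k<name> as a backreference inside the pattern.

```javascript
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

// Read the captured parts by name.
const { year, month, day } = '2018-11-22'.match(datePattern).groups;
console.log(year, month, day); // 2018 11 22

// In a replacement string, $<name> refers to a named group.
const reordered = '2018-11-22'.replace(datePattern, '$<day>/$<month>/$<year>');
console.log(reordered); // "22/11/2018"

// Inside the pattern, \k<name> is a backreference to a named group.
const sameTag = /<(?<tag>\w+)>.*<\/\k<tag>>/;
console.log(sameTag.test('<a>example.com</a>')); // true
console.log(sameTag.test('<a>example.com</b>')); // false
```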

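Zero-width assertions

Zero-width: matches only a position; what it matches is not returned as part of the result.

Assertion: a test; think of it as a Boolean check that is either true or false.

ES2018 introduced zero-width lookbehind assertions.

Assertion	Meaning
x(?=y)	Positive lookahead: matches x only if x is followed by y
x(?!y)	Negative lookahead: matches x only if x is not followed by y
(?<=y)x	Positive lookbehind: matches x only if x is preceded by y
(?<!y)x	Negative lookbehind: matches x only if x is not preceded by y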

// Positive lookahead
'1% 20'.match(/\d+(?=%)/);
// ["1", index: 0, input: "1% 20", groups: undefined]
// Negative lookahead
'1% 20'.match(/\d+(?!%)/);
// ["20", index: 3, input: "1% 20", groups: undefined]
// Positive lookbehind
'price: $1 ￥6'.match(/(?<=\$)\d+/);
// ["1", index: 8, input: "price: $1 ￥6", groups: undefined]
// Negative lookbehind
'price: $1 ￥6'.match(/(?<!\$)\d+/);
// ["6", index: 11, input: "price: $1 ￥6", groups: undefined]

Note: the content inside the parentheses of a zero-width assertion is not returned as part of the result.
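
Because assertions consume no characters, they are useful for inserting text at a position or for checking several conditions at once. Two hedged sketches follow; the function name addThousands and the password rule are assumptions made up for illustration.

```javascript
// Insert a comma at every position that is followed by a multiple of
// three digits and then no further digit. \B avoids a leading comma.
function addThousands(digits) {
  return digits.replace(/\B(?=(\d{3})+(?!\d))/g, ',');
}
console.log(addThousands('1234567')); // "1,234,567"
console.log(addThousands('123')); // "123"

// Hypothetical rule: at least 8 characters, with a digit and a lowercase
// letter somewhere. Each lookahead checks one condition from the start.
const strong = /^(?=.*\d)(?=.*[a-z]).{8,}$/;
console.log(strong.test('abc12345')); // true
console.log(strong.test('abcdefgh')); // false, no digit
console.log(strong.test('a1')); // false, too short
```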

An aside: the names of these assertions are quite a mouthful, but that is presumably just the official terminology.

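Operator precedence

Operator (precedence from top to bottom, left to right)	Meaning
\	Escape
(), (?:), (?=), []	Parentheses and brackets
*, +, ?, {n}, {n,}, {n,m}	Quantifiers
^, $, \metacharacter, any character	Anchors and sequences
|	Logical OR (alternation)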
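
A brief sketch of what the table means in practice, using made-up inputs: alternation binds more loosely than anchors and sequences, and a quantifier applies only to the item directly before it unless a group is used.

```javascript
// | has the lowest precedence, so this reads as (^ab) or (cd$).
const loose = /^ab|cd$/;
console.log(loose.test('abX')); // true, the ^ab branch matches

// A group restricts the alternation, so the anchors apply to both options.
const grouped = /^(?:ab|cd)$/;
console.log(grouped.test('abX')); // false
console.log(grouped.test('cd')); // true

// A quantifier binds tighter than a sequence: + applies to b only.
console.log(/^ab+$/.test('abab')); // false
console.log(/^ab+$/.test('abbb')); // true
// Grouping makes + apply to the whole sequence ab.
console.log(/^(ab)+$/.test('abab')); // true
```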

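Remarks

The syntax does not look hard, but regular expressions are still quite difficult to use well in practice; they are, however, very powerful.

Source: https://www.cnblogs.com/guolao/p/10004367.html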
