8. 正则表达式必知必会-回溯引用:前后一致匹配
8.1 回溯引用有什么用
const str =
'\
<body>\
<h1>Welcome to my homepage.</h1>\
Content is divided into two sections: <br>\
<h2>Coldfusion</h2>\
Infomation about Macromedia ColdFusion.\
<h2>Wireless</h2>\
Infomation about Bluetooth, 802.11, and more.\
</body>\
';
console.log(str);
const reg = /<h[1-6].*?h[1-6]>/g;
let match;
while ((match = reg.exec(str))) {
console.log(`${match[0]}, index: ${match.index}`);
}
// <h1>Welcome to my homepage.</h1>, index: 6
// <h2>Coldfusion</h2>, index: 80
// <h2>Wireless</h2>, index: 138
上面的正则还是有问题:
// 注意 </body> 前的一行
const str =
'\
<body>\
<h1>Welcome to my homepage.</h1>\
Content is divided into two sections: <br>\
<h2>Coldfusion</h2>\
Infomation about Macromedia ColdFusion.\
<h2>Wireless</h2>\
Infomation about Bluetooth, 802.11, and more.\
<h2>This is not a valid html.</h3>\
</body>\
';
console.log(str);
const reg = /<h[1-6].*?h[1-6]>/g;
let match;
while ((match = reg.exec(str))) {
console.log(`${match[0]}, index: ${match.index}`);
}
// <h1>Welcome to my homepage.</h1>, index: 6
// <h2>Coldfusion</h2>, index: 80
// <h2>Wireless</h2>, index: 138
// <h2>This is not a valid html.</h3>, index: 200 <-- 匹配出了 <h2></h3> 这样的无效 html
出现以上情况的原因是这个模式的第二部分对第一部分毫无所知,想要解决这个问题,要用回溯引用。
8.2 回溯引用匹配
找出重复的单词。
const str = 'This is a block of of text, several words here are are repeated, and and they should be not.';
const reg = /[ ]+(\w+)[ ]+\1/g;
let match;
while ((match = reg.exec(str))) {
console.log(match[0]);
}
// of of
// are are
// and and
- [ ]+ 匹配一个或多个空格
- \w+ 匹配一个或多个字母数字字符
- [ ]+ 匹配随后的一个或多个空格
- \w+ 在括号中,是一个子表达式,不过不是用来重复匹配,而是用于后面引用的
- \1 表示引用前面的子表达式。\1 表示模式里的第一个子表达式,\2 第二个子表达式...
const str =
'\
<body>\
<h1>Welcome to my homepage.</h1>\
Content is divided into two sections: <br>\
<h2>Coldfusion</h2>\
Infomation about Macromedia ColdFusion.\
<h2>Wireless</h2>\
Infomation about Bluetooth, 802.11, and more.\
<h2>This is not a valid html.</h3>\
</body>\
';
const reg = /<h([1-6]).*?h\1>/g;
let match;
while ((match = reg.exec(str))) {
console.log(`${match[0]}, index: ${match.index}`);
}
// <h1>Welcome to my homepage.</h1>, index: 6
// <h2>Coldfusion</h2>, index: 80
// <h2>Wireless</h2>, index: 138
// 不会匹配到 <h2></h3> 这样的无效标签了
8.3 回溯引用在替换操作中的应用
JavaScript 中替换时用 $1 回溯引用。
const str = 'Hello, lwl@163.com is my email.';
const reg = /(\w+[\w\.]*@[\w\.]+\.\w+)/;
const str2 = str.replace(reg, '<a href="$1">$1</a>');
console.log(str2); // Hello, <a href="lwl@163.com">lwl@163.com</a> is my email.
将 313-555-1234 替换成 (313)555-1234 这样的形式。
const str = '313-555-1234 121-444-2344 211-133-4453';
const reg = /(\d{3})(-)(\d{3})(-)(\d{4})/g;
const str2 = str.replace(reg, '($1)$3$4$5');
console.log(str2); // (313)555-1234 (121)444-2344 (211)133-4453
大小写转换
- \E:结束 \L 或 \U 转换
- \l:把下一个字符转换成小写
- \L:把 \L 和 \E 之间的字符转换成小写
- \u:把下一个字符转换成大写
- \U:把 \U 和 \E 之间的字符转换成大写
把下面 <h1></h1>
之间的文字转换成大写。
JavaScript 不支持 \U \E 语法。
const str =
'\
<body>\
<h1>Welcome to my homepage.</h1>\
Content is divided into two sections: <br>\
<h2>Coldfusion</h2>\
Infomation about Macromedia ColdFusion.\
<h2>Wireless</h2>\
Infomation about Bluetooth, 802.11, and more.\
</body>\
';
const reg = /(<h1>)(.*?)(<\/h1>)/g;
const str2 = str.replace(reg, val => {
return val.toUpperCase();
});
console.log(str2);
// <body><H1>WELCOME TO MY HOMEPAGE.</H1>Content is divided into two sections: <br><h2>Coldfusion</h2>Infomation about Macromedia ColdFusion.<h2>Wireless</h2>Infomation about Bluetooth, 802.11, and more.</body>