2. Assertions and Boundaries in Regular Expressions

范先生, 2000/1/2


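Assertions and boundaries are special constructs in regular expressions: they consume no characters and instead assert something about a position in the string. They let you control precisely where a match may occur. This article walks through how to use them.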

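Boundary Assertions

1. `^` - Start of a line or string

^ matches the position at the start of the string. With the m (multiline) flag it also matches at the start of each line: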

const regex = /^hello/;

console.log(regex.test('hello world')); // true - 'hello' is at the start of the string
console.log(regex.test('say hello')); // false - 'hello' is not at the start of the string
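The heading mentions lines as well as strings, so here is a minimal sketch of how the `m` flag changes what `^` matches. The sample text and variable names are made up for illustration.

```javascript
// Hypothetical log text with three lines.
const logText = 'info: started\nerror: disk full\nerror: retrying';

// Without `m`, `^` only matches at the very start of the string.
const firstOnly = logText.match(/^error/g);

// With `m`, `^` also matches right after each line break.
const everyLine = logText.match(/^error/gm);

console.log(firstOnly); // null
console.log(everyLine); // ['error', 'error']
```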

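2. `$` - End of a line or string

$ matches the position at the end of the string. With the m (multiline) flag it also matches at the end of each line: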

const regex = /world$/;

console.log(regex.test('hello world')); // true - 'world' is at the end of the string
console.log(regex.test('world is big')); // false - 'world' is not at the end of the string
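A common use of the two anchors together is whole-string validation. The sketch below assumes a made-up helper, `isFourDigitYear`, and shows why dropping the anchors changes the result.

```javascript
// Hypothetical validator: the whole input must be exactly four digits.
function isFourDigitYear(input) {
  return /^\d{4}$/.test(input);
}

console.log(isFourDigitYear('2024')); // true
console.log(isFourDigitYear('year 2024')); // false - extra text before the digits
console.log(isFourDigitYear('20245')); // false - five digits

// Without anchors the pattern only needs to match somewhere in the input.
console.log(/\d{4}/.test('20245')); // true
```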

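3. `\b` - Word boundary

\b matches a word boundary: the position between a \w character and a \W character, or between a \w character and the start or end of the string.

const regex = /\bcat\b/; // match 'cat' as a standalone word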

console.log(regex.test('The cat is cute')); // true
console.log(regex.test('The cats are cute')); // false - 'cat' is followed by 's'
console.log(regex.test('I like catsup')); // false - 'cat' is not a standalone word
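Word boundaries are handy for whole-word replacement. The following is a minimal sketch; `replaceWholeWord` is a hypothetical helper, not a built-in.

```javascript
// Hypothetical helper: replace `word` only where it appears as a whole word.
function replaceWholeWord(text, word, replacement) {
  // Escape regex metacharacters so `word` is treated literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const pattern = new RegExp(`\\b${escaped}\\b`, 'g');
  return text.replace(pattern, replacement);
}

console.log(replaceWholeWord('cat, cats and a bobcat', 'cat', 'dog'));
// 'dog, cats and a bobcat'
```

Because \b is defined in terms of \w, this sketch is only reliable for words made of ASCII letters, digits, and underscores.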

4. `\B` - Non-word boundary

\B matches any position that is not a word boundary:

const regex = /\Bcat\B/; // match 'cat' only in the middle of a word

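console.log(regex.test('The cat is cute')); // false - 'cat' is a whole word
console.log(regex.test('The cats are cute')); // false - 'cat' is at the start of the word
console.log(regex.test('I like catsup')); // false - 'cat' is at the start of 'catsup'
console.log(regex.test('concatenate')); // true - 'cat' is in the middle of 'concatenate'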
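One well-known use of \B is inserting thousands separators. The sketch below combines \B with a lookahead (introduced in the next section); the function name is made up, and it assumes the input is a plain string of digits.

```javascript
// Hypothetical helper: assumes `digits` contains only the characters 0-9.
function addThousandsSeparators(digits) {
  // \B rules out the position before the first digit, so no leading comma is added.
  // The lookahead requires the remaining digits to form complete groups of three.
  return digits.replace(/\B(?=(\d{3})+(?!\d))/g, ',');
}

console.log(addThousandsSeparators('1234567')); // '1,234,567'
console.log(addThousandsSeparators('999')); // '999'
```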

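Lookahead and Lookbehind Assertions

Lookahead and lookbehind assertions let you decide whether the current position matches based on the text that follows it or precedes it, without including that text in the match result.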
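To make "not included in the match" concrete, this small sketch compares a pattern that consumes the following text with one that only looks at it. The sample string is invented.

```javascript
const digitsLine = '1 2 3';

// Consuming version: ' 2' becomes part of the first match,
// so '2' is no longer available to start another match.
const consuming = digitsLine.match(/\d \d/g);

// Lookahead version: ' 2' is only inspected, so '2' can still match next.
const peeking = digitsLine.match(/\d(?= \d)/g);

console.log(consuming); // ['1 2']
console.log(peeking); // ['1', '2']
```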

1. Positive lookahead `(?=...)`

Matches a position that is followed by the given pattern:

const regex = /\w+(?=\s*,)/g; // match words that are followed by a comma

const text = 'apple, banana, orange';
console.log(text.match(regex)); // ['apple', 'banana']

2. Negative lookahead `(?!...)`

Matches a position that is not followed by the given pattern:

const regex = /\b\w+\b(?!\W+and)/g; // match words that are not followed by 'and'

const text = 'cats and dogs but not fish';
console.log(text.match(regex)); // ['and', 'dogs', 'but', 'not', 'fish'] - only 'cats' is followed by 'and'
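Negative lookahead is also useful for "everything except" filters. As a sketch with made-up file names, the pattern below accepts .js files but rejects minified ones.

```javascript
// The lookahead rejects names ending in '.min.js'; the rest requires '.js'.
const nonMinifiedJs = /^(?!.*\.min\.js$).*\.js$/;

const fileNames = ['app.js', 'app.min.js', 'style.css', 'vendor.min.js', 'util.js'];
const sources = fileNames.filter((name) => nonMinifiedJs.test(name));

console.log(sources); // ['app.js', 'util.js']
```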

3. Positive lookbehind `(?<=...)`

Matches a position that is preceded by the given pattern:

const regex = /(?<=\$)\d+(\.\d+)?/g; // match numbers preceded by a dollar sign

const text = 'The prices are $10.99, €5.99, and $29.99';
console.log(text.match(regex)); // ['10.99', '29.99']

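4. Negative lookbehind `(?<!...)`

Matches a position that is not preceded by the given pattern:

// Match numbers not preceded by a dollar sign.
// The lookbehind also excludes a preceding digit or dot; with (?<!\$) alone
// the engine would match '0.99' inside '$10.99' and '9.99' inside '$29.99'.
const regex = /(?<![$\d.])\d+(\.\d+)?/g;

const text = 'The prices are $10.99, 5.99, and $29.99';
console.log(text.match(regex)); // ['5.99']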
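Another typical use of negative lookbehind is finding a character that is not escaped. The sketch below locates double quotes that have no backslash directly before them; the sample string is invented, and it deliberately ignores the case of an escaped backslash followed by a quote.

```javascript
// A double quote that is not immediately preceded by a backslash.
const unescapedQuote = /(?<!\\)"/g;

// The actual characters are: say \"hi\" to "Bob"
const snippet = 'say \\"hi\\" to "Bob"';

const quotePositions = Array.from(snippet.matchAll(unescapedQuote), (m) => m.index);

console.log(quotePositions); // [14, 18]
```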

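Practical Uses of Assertions

Password Validation

Use lookahead assertions to check password complexity:

// The password must be at least 8 characters long and contain at least
// one uppercase letter, one lowercase letter, one digit, and one special character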
const passwordRegex = /^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$/;

console.log(passwordRegex.test('Passw0rd!')); // true
console.log(passwordRegex.test('password')); // false - no uppercase letter, digit, or special character
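Each lookahead in the pattern above is an independent check made from the start of the string. The sketch below rebuilds the same pattern from its parts and reuses each part to report what is missing. The rule list and function name are hypothetical.

```javascript
// Each rule is one lookahead from the combined pattern, plus a label.
const passwordRules = [
  { lookahead: '(?=.*[A-Z])', label: 'an uppercase letter' },
  { lookahead: '(?=.*[a-z])', label: 'a lowercase letter' },
  { lookahead: '(?=.*\\d)', label: 'a digit' },
  { lookahead: '(?=.*[!@#$%^&*])', label: 'a special character' },
];

// Joining the lookaheads reproduces the single-regex check.
const combinedPattern = new RegExp(
  '^' + passwordRules.map((rule) => rule.lookahead).join('') + '.{8,}$'
);

// Hypothetical helper: list the labels of the rules a password fails.
function missingRequirements(password) {
  return passwordRules
    .filter((rule) => !new RegExp('^' + rule.lookahead).test(password))
    .map((rule) => rule.label);
}

console.log(combinedPattern.test('Passw0rd!')); // true
console.log(missingRequirements('password'));
// ['an uppercase letter', 'a digit', 'a special character']
```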

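Excluding Certain Substrings

Check that a string does not contain particular substrings:

// Ensure the string contains neither 'admin' nor 'root' (case-insensitive)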
const regex = /^(?!.*(?:admin|root)).*$/i;

console.log(regex.test('user123')); // true
console.log(regex.test('adminUser')); // false - contains 'admin'
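When the forbidden words come from a list, the same pattern can be built at run time. This is a sketch under the assumption that the list is non-empty; `buildExclusionPattern` is a hypothetical helper.

```javascript
// Hypothetical helper: assumes `forbiddenWords` is a non-empty array of strings.
function buildExclusionPattern(forbiddenWords) {
  const alternatives = forbiddenWords
    // Escape regex metacharacters so each word is matched literally.
    .map((word) => word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
    .join('|');
  return new RegExp(`^(?!.*(?:${alternatives})).*$`, 'i');
}

const usernamePattern = buildExclusionPattern(['admin', 'root']);

console.log(usernamePattern.test('user123')); // true
console.log(usernamePattern.test('SuperAdmin')); // false
console.log(usernamePattern.test('Root42')); // false
```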

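Format Conversion

Use a lookahead assertion to extract fields without consuming the delimiter that follows them:

// Extract comma-separated CSV fields, ignoring commas inside quotes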
const regex = /(?:^|,)(?:"([^"]*(?:""[^"]*)*)"|((?:[^",][^,]*)|))(?=,|$)/g;

const csvLine = '"John Doe","123 Main St, Apt 4","555-1234"';
let match;
const fields = [];

while ((match = regex.exec(csvLine)) !== null) {
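  // Quoted fields land in group 1, unquoted ones in group 2; ?? keeps an empty quoted field as ''
  fields.push(match[1] ?? match[2] ?? '');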
}

console.log(fields); // ['John Doe', '123 Main St, Apt 4', '555-1234']
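A different, shorter approach to the same problem is to split on commas that lie outside quotes, using a lookahead that requires an even number of quotes in the rest of the line. This is only a sketch: it assumes quotes are balanced and does not unescape doubled quotes, and the names are made up.

```javascript
// A comma followed by an even number of double quotes up to the end of the line.
const commaOutsideQuotes = /,(?=(?:[^"]*"[^"]*")*[^"]*$)/;

const sampleLine = '"John Doe","123 Main St, Apt 4","555-1234"';

const splitFields = sampleLine
  .split(commaOutsideQuotes)
  // Strip one leading and one trailing quote from each field.
  .map((field) => field.replace(/^"|"$/g, ''));

console.log(splitFields); // ['John Doe', '123 Main St, Apt 4', '555-1234']
```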

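Compatibility and Limitations of Assertions

Note that not every JavaScript environment supports every kind of assertion:

Lookahead assertions ((?=...) and (?!...)) are supported in all modern browsers and in Node.js.
Lookbehind assertions ((?<=...) and (?<!...)) were added in ECMAScript 2018 (ES9); engines that predate it reject them with a SyntaxError.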
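If the code may run on an engine without lookbehind, one option is to detect support at run time and fall back to a capture group. The sketch below uses made-up helper names. It builds the pattern with the RegExp constructor because a regex literal containing a lookbehind would fail when the script is parsed on an unsupported engine.

```javascript
// Returns true when the engine can compile a lookbehind assertion.
function supportsLookbehind() {
  try {
    new RegExp('(?<=a)b');
    return true;
  } catch (error) {
    return false;
  }
}

// Hypothetical helper: extract the numbers that follow a dollar sign.
function extractDollarAmounts(text) {
  if (supportsLookbehind()) {
    const pattern = new RegExp('(?<=\\$)\\d+(?:\\.\\d+)?', 'g');
    return text.match(pattern) || [];
  }
  // Fallback: consume the '$' and read the number from a capture group.
  const fallback = /\$(\d+(?:\.\d+)?)/g;
  const amounts = [];
  let match;
  while ((match = fallback.exec(text)) !== null) {
    amounts.push(match[1]);
  }
  return amounts;
}

console.log(extractDollarAmounts('The prices are $10.99, €5.99, and $29.99'));
// ['10.99', '29.99']
```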

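Engines also differ in what a lookbehind may contain. JavaScript (ES2018 and later) allows variable-length lookbehind, so an expression such as (?<=\d{1,3}) works there. Some other engines, for example Python's re module, require the lookbehind to have a fixed length and reject such a pattern, so take care when porting a regular expression between languages.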
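One way to find out how a particular engine treats a variable-length lookbehind is to compile one and see whether it throws. The probe below is a sketch with a made-up function name; on an ES2018-compliant JavaScript engine it is expected to report support.

```javascript
// Probe: can this engine compile and run a variable-length lookbehind?
function supportsVariableLengthLookbehind() {
  try {
    // \d{1,3} may be one, two, or three characters long.
    return new RegExp('(?<=\\d{1,3})px').test('120px');
  } catch (error) {
    return false;
  }
}

console.log(supportsVariableLengthLookbehind());
```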

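Mastering assertions and boundaries lets your regular expressions control match positions more precisely and handle more complex, fine-grained text processing. The next article covers grouping and capturing in regular expressions.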