regular-expression

Regular expressions and extended regular expressions are at the heart of many Linux command-line tools (grep, sed, and others). This section therefore covers both:

regular-expression
extended regular-expression

Notes on grep

In GNU grep there is no difference in available functionality between basic and extended syntaxes.

Basic vs Extended Regular Expressions
In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the
backslashed versions \?, \+, \{, \|, \(, and \).

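In other words, to give these meta-characters their special meaning in basic grep, escape them with a backslash; unescaped, they stand only for themselves (for example, ? matches a literal question mark in the text).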

abc	Letters
123	Digits
\d	Any Digit
\D	Any Non-digit character
.	Any Character
\.	Period
[abc]	Only a, b, or c
[^abc]	Not a, b, nor c
[a-z]	Characters a to z
[0-9]	Numbers 0 to 9
\w	Any Alphanumeric character
\W	Any Non-alphanumeric character
{m}	m Repetitions
{m,n}	m to n Repetitions
*	Zero or more repetitions
+	One or more repetitions
?	Optional character
\s	Any Whitespace
\S	Any Non-whitespace character
^…$	Starts and ends
(…)	Capture Group
(a(bc))	Capture Sub-group
(.*)	Capture all
(abc|def)	Matches abc or def
\bwooo\b	The whole word wooo only

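# Matching digits

Use [0-9] or \d to match a single digit. \d is Perl-style syntax: GNU grep accepts it only with -P, so [0-9] is the portable choice in basic and extended grep.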
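As a quick check of the two spellings, here is a minimal Python sketch. The sample sentence is made up for illustration, and Python's `re` module stands in for whatever engine you actually use. For plain ASCII text both patterns select the same characters; in Python 3, `\d` on `str` input also accepts non-ASCII digits, which `[0-9]` does not.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "Order 66 shipped on day 7"

# \d and [0-9] pick out the same ASCII digits here.
with_escape = re.findall(r"\d", text)
with_range = re.findall(r"[0-9]", text)

print(with_escape)
print(with_escape == with_range)
```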

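# Matching any single character

Use . (dot) to match any single character (letter, digit, whitespace, and so on; most engines exclude the newline by default).

Use \. to match a literal period.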
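The difference between the wildcard dot and the escaped dot is easy to see on a few candidate strings. This is a small sketch using Python's `re` module; the candidate list is invented for the example.

```python
import re

# Hypothetical candidates, chosen only for illustration.
names = ["a.c", "abc", "a c", "ac"]

# Unescaped dot: any one character between a and c.
any_char = [n for n in names if re.fullmatch(r"a.c", n)]

# Escaped dot: only a literal period between a and c.
literal_dot = [n for n in names if re.fullmatch(r"a\.c", n)]

# By default the dot does not match a newline in Python.
newline_match = re.fullmatch(r"a.c", "a\nc")

print(any_char)
print(literal_dot)
print(newline_match)
```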

# Matching specific characters

[abc] matches a single character that is a, b, or c.

# Excluding specific characters

[^abc] matches any single character except a, b, or c.

# Matching one character in a range

[0-6] matches a single character that is one of 0, 1, 2, 3, 4, 5, or 6.
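The three bracket forms (a set, a negated set, and a range) can be compared side by side. The following is a minimal sketch with Python's `re` module; the input string is an arbitrary example.

```python
import re

# Hypothetical input, chosen only for illustration.
text = "cab-607"

in_set = re.findall(r"[abc]", text)       # one of a, b, c
not_in_set = re.findall(r"[^abc]", text)  # anything except a, b, c
in_range = re.findall(r"[0-6]", text)     # digits 0 through 6

print(in_set)
print(not_in_set)
print(in_range)
```

Note that the digit 7 appears in the negated-set result but not in the range result, because it lies outside 0-6.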

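# Matching a specific number of repetitions

a{3} matches exactly 3 a characters.

a{1,3} matches at least 1 and at most 3 a characters (write the bounds without a space after the comma).

[wxy]{5} matches 5 characters, each of which may be w, x, or y.

.{2,6} matches at least 2 and at most 6 arbitrary characters.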
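The counted quantifiers can be verified with whole-string matches. This sketch uses Python's `re.fullmatch`; the helper name `full` and the test strings are assumptions made for the example.

```python
import re

def full(pattern: str, s: str) -> bool:
    """Return True when the whole string s matches pattern (hypothetical helper)."""
    return re.fullmatch(pattern, s) is not None

results = {
    "a{3} on aaa": full(r"a{3}", "aaa"),            # exactly three
    "a{3} on aa": full(r"a{3}", "aa"),              # too few
    "a{1,3} on aa": full(r"a{1,3}", "aa"),          # within 1..3
    "a{1,3} on aaaa": full(r"a{1,3}", "aaaa"),      # too many
    "[wxy]{5} on wxyxw": full(r"[wxy]{5}", "wxyxw"),
    ".{2,6} on abcdefg": full(r".{2,6}", "abcdefg"),  # seven characters
}

for label, ok in results.items():
    print(label, ok)
```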

# Star and Plus

The star and the plus mean "0 or more" and "1 or more", respectively, of the character or group they follow (they always follow a character or group).

Here "group" also covers a class such as \d, which stands for any digit, so \d+ matches a run of one or more digits.

The star matches zero or more repetitions of a character, for example:
a* matches 0 or more a characters;

The plus matches one or more repetitions of a character, for example:
a+ matches 1 or more a characters.
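The only difference between the two quantifiers is the empty case, which a short Python sketch makes visible. The sample strings are invented for illustration.

```python
import re

samples = ["", "a", "aaa"]

# a* accepts the empty string; a+ requires at least one a.
star = [re.fullmatch(r"a*", s) is not None for s in samples]
plus = [re.fullmatch(r"a+", s) is not None for s in samples]

# \d+ grabs each run of digits as one match.
numbers = re.findall(r"\d+", "room 12, floor 3")

print(star)
print(plus)
print(numbers)
```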

# Optional characters

ab?c matches ac or abc: the ? makes the preceding b optional.
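To see exactly which strings `ab?c` accepts, the sketch below tests a handful of candidates with Python's `re.fullmatch`. The candidate list is an assumption made for the example.

```python
import re

# Hypothetical candidates, chosen only for illustration.
candidates = ["ac", "abc", "ab", "abbc"]

# b? allows zero or one b between a and c.
accepted = [s for s in candidates if re.fullmatch(r"ab?c", s)]

print(accepted)
```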

# Matching any whitespace

Whitespace includes the space, the tab (\t), the new line (\n), and the carriage return (\r).

\s matches any single whitespace character.
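A common use of `\s` is splitting text on runs of mixed whitespace. Here is a minimal Python sketch; the input string is made up and deliberately mixes tabs, newlines, spaces, and a carriage return.

```python
import re

# Hypothetical input mixing tab, newline, spaces, and CR+LF.
raw = "one\ttwo\nthree  four\r\nfive"

# \s+ treats any run of whitespace as a single separator.
words = re.split(r"\s+", raw)

# \S+ is the complement: each run of non-whitespace characters.
same_words = re.findall(r"\S+", raw)

print(words)
```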

# Anchoring the start and end of a line

^ (hat) anchors the match to the start of the line; $ (dollar sign) anchors it to the end.
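Without anchors a pattern may match anywhere inside a line; with both anchors it must account for the whole line. The following Python sketch contrasts the two on a few invented lines.

```python
import re

# Hypothetical lines, chosen only for illustration.
lines = ["success", "unsuccess", "successful"]

# Unanchored: the pattern may occur anywhere in the line.
anywhere = [l for l in lines if re.search(r"success", l)]

# Anchored at both ends: the line must be exactly "success".
exact = [l for l in lines if re.search(r"^success$", l)]

print(anywhere)
print(exact)
```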

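# Matching groups

Regular expressions can capture text as well as match it: whatever matches inside a pair of () becomes a group that can be processed later.

^(IMG\d+)\.png$ puts whatever matches inside the () into one group (the ^ may sit inside or outside the parentheses).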

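# Nested groups

^(IMG(\d+))\.png$
The outer parentheses capture the whole photo name, letters and digits together, as one group;
the nested inner parentheses capture only the digits as a second group.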
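Groups are numbered by the position of their opening parenthesis, so the outer group comes first. This is a small Python sketch with the dot before `png` escaped; the file name is a made-up example.

```python
import re

# Hypothetical file name, chosen only for illustration.
filename = "IMG0042.png"

m = re.search(r"^(IMG(\d+))\.png$", filename)

# Group 1 is the outer group, group 2 the nested one.
outer = m.group(1)
inner = m.group(2)

print(outer, inner)
```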

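# Multiple independent groups

(\d+)x(\d+)
This captures the two numbers of a resolution such as 1280x720 (its two dimensions) in two separate groups.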
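Once captured, the two groups can be pulled out and converted independently. Below is a minimal Python sketch; the input string and the variable names `first` and `second` are assumptions made for the example.

```python
import re

# Hypothetical input, chosen only for illustration.
text = "wallpaper 1920x1080.jpg"

m = re.search(r"(\d+)x(\d+)", text)

# Each group is a string; convert to int for arithmetic.
first, second = (int(g) for g in m.groups())

print(first, second, first * second)
```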

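# OR

Syntax: (milk|bread|juice)

Note

The parentheses here delimit the alternatives. In most engines they still form a capture group; where supported, write (?:milk|bread|juice) to group without capturing.

You can use any sequence of characters or metacharacters in a condition.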
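The sketch below tries the alternation in Python, where the parentheses around the alternatives do capture and `(?:...)` is the non-capturing form. The surrounding word "buy" and the sample sentences are invented for the example.

```python
import re

capturing = re.compile(r"buy (milk|bread|juice)")
non_capturing = re.compile(r"buy (?:milk|bread|juice)")

hit = capturing.search("please buy bread today")
miss = capturing.search("please buy eggs today")

# The chosen alternative is available as group 1.
chosen = hit.group(1)

# The non-capturing form matches the same text but records no groups.
plain = non_capturing.search("please buy bread today")

print(chosen, miss, plain.groups())
```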

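# Metacharacters

\d : any single digit character
\D : any single non-digit character

\s : any single whitespace character (space, \t, \n, \r, ...)
\S : any single non-whitespace character

\w : any single word character (a letter, a digit, or the underscore)
\W : any single non-word character

\b : a zero-width match at the boundary between a word character and a non-word character.
For example, .*\b matches a sentence up to its last word boundary, which drops trailing non-word characters such as the final punctuation.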
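The word-character and word-boundary escapes are easiest to understand by example. The following Python sketch uses made-up strings: it shows that `\w` includes digits and the underscore, that `\bwooo\b` rejects a longer word, and that `.*\b` stops before trailing punctuation.

```python
import re

# \w+ covers letters, digits, and underscore.
words = re.findall(r"\w+", "snake_case x9!")

# \b on both sides: "woooo" is rejected, "wooo!" is accepted.
whole = re.findall(r"\bwooo\b", "wooo woooo wooo!")

# Greedy .* backtracks to the last word boundary.
trimmed = re.match(r".*\b", "Hello, world!").group(0)

print(words)
print(whole)
print(trimmed)
```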

Last updated: 12/27/2023, 8:55:47 AM
