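Regular expressions overview
A regular expression is a pattern template you define. Linux tools such as sed and gawk use it to filter text: lines that match the pattern are accepted, and lines that don't are rejected.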
Basic regular expressions
Plain text
- [root@node1 ~]# echo "this is a cat" | sed -n '/cat/p'
- this is a cat
- [root@node1 ~]# echo "this is a cat" | gawk '/cat/{print $0}'
- this is a cat
Regular expression matching is very picky. In particular, remember that regular expressions are case-sensitive.
Special characters
The special characters recognized by regular expressions are:
.*[]^${}\+?|()
To use one of these special characters as a literal character, you must escape it, usually with a backslash (\).
- [root@node1 ~]# echo "this is a $" | sed -n '/\$/p'
- this is a $
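Escaping matters most for characters that commonly appear in data, such as the dot. The slash is not in the list above, but sed and gawk use it to delimit the pattern, so a literal slash inside the pattern needs a backslash too. A minimal sketch comparing escaped and unescaped forms:

```shell
# Unescaped, the dot is special and matches any character, so "3x14" matches.
echo "3x14" | sed -n '/3.14/p'        # prints: 3x14

# Escaped with a backslash, the dot matches only a literal ".".
echo "3x14" | sed -n '/3\.14/p'       # prints nothing
echo "3.14" | sed -n '/3\.14/p'       # prints: 3.14

# The slash delimits the pattern, so a literal slash must be escaped as well.
echo "/usr/bin" | sed -n '/\/usr/p'   # prints: /usr/bin
```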
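Anchor characters
Two special characters can anchor a pattern to the start or the end of a line in the data stream.
The caret (^) anchors the pattern to the start of a line.
The dollar sign ($) anchors the pattern to the end of a line.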
- [root@node1 ~]# echo "this is a cat" | sed -n '/^this/p'
- this is a cat
- [root@node1 ~]# echo "this is a cat" | sed -n '/cat$/p'
- this is a cat
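An anchored pattern only matches when the text really sits at that position. A minimal check with the same sample line: "cat" is not at the start and "this" is not at the end, so the first two commands print nothing.

```shell
# "cat" appears in the line, but not at the beginning: no output.
echo "this is a cat" | sed -n '/^cat/p'
# "this" appears in the line, but not at the end: no output.
echo "this is a cat" | sed -n '/this$/p'
# Without an anchor, the same text matches anywhere in the line.
echo "this is a cat" | sed -n '/this/p'   # prints: this is a cat
```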
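In some cases you can combine the two anchors.
1. For example, to find lines whose entire content is exactly a specific piece of text: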
- [root@node1 ljy]# more test.txt
- this is a dog
- what
- how
- this is a cat
- is a dog
- [root@node1 ljy]# sed -n '/^is a dog$/p' test.txt
- is a dog
2. With nothing between them, the two anchors (^$) match an empty line, so you can filter out blank lines directly:
- [root@node1 ljy]# more test.txt
- this is a dog
- what
- how
- this is a cat
- is a dog
- [root@node1 ljy]# sed '/^$/d' test.txt
- this is a dog
- what
- how
- this is a cat
- is a dog
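To see the effect, the input needs real blank lines. This sketch builds one with printf; it also includes a hypothetical line holding only spaces, which ^$ does not match because the line is not truly empty.

```shell
# Input: "this is a dog", an empty line, "what", and a line of three spaces.
printf 'this is a dog\n\nwhat\n   \n' | sed '/^$/d'
# Only the truly empty line is deleted; the spaces-only line is kept.
```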
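The dot character
The dot matches any single character except a newline, and it must match exactly one character. That is why the line "at" is not printed below: no character precedes "at" for the dot to match.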
- [root@node1 ljy]# more test.txt
- this is a dog
- what
- how
- this is a cat
- is a dog
- at
- [root@node1 ljy]# sed -n '/.at/p' test.txt
- what
- this is a cat
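Adding both anchors makes the "exactly one character" rule easy to see. In this minimal sketch with hypothetical input, ^.at$ accepts three-character lines ending in "at", but rejects "at" (nothing for the dot) and "what" (one character too many).

```shell
printf 'at\ncat\nbat\nwhat\n' | sed -n '/^.at$/p'
# prints:
# cat
# bat
```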
Character classes
To restrict which characters may appear at a given position, use a character class, defined with square brackets ([]).
- [root@node1 ljy]# more test.txt
- this is a dog
- this is a Dog
- this is a DoG
- this is a cat
- [root@node1 ljy]# sed -n '/[dD]og/p' test.txt
- this is a dog
- this is a Dog
- [root@node1 ljy]# sed -n '/[dD]o[gG]/p' test.txt
- this is a dog
- this is a Dog
- this is a DoG
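A character class always stands for exactly one character position. In this minimal sketch with hypothetical words, gr[ae]y accepts both spellings, but "graey" fails because the class cannot match two characters.

```shell
printf 'grey\ngray\ngraey\n' | sed -n '/gr[ae]y/p'
# prints:
# grey
# gray
```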
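Negated character classes
To exclude specific characters, put a caret (^) as the first character inside the brackets; the class then matches any single character that is not listed.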
- [root@node1 ljy]# sed -n '/[dD]o[gG]/p' test.txt
- this is a dog
- this is a Dog
- this is a DoG
- [root@node1 ljy]# sed -n '/[^D]og/p' test.txt
- this is a dog
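A negated class still has to match one character; it does not mean "D may be absent". In this minimal sketch with hypothetical input, "og" on a line of its own is not printed, because nothing precedes it for [^D] to match.

```shell
printf 'og\nlog\nDog\n' | sed -n '/[^D]og/p'
# prints: log
```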
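Ranges
Inside a character class, a hyphen defines a range such as [0-9]; the class then matches any single character within that range.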
- [root@node1 ljy]# more test.txt
- 123123
- 1231
- 121222222
- 412345341613
- vsdvs
- qwer12344123
- 12345
- 34211
- 444444
- [root@node1 ljy]# sed -n '/^[0-9][0-9][0-9][0-9][0-9]$/p' test.txt
- 12345
- 34211
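Several ranges can share one class. This minimal sketch uses hypothetical input and sets LC_ALL=C, assuming plain ASCII ordering so that [a-z] and [A-Z] cover exactly the lowercase and uppercase letters. Only the purely alphabetic five-character lines are printed.

```shell
# LC_ALL=C keeps letter ranges in plain ASCII order, independent of the locale.
printf 'vsdvs\nVSDVS\n12345\nab1de\n' |
  LC_ALL=C sed -n '/^[a-zA-Z][a-zA-Z][a-zA-Z][a-zA-Z][a-zA-Z]$/p'
# prints:
# vsdvs
# VSDVS
```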
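Extended regular expressions
The gawk program recognizes extended regular expression (ERE) patterns, so the examples in this part use gawk.
The question mark
The question mark indicates that the preceding character can appear 0 or 1 times, and no more.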
- [root@node1 ljy]# echo "bat" | gawk '/ba?t/{print $0}'
- bat
- [root@node1 ljy]# echo "baat" | gawk '/ba?t/{print $0}'
- [root@node1 ljy]# echo "bt" | gawk '/ba?t/{print $0}'
- bt
You can also combine the question mark with a character class:
- [root@node1 ljy]# echo "bt" | gawk '/b[ae]?t/{print $0}'
- bt
- [root@node1 ljy]# echo "bat" | gawk '/b[ae]?t/{print $0}'
- bat
- [root@node1 ljy]# echo "bet" | gawk '/b[ae]?t/{print $0}'
- bet
- [root@node1 ljy]# echo "baat" | gawk '/b[ae]?t/{print $0}'
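With GNU sed, the same ERE syntax needs the -E option; without it, sed reads the pattern as a basic regular expression, in which a bare ? is an ordinary character. A minimal sketch with hypothetical input:

```shell
# -E switches GNU sed to ERE, where ? means "zero or one of the preceding character".
printf 'color\ncolour\ncolouur\n' | sed -E -n '/colou?r/p'   # prints: color, colour
# Without -E (BRE), ? is a literal question mark, so nothing matches here.
printf 'color\ncolour\n' | sed -n '/colou?r/p'
```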
The plus sign
The plus sign indicates that the preceding character can appear one or more times; it must appear at least once.
- [root@node1 ljy]# echo "baat" | gawk '/b[ae]+t/{print $0}'
- baat
- [root@node1 ljy]# echo "bt" | gawk '/b[ae]+t/{print $0}'
- [root@node1 ljy]# echo "bt" | gawk '/ba+t/{print $0}'
- [root@node1 ljy]# echo "bat" | gawk '/ba+t/{print $0}'
- bat
- [root@node1 ljy]# echo "baat" | gawk '/ba+t/{print $0}'
- baat
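Braces
Braces in ERE let you set lower and upper limits on how many times the preceding regular expression may repeat; this is called an interval.
{m,n}: the preceding element appears at least m times and at most n times.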
- [root@node1 ljy]# echo "baat" | gawk '/b[ae]{1,2}t/{print $0}'
- baat
- [root@node1 ljy]# echo "baaat" | gawk '/b[ae]{1,2}t/{print $0}'
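The pipe symbol
The pipe symbol (|) combines regular expression patterns with a logical OR: a line matches if any one of the alternatives matches.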
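Grouping expressions
Parentheses group part of a regular expression so that it is treated as a single unit. Combined with the pipe symbol, b(a|e)t matches "bat" or "bet":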
- [root@node1 ljy]# echo "bat" | gawk '/b(a|e)t/{print $0}'
- bat
- [root@node1 ljy]# echo "baat" | gawk '/b(a|e)t/{print $0}'
- [root@node1 ljy]# echo "bet" | gawk '/b(a|e)t/{print $0}'
- bet
Source: https://www.cnblogs.com/jinyuanliu/p/10937795.html