2 What is a regular expression?

Date: 2019-07-03
Author: Sun

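Goals of this section:

(1) Understand regular expressions and how to use the re module

(2) Work with regular expressions in Python, including greedy and non-greedy matching

(3) Learn the common functions find, findall, search, match, split, and so on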

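Regular expressions

A regular expression is a text pattern made up of ordinary characters (for example, the letters a through z) and special characters (called "metacharacters").

A regular expression is a single string that describes, and is used to match, a set of strings that follow a given syntactic rule.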

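1 Why use regular expressions?

A few concrete examples help make the case.

(1) Determine whether a string contains a digit; return true if it does, false otherwise.

(2) Given a string str, check whether it contains consecutive repeated letters (a-zA-Z); return true if it does, false otherwise.

(3) Extract the data we want from a large body of text.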
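The three tasks above can each be sketched in a line or two with the re module. This is a minimal illustration, not the author's code; the function names are hypothetical, and "the data we want" is assumed here to mean runs of digits.

```python
import re

def contains_digit(s):
    """Return True if s contains at least one digit."""
    return re.search(r"\d", s) is not None

def has_repeated_letter(s):
    """Return True if the same ASCII letter appears twice in a row."""
    # ([a-zA-Z]) captures one letter; \1 requires that same letter again.
    return re.search(r"([a-zA-Z])\1", s) is not None

def extract_numbers(text):
    """Return every run of digits in text (assumed extraction target)."""
    return re.findall(r"\d+", text)

print(contains_digit("abc1"))                 # True
print(has_repeated_letter("hello"))           # True, because of "ll"
print(has_repeated_letter("world"))           # False
print(extract_numbers("order 12, item 345"))  # ['12', '345']
```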

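Requirements like these also come up often at work:

1. Given a string, display the links, numbers, phone numbers, and so on in it in different colors;

2. Given text that contains custom emoticons, find the emoticons and replace them with local emoticon images;

3. Given user input, determine whether it is a WeChat ID, a mobile number, an email address, digits only, and so on.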

Hints:

For cases 1 and 2, regular expressions combined with rich text handle the job easily.
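As a rough sketch of cases 1 and 2: the regex finds the spans, and a rich-text layer (not shown) does the coloring or image display. Everything specific here is an assumption for illustration: the `[code]` emoticon syntax, the `EMOTICONS` mapping and file paths, and the `<img>` tag used as a stand-in for a rich-text image.

```python
import re

# Case 1: locate links and numbers so a renderer can color them.
# The link alternative comes first so digits inside a URL are not reported twice.
TOKEN_RE = re.compile(r"(?P<link>https?://\S+)|(?P<number>\d+)")

def find_spans(text):
    """Return (kind, start, end) for every link or number in text."""
    return [(m.lastgroup, m.start(), m.end()) for m in TOKEN_RE.finditer(text)]

# Case 2: hypothetical mapping from emoticon codes to local image files.
EMOTICONS = {"smile": "img/smile.png", "cry": "img/cry.png"}
EMOTICON_RE = re.compile(r"\[(\w+)\]")

def replace_emoticons(text):
    """Replace known [code] emoticons with an image tag; keep unknown ones."""
    def to_img(match):
        path = EMOTICONS.get(match.group(1))
        return f'<img src="{path}">' if path else match.group(0)
    return EMOTICON_RE.sub(to_img, text)

print(find_spans("call 10086 or see http://a.cn/1"))
# [('number', 5, 10), ('link', 18, 31)]
print(replace_emoticons("hi [smile] and [unknown]"))
# hi <img src="img/smile.png"> and [unknown]
```

Passing a function to sub lets each match decide its own replacement, which is what makes the "unknown emoticon stays as text" rule easy to express.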

For case 3, we only need to wrap the relevant patterns in our own regex library once, and then reuse it everywhere.
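A small "regex library" for case 3 might look like the sketch below. The patterns are deliberately simplified assumptions, not official rules: the mobile pattern only checks for 11 digits starting with 1 and a second digit of 3 to 9, the email pattern is loose, and the WeChat ID rule (6 to 20 characters, starting with a letter) is assumed for illustration.

```python
import re

# Order matters: a mobile number is also "digits", so test it first.
# All patterns are illustrative assumptions, not authoritative rules.
PATTERNS = [
    ("mobile", re.compile(r"1[3-9]\d{9}")),
    ("digits", re.compile(r"\d+")),
    ("email", re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")),
    ("wechat_id", re.compile(r"[a-zA-Z][a-zA-Z0-9_-]{5,19}")),
]

def classify(text):
    """Return the name of the first pattern that matches the whole input."""
    for name, pattern in PATTERNS:
        if pattern.fullmatch(text):
            return name
    return "unknown"

print(classify("13812345678"))      # mobile
print(classify("123456"))           # digits
print(classify("sun@example.com"))  # email
print(classify("sun_2019"))         # wechat_id
print(classify("hi!"))              # unknown
```

Using fullmatch rather than match or search means no anchors are needed in the patterns, and trailing garbage in the input cannot slip through.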

Common regex matching tools

Online matching tools:

1 http://www.regexpal.com/
2 http://rubular.com/

Regex matching software:

McTracer http://pan.baidu.com/s/19Yn49

(1) ^ $ * ? + {2} {2,} {2,5} |
(2) [], [^], [a-z], [0-9], [4|5]
(3) \s, \S, \w, \W
(4) [\u4E00-\u9FA5] () \d
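To make the metacharacters listed above concrete, here is one findall call per group. The sample strings are made up for illustration. Note that inside square brackets `|` is a literal character, so `[4|5]` matches 4, | or 5.

```python
import re

# (1) anchors, quantifiers, alternation
print(re.findall(r"^\d{2}", "2019-07-03"))         # ['20']
print(re.findall(r"\d{2}$", "2019-07-03"))         # ['03']
print(re.findall(r"colou?r", "color colour"))      # ['color', 'colour']
print(re.findall(r"\d{2,}", "1 22 333 666666"))    # ['22', '333', '666666']
print(re.findall(r"\d{2,5}", "1 22 333 666666"))   # ['22', '333', '66666']
print(re.findall(r"cat|dog", "cat dog bird"))      # ['cat', 'dog']

# (2) character classes
print(re.findall(r"[^0-9]+", "ab12cd"))            # ['ab', 'cd']
print(re.findall(r"[4|5]", "4|5"))                 # ['4', '|', '5']

# (3) whitespace and word characters
print(re.findall(r"\s", "a b\tc"))                 # [' ', '\t']
print(re.findall(r"\w+", "hi, there_1"))           # ['hi', 'there_1']
print(re.findall(r"\W+", "hi, there_1"))           # [', ']

# (4) the common CJK ideograph range
print(re.findall(r"[\u4E00-\u9FA5]+", "hello 世界 ok 你好"))  # ['世界', '你好']
```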

import re
a = 'one1two2three3four4'
ret = re.findall(r'(\d+)', a)
print(ret)
# ['1', '2', '3', '4']

import re
p = re.compile(r"(\d+)")  # compile once, then reuse the pattern object
a = 'one1two2three3four4'
res = p.findall(a)
print(res)
# ['1', '2', '3', '4']

a = 'hello alex alex adn acd'
n = re.findall(r'(a)(\w+)', a)
print(n)  # groups are numbered left to right, outer to inner
# [('a', 'lex'), ('a', 'lex'), ('a', 'dn'), ('a', 'cd')]
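What findall returns depends on how many capturing groups the pattern has, which is easy to trip over. The short comparison below reuses the sample string from above; the nested pattern in the last call is added here only to show the "outer to inner" numbering.

```python
import re

text = "hello alex alex adn acd"

# No groups: the whole match is returned.
print(re.findall(r"a\w+", text))
# ['alex', 'alex', 'adn', 'acd']

# One group: only that group's text is returned.
print(re.findall(r"a(\w+)", text))
# ['lex', 'lex', 'dn', 'cd']

# (?:...) groups without capturing, so there is still only one group.
print(re.findall(r"(?:a)(\w+)", text))
# ['lex', 'lex', 'dn', 'cd']

# Nested groups: group 1 is the outer one, then 2 and 3 from left to right.
print(re.findall(r"((a)(\w+))", text))
# [('alex', 'a', 'lex'), ('alex', 'a', 'lex'), ('adn', 'a', 'dn'), ('acd', 'a', 'cd')]
```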

# -*- coding: utf-8 -*-
__author__ = 'sun'
__date__ = '2019/7/03 9:48 AM'
import re
line = "liu dehua was older than you"
# re.match only tries to match at the start of the string
matchObj = re.match(r'^liu (.*) was (.*?) .*', line, re.M | re.I)
if matchObj:
    print("matchObj.group() :", matchObj.group())
    print("matchObj.group(1) :", matchObj.group(1))
    print("matchObj.group(2) :", matchObj.group(2))
else:
    print("No match!!")

matchObj.group() : liu dehua was older than you
matchObj.group(1) : dehua
matchObj.group(2) : older

# -*- coding: utf-8 -*-
__author__ = 'sun'
__date__ = '2019/7/03 9:48 AM'
import re
line = "liu dehua was older than you"
# re.search scans the whole string; the pattern starts with ^ and the text
# starts with "liu", so the result is the same as with re.match
matchObj = re.search(r'^liu (.*) was (.*?) .*', line, re.M | re.I)
if matchObj:
    print("matchObj.group() :", matchObj.group())
    print("matchObj.group(1) :", matchObj.group(1))
    print("matchObj.group(2) :", matchObj.group(2))
else:
    print("No match!!")

matchObj.group() : liu dehua was older than you
matchObj.group(1) : dehua
matchObj.group(2) : older

import re
# match: matches only at the start of the string; returns a match object, or None if there is no match
ret_match = re.match("c", "abcde")
if ret_match:
    print("ret_match:" + ret_match.group())
else:
    print("ret_match:None")
# search: scans the whole string and stops at the first match; returns None if there is no match
ret_search = re.search("c", "abcde")
if ret_search:
    print("ret_search:" + ret_search.group())

ret_match:None
ret_search:c
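The earlier examples pass re.M and re.I, so it may help to see what those flags change. A brief sketch with a made-up two-line string: re.M lets ^ match at the start of each line for search, but it does not let match skip ahead, and fullmatch requires the whole string to match.

```python
import re

text = "first line\nliu dehua"

print(re.match(r"liu", text, re.M))            # None: match still starts at index 0
print(re.search(r"^liu", text))                # None: without re.M, ^ is only the string start
print(re.search(r"^liu", text, re.M).group())  # liu: with re.M, ^ also matches after a newline
print(re.search(r"LIU", text, re.I).group())   # liu: re.I ignores case

print(re.match(r"abc", "abcde").group())       # abc: a prefix match is enough for match
print(re.fullmatch(r"abc", "abcde"))           # None: fullmatch needs the entire string
```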

import re
a = "123abc456"
m = re.search("([0-9]*)([a-z]*)([0-9]*)", a)  # search once, then read the groups
m.group(0)  # '123abc456'; group() with no argument returns group(0), the whole match
m.group(1)  # '123'
m.group(2)  # 'abc'
m.group(3)  # '456'
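Beyond group(n), a match object offers a few other accessors. The sketch below uses named groups; the names head, word, and tail are invented for illustration. It also shows a pitfall of the all-`*` pattern above: because every part may be empty, it can succeed with an empty match at position 0.

```python
import re

m = re.search(r"(?P<head>[0-9]+)(?P<word>[a-z]+)(?P<tail>[0-9]+)", "123abc456")
print(m.groups())       # ('123', 'abc', '456')
print(m.group("word"))  # abc
print(m.groupdict())    # {'head': '123', 'word': 'abc', 'tail': '456'}
print(m.span(2))        # (3, 6): start and end index of group 2

# With * everywhere, a leading space yields an empty match at index 0.
empty = re.search(r"([0-9]*)([a-z]*)([0-9]*)", " 123abc456")
print(repr(empty.group(0)))  # ''
```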

import re
# sub: replace every match and return the new string
ret_sub = re.sub(r'(one|two|three)', 'ok', 'one word two words three words')
print(ret_sub)
# ok word ok words ok words

# subn: like sub, but also returns the number of replacements
ret_subn = re.subn(r'(one|two|three)', 'ok',
                   'one word two words three words')
print(ret_subn)
# ('ok word ok words ok words', 3)  the 3 is the number of replacements
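A few further variations on sub, as a sketch with made-up inputs: limiting the number of replacements with count, reusing captured groups in the replacement string, and computing the replacement with a function.

```python
import re

text = "one word two words three words"

# count=1 replaces only the first match.
print(re.sub(r"one|two|three", "ok", text, count=1))
# ok word two words three words

# \1 and \2 in the replacement refer to the captured groups.
print(re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world"))
# world hello

# A function receives each match object and returns the replacement text.
print(re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "a1 b20"))
# a2 b40
```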

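import re
ret = re.split(r'\d+', 'one1two2three3four4')
print(ret)
#### output ####
# Splitting at 1 gives 'one' and 'two2three3four4'; splitting at 2 gives 'one',
# 'two' and 'three3four4'; and so on. The string ends with a match, so the
# last element is an empty string:
# ['one', 'two', 'three', 'four', '']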
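The trailing empty string is a common surprise. The sketch below shows three related behaviors of split on the same input: limiting the number of splits, keeping the separators by putting them in a capturing group, and dropping empty pieces.

```python
import re

s = "one1two2three3four4"

# maxsplit stops after the given number of splits.
print(re.split(r"\d+", s, maxsplit=2))
# ['one', 'two', 'three3four4']

# A capturing group keeps the separators in the result.
print(re.split(r"(\d+)", s))
# ['one', '1', 'two', '2', 'three', '3', 'four', '4', '']

# Filter out empty strings when they are not wanted.
print([part for part in re.split(r"\d+", s) if part])
# ['one', 'two', 'three', 'four']
```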

>>> s = "This is a number 234-235-22-423"
>>> r = re.match(r".+(\d+-\d+-\d+-\d+)", s)   # greedy: .+ consumes as much as it can
>>> r.group(1)
'4-235-22-423'
>>> r = re.match(r".+?(\d+-\d+-\d+-\d+)", s)  # non-greedy: .+? consumes as little as it can
>>> r.group(1)
'234-235-22-423'

# greedy
>>> re.match(r"aa(\d+)","aa2343ddd").group(1)
'2343'
# non-greedy
>>> re.match(r"aa(\d+?)","aa2343ddd").group(1)
'2'
>>> re.match(r"aa(\d+)ddd","aa2343ddd").group(1)
'2343'
# non-greedy, but the trailing ddd forces \d+? to expand over all the digits
>>> re.match(r"aa(\d+?)ddd","aa2343ddd").group(1)
'2343'

# greedy
ret_greed = re.findall(r'a(\d+)', 'a23b')
print(ret_greed)
# ['23']
# non-greedy
ret_no_greed = re.findall(r'a(\d+?)', 'a23b')
print(ret_no_greed)
# ['2']
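Greedy versus non-greedy matters most when a delimiter appears more than once. A short sketch on a made-up HTML fragment follows; the negated character class in the third call is an alternative that avoids the question entirely.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs to the last ">" in the string.
print(re.findall(r"<.+>", html))
# ['<b>bold</b> and <i>italic</i>']

# Non-greedy: .+? stops at the first ">" it can.
print(re.findall(r"<.+?>", html))
# ['<b>', '</b>', '<i>', '</i>']

# A negated class gives the same result without relying on laziness.
print(re.findall(r"<[^>]+>", html))
# ['<b>', '</b>', '<i>', '</i>']

# The ? suffix also works on {m,n}: greedy takes 4 digits, lazy takes 2.
print(re.findall(r"\d{2,4}", "123456"))   # ['1234', '56']
print(re.findall(r"\d{2,4}?", "123456"))  # ['12', '34', '56']
```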

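text = "i love 2,45 china v5 , 6666, yes"  # renamed from str to avoid shadowing the built-in
# .*? matches as little as possible (here, nothing), so the greedy group
# captures everything before the final "yes"
res = re.findall(r".*?(.*)yes$", text)
print(res)
# ['i love 2,45 china v5 , 6666, ']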

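# Inside [], a comma is a literal character, so [1,2,4] would also match ",".
# [124] matches exactly one of 1, 2, or 4.
p = re.compile(r"^(6\d{5}[124])")
print(p.match("6256432"))  # prints a match object covering '6256432'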

^(13[0-9]|14[57]|15[0-35-9]|18[0-35-9])\d{8}$
(^(13\d|14[57]|15[^4\D]|17[13678]|18\d)\d{8}|170[^346\D]\d{7})$
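As a sketch of how the second pattern above could be used from Python: `[^4\D]` reads as "not 4 and not a non-digit", that is, any digit except 4. The anchors are dropped here in favor of fullmatch, and the helper name is hypothetical. The prefixes are taken from the pattern as given and are assumed to be a snapshot; real number ranges change over time.

```python
import re

# Same alternatives as the pattern above, with non-capturing groups.
# The prefixes are the source's snapshot, not a current authoritative list.
MOBILE_RE = re.compile(
    r"(?:(?:13\d|14[57]|15[^4\D]|17[13678]|18\d)\d{8}|170[^346\D]\d{7})"
)

def is_mobile(number):
    """Return True if number matches the mobile pattern from start to end."""
    return MOBILE_RE.fullmatch(number) is not None

print(is_mobile("13812345678"))  # True
print(is_mobile("15412345678"))  # False: 154 is excluded by [^4\D]
print(is_mobile("17012345678"))  # True: matched by the 170 branch
print(is_mobile("1381234567"))   # False: only 10 digits
```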

if (!s.match(/^[a-zA-Z]+:\/\//))
{
    s = 'http://' + s;
}
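The JavaScript snippet above prepends http:// when a URL has no scheme. A rough Python equivalent is sketched below; the function name and the default parameter are hypothetical.

```python
import re

# One or more letters followed by "://" at the start of the string.
SCHEME_RE = re.compile(r"[a-zA-Z]+://")

def ensure_scheme(url, default="http://"):
    """Prepend a default scheme when url does not already start with one."""
    return url if SCHEME_RE.match(url) else default + url

print(ensure_scheme("example.com"))        # http://example.com
print(ensure_scheme("ftp://example.com"))  # ftp://example.com
```

Because re.match always starts at index 0, no ^ anchor is needed in the Python version.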

# -*- coding: utf-8 -*-
__author__ = 'sun'
__date__ = '2019/7/03 3:24 PM'
import re

def check_card_isvalid(card_str):
    # 6-digit region code, 8-digit birth date (YYYYMMDD), 3-digit sequence, check character
    p = re.compile(r"^([1-9]\d{5}[12]\d{3}(0[1-9]|1[012])(0[1-9]|[12][0-9]|3[01])\d{3}[0-9xX])$")
    return p.match(card_str)

card_str = "422101198808100412"
res = check_card_isvalid(card_str)
print(res)  # a match object if the format is valid, otherwise None
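The pattern above checks only the shape of the number, so it accepts impossible dates such as February 30. One way to tighten this, sketched here with a hypothetical helper and a named group added to the same pattern, is to hand the captured date to datetime for validation. It does not verify the final check character.

```python
import re
from datetime import datetime

# Same structure as the pattern above; the birth date is captured by name.
ID_RE = re.compile(
    r"^[1-9]\d{5}"
    r"(?P<birth>[12]\d{3}(?:0[1-9]|1[012])(?:0[1-9]|[12][0-9]|3[01]))"
    r"\d{3}[0-9xX]$"
)

def birth_date(card_str):
    """Return the birth date encoded in card_str, or None if it is invalid."""
    m = ID_RE.match(card_str)
    if not m:
        return None
    try:
        return datetime.strptime(m.group("birth"), "%Y%m%d").date()
    except ValueError:
        # The digits fit the pattern but are not a real calendar date.
        return None

print(birth_date("422101198808100412"))  # 1988-08-10
print(birth_date("422101198802300412"))  # None: February 30 does not exist
print(birth_date("12345"))               # None: wrong shape
```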

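'''
Regular expression matching
'''
import re
regstr = ("[DEBUG][2018-09-10 09:10:34][192.169.11.34][function1]"
          "[this is our log file, has error]")
# .*? keeps each field inside its own brackets; \. matches a literal dot
p = re.compile(r"\[(?P<log_level>.*?)\]\[(?P<time_local>.*?)\]"
               r"\[(?P<ip_address>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]")
res = p.findall(regstr)
print(res)       # a list with one tuple: (log_level, time_local, ip_address)
print(dir(res))  # the attributes of the result, which is a plain list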
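findall discards the group names. To get the fields back by name, a match object's groupdict can be used instead, as in this sketch. It uses the same log line as above; `[^\]]*` (anything except a closing bracket) is swapped in for `.*` as one way to keep each field inside its brackets.

```python
import re

LOG_RE = re.compile(
    r"\[(?P<log_level>[^\]]*)\]"
    r"\[(?P<time_local>[^\]]*)\]"
    r"\[(?P<ip_address>\d{1,3}(?:\.\d{1,3}){3})\]"
)

line = ("[DEBUG][2018-09-10 09:10:34][192.169.11.34][function1]"
        "[this is our log file, has error]")

m = LOG_RE.match(line)
fields = m.groupdict() if m else {}
print(fields)
# {'log_level': 'DEBUG', 'time_local': '2018-09-10 09:10:34', 'ip_address': '192.169.11.34'}
```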

Source: http://www.bubuko.com/infodetail-3112548.html
