字符串方法
正则表达式
模式匹配和提取
搜索和替换
编译正则表达式
正则表达式进一步阅读
字符串方法
转换字符
str.maketrans() 获取转换表
translate() 基于转换表执行字符串映射
maketrans() 第一个参数是被取代的字符, 第二个参数是取代的字符, 第三个是被映射为 None 的字符
字符转换例子
- >>> greeting = '===== Have a great day ====='
- >>> greeting.translate(str.maketrans('=', '-'))
- '----- Have a great day -----'
- >>> greeting = '===== Have a great day!! ====='
- >>> greeting.translate(str.maketrans('=', '-', '!'))
- '----- Have a great day -----'
- >>> import string
- >>> quote = 'SIMPLICITY IS THE ULTIMATE SOPHISTICATION'
- >>> tr_table = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)
- >>> quote.translate(tr_table)
- 'simplicity is the ultimate sophistication'
- >>> sentence = "Thi1s is34 a senten6ce"
- >>> sentence.translate(str.maketrans('','', string.digits))
- 'This is a sentence'
- >>> greeting.translate(str.maketrans('','', string.punctuation))
- 'Have a great day'
移除首 / 尾 / 两者的字符串
仅移除首 / 尾连续的字符
默认空格会被除去
如果指定了多个字符, 它会被视为集合, 并使用其中所有的组合
- >>> greeting = 'Have a nice day :)'
- >>> greeting.strip()
- 'Have a nice day :)'
- >>> greeting.rstrip()
- 'Have a nice day :)'
- >>> greeting.lstrip()
- 'Have a nice day :)'
- >>> greeting.strip(') :')
- 'Have a nice day'
- >>> greeting = '===== Have a great day!! ====='
- >>> greeting.strip('=')
- 'Have a great day!!'
风格化
width 参数指定了总的输出字符串长度
- >>> 'Hello World'.center(40, '*')
- '************* Hello World **************'
改变大小写和大小写检查
- >>> sentence = 'thIs iS a saMple StrIng'
- >>> sentence.capitalize()
- 'This is a sample string'
- >>> sentence.title()
- 'This Is A Sample String'
- >>> sentence.lower()
- 'this is a sample string'
- >>> sentence.upper()
- 'THIS IS A SAMPLE STRING'
- >>> sentence.swapcase()
- 'THiS Is A SAmPLE sTRiNG'
- >>> 'good'.islower()
- True
- >>> 'good'.isupper()
- False
检查是否字符串由数值构成
- >>> '1'.isnumeric()
- True
- >>> 'abc1'.isnumeric()
- False
- >>> '1.2'.isnumeric()
- False
检查是否字符串序列是否存在
- >>> sentence = 'This is a sample string'
- >>> 'is' in sentence
- True
- >>> 'this' in sentence
- False
- >>> 'This' in sentence
- True
- >>> 'this' in sentence.lower()
- True
- >>> 'is a' in sentence
- True
- >>> 'test' not in sentence
- True
获取字符序列存在的次数 (非覆盖)
- >>> sentence = 'This is a sample string'
- >>> sentence.count('is')
- 2
- >>> sentence.count('w')
- 0
- >>> word = 'phototonic'
- >>> word.count('oto')
- 1
匹配头尾字符序列
- >>> sentence
- 'This is a sample string'
- >>> sentence.startswith('This')
- True
- >>> sentence.startswith('The')
- False
- >>> sentence.endswith('ing')
- True
- >>> sentence.endswith('ly')
- False
基于字符序列分割字符串
返回列表
要使用正则表达式分割, 使用 re.split()
- >>> sentence = 'This is a sample string'
- >>> sentence.split()
- ['This', 'is', 'a', 'sample', 'string']
- >>> "oranges:5".split(':')
- ['oranges', '5']
- >>> "oranges :: 5".split('::')
- ['oranges', '5']
- >>> "a e i o u".split(' ', maxsplit=1)
- ['a', 'e i o u']
- >>> "a e i o u".split(' ', maxsplit=2)
- ['a', 'e', 'i o u']
- >>> line = '{1.0 2.0 3.0}'
- >>> nums = [float(s) for s in line.strip('{}').split()]
- >>> nums
- [1.0, 2.0, 3.0]
连接字符串列表
- >>> str_list
- ['This', 'is', 'a', 'sample', 'string']
- >>> ''.join(str_list)'This is a sample string'>>>'-'.join(str_list)'This-is-a-sample-string'>>> c =' :: '>>> c.join(str_list)'This :: is :: a :: sample :: string'
替换字符
第三个参数指定使用多少次的替换
变量必须显式地重赋值
- >>> phrase = '2 be or not 2 be'
- >>> phrase.replace('2', 'to')
- 'to be or not to be'
- >>> phrase
- '2 be or not 2 be'
- >>> phrase.replace('2', 'to', 1)
- 'to be or not 2 be'
- >>> phrase = phrase.replace('2', 'to')
- >>> phrase
- 'to be or not to be'
进一步阅读
Python 文档 - 字符串方法
python 字符串方法教程
正则表达式
正则表达式元素便利参考
元字符 | 描述 |
---|---|
^ | 锚定,匹配字符串行首 |
$ | 锚定,匹配字符串行尾 |
. | 匹配除换行符 \ n 之外的字符 |
| | 或操作符,用于匹配多个模式 |
() | 用于模式分组和提取 |
[] | 字符类 - 匹配多个字符中的一个 |
\^ | 使用 \ 匹配元字符 |
量词 | 描述 |
---|---|
* | 匹配之前的字符 0 或多次 |
+ | 匹配之前的字符 1 或多次 |
? | 匹配之前的字符 0 或 1 次 |
{n} | 匹配 n 次 |
{n,} | 匹配至少 n 次 |
{n,m} | 匹配至少 n 次,至多 m 次 |
字符类 | 描述 |
---|---|
[aeiou] | 匹配任何元音 |
[^aeiou] | ^ 倒置选择,所以这会匹配任何的辅音 |
[a-f] | 匹配 abcdef 中任意字符 |
\d | 匹配数字,跟 [0-9] 一样 |
\D | 匹配非数字,跟 [^0-9] 或 [^\d]一样 |
\w | 匹配字母和下划线,跟 [a-zA-Z_] 一样 |
\W | 匹配非字母和非下划线字符,跟 [^a-zA-Z_] 或 [^\w] 一样 |
\s | 匹配空格符,跟 [\ \t\n\r\f\v] 一样 |
\S | 匹配非空行符,跟 [^\s] 一样 |
\b | 单词边界,单词定义为字母序列 |
\B | 非单词边界 |
编译标记 | 描述 |
---|---|
re.I | 忽略大小写 |
re.M | 多行模式,^ 和 $ 锚定符号可以处理中间行 |
re.S | 单行模式,. 也会匹配 \ n |
re.V | 冗余模式,提高可读性和添加注释 |
Python 文档 - 标记 - 详情和标记长名
变量 | 描述 |
---|---|
\1, \2, \3 等等 | 引用匹配的模式 |
\g<1>, \g<2>, \g<3> etc | 引用匹配的模式,用于区分数字和引用 |
模式匹配和提取
匹配 / 提取字符序列
使用 re.search() 查看是否一个字符串包含某个模式
使用 re.findall() 获得一个匹配模式列表
使用 re.split() 获得一个基于模式分割字符串的列表
它们的语法如下
- re.search(pattern, string, flags=0)
- re.findall(pattern, string, flags=0)
- re.split(pattern, string, maxsplit=0, flags=0)
- >>> import re
- >>> string = "This is a sample string"
- >>> bool(re.search('is', string))
- True
- >>> bool(re.search('this', string))
- False
- >>> bool(re.search('this', string, re.I))
- True
- >>> bool(re.search('T', string))
- True
- >>> bool(re.search('is a', string))
- True
- >>> re.findall('i', string)
- ['i', 'i', 'i']
使用正则表达式
当使用正则表达式元素时用 r''格式
- >>> string
- 'This is a sample string'
- >>> re.findall('is', string)
- ['is', 'is']
- >>> re.findall('\bis', string)
- []
- >>> re.findall(r'\bis', string)
- ['is']
- >>> re.findall(r'\w+', string)
- ['This', 'is', 'a', 'sample', 'string']
- >>> re.split(r'\s+', string)
- ['This', 'is', 'a', 'sample', 'string']
- >>> re.split(r'\d+', 'Sample123string54with908numbers')
- ['Sample', 'string', 'with', 'numbers']
- >>> re.split(r'(\d+)', 'Sample123string54with908numbers')
- ['Sample', '123', 'string', '54', 'with', '908', 'numbers']
引用
- >>> quote = "So many books, so little time"
- >>> re.search(r'([a-z]{2,}).*\1', quote, re.I)
- <_sre.SRE_Match object; span=(0, 17), match='So many books, so'>
- >>> re.search(r'([a-z])\1', quote, re.I)
- <_sre.SRE_Match object; span=(9, 11), match='oo'>
- >>> re.findall(r'([a-z])\1', quote, re.I)
- ['o', 't']
搜索和替换
语法
re.sub(pattern, repl, string, count=0, flags=0)
简单替换
re.sub 不会改变传入变量的值, 必须显式地指定
- >>> sentence = 'This is a sample string'
- >>> re.sub('sample', 'test', sentence)
- 'This is a test string'
- >>> sentence
- 'This is a sample string'
- >>> sentence = re.sub('sample', 'test', sentence)
- >>> sentence
- 'This is a test string'
- >>> re.sub('/', '-', '25/06/2016')
- '25-06-2016'
- >>> re.sub('/', '-', '25/06/2016', count=1)
- '25-06/2016'
- >>> greeting = '***** Have a great day *****'
- >>> re.sub('\*', '=', greeting)
- '===== Have a great day ====='
引用
- >>> words = 'night and day'
- >>> re.sub(r'(\w+)( \w+ )(\w+)', r'\3\2\1', words)
- 'day and night'
- >>> line = 'Can you spot the the mistakes? I i seem to not'
- >>> re.sub(r'\b(\w+) \1\b', r'\1', line, flags=re.I)
- 'Can you spot the mistakes? I seem to not'
在 re.sub() 替换部分使用函数
- >>> import math
- >>> numbers = '1 2 3 4 5'
- >>> def fact_num(n):
- ... return str(math.factorial(int(n.group(1))))
- ...
- >>> re.sub(r'(\d+)', fact_num, numbers)
- '1 2 6 24 120'
- >>> re.sub(r'(\d+)', lambda m: str(math.factorial(int(m.group(1)))), numbers)
- '1 2 6 24 120'
从 re.sub 调用函数
用函数输出替换字符串模式
lambda 教程
编译正则表达式
- >>> swap_words = re.compile(r'(\w+)( \w+ )(\w+)')
- >>> swap_words
- re.compile('(\\w+)( \\w+ )(\\w+)')
- >>> words = 'night and day'
- >>> swap_words.search(words).group()
- 'night and day'
- >>> swap_words.search(words).group(1)
- 'night'
- >>> swap_words.search(words).group(2)
- 'and'
- >>> swap_words.search(words).group(3)
- 'day'
- >>> swap_words.search(words).group(4)
- Traceback (most recent call last):
- File "<stdin>", line 1, in <module>
- IndexError: no such group
- >>> bool(swap_words.search(words))
- True
- >>> swap_words.findall(words)
- [('night', 'and', 'day')]
- >>> swap_words.sub(r'\3\2\1', words)
- 'day and night'
- >>> swap_words.sub(r'\3\2\1', 'yin and yang')
- 'yang and yin'
正则表达式进一步阅读
Python 文档 - re 模块
Python 文档 - 正则表达式使用介绍
developers.google - 正则表达式教程
automatetheboringstuff - 正则表达式
综合参考: regex 是什么?
练习工具
online regex tester 展示解释, 提供参考指南和保存分享 regex
regexone - 交互式教程
cheatsheet - 交互式学习
regexcrossword - 通过解答纵横游戏练习, 开始之前阅读'How to play'部分
来源: http://www.jianshu.com/p/c1f7abc7371f