Python 3 Iterators and Generators
Iterators
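Iteration is one of Python's most powerful features and a way of accessing the elements of a collection.
An iterator is an object that remembers its position in a traversal.
An iterator starts at the first element of a collection and visits elements until all of them have been consumed. An iterator can only move forward, never backward.
The iterator protocol consists of two methods, __iter__() and __next__(), which are called by the built-in functions iter() and next().
Iterables such as strings, lists, tuples, sets, dictionaries, range() objects and file objects can all be used to create iterators (a file object is already its own iterator):
An object whose class defines an __iter__() method is an iterable: it follows the iterable protocol.
Calling iterable.__iter__() or iter(iterable) turns an iterable into an iterator: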
- >>> lst = [1, 2, 3, 4]  # named lst so the built-in list is not shadowed
- >>> it = iter(lst)  # create an iterator object
- >>> next(it)  # return the next element of the iterator
- 1
- >>> next(it)
- 2
- >>>
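As a short sketch of the protocol described above, the built-ins iter() and next() simply call the object's __iter__() and __next__() methods. An iterator's own __iter__() returns the iterator itself, and next() accepts an optional default that is returned instead of raising StopIteration once the iterator is exhausted. The variable names are illustrative only.

```python
nums = [1, 2, 3]

it = iter(nums)              # same as nums.__iter__()
first = next(it)             # same as it.__next__()
second = it.__next__()       # calling the protocol method directly
third = next(it)

# An iterator's own __iter__() returns the very same object
same = iter(it) is it

# With a default argument, next() returns it instead of raising StopIteration
leftover = next(it, 'done')

print(first, second, third, same, leftover)  # 1 2 3 True done
```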
An iterator object can be traversed with an ordinary for statement:
- >>> lst = ['a', 'b', 'c', 'd']
- >>> it = iter(lst)  # create an iterator object
- >>> for x in it:
-         print(x, end=" ")
- a b c d
- >>>
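Because an iterator only moves forward, a second pass over the same iterator produces nothing, while the list it came from can be traversed again (each for loop over a list asks it for a fresh iterator). A minimal illustration:

```python
letters = ['a', 'b', 'c', 'd']
it = iter(letters)

first_pass = [x for x in it]    # consumes the iterator completely
second_pass = [x for x in it]   # the iterator is already exhausted
list_again = [x for x in letters]  # the list hands out a new iterator

print(first_pass)   # ['a', 'b', 'c', 'd']
print(second_pass)  # []
print(list_again)   # ['a', 'b', 'c', 'd']
```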
You can also call next() yourself in a while loop and stop when it raises StopIteration:
- >>> lst = [2,6,8,9]
- >>> it = iter(lst)  # create an iterator object
- >>>
- >>> while True:
-         try:
-             print(next(it))
-         except StopIteration:
-             break
- 2
- 6
- 8
- 9
- >>>
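Creating an iterator
To use a class as an iterator, implement two methods in it: __iter__() and __next__().
If you know object-oriented programming, you know that classes have a constructor; in Python it is __init__(), which runs when the object is initialized.
The __iter__() method returns an iterator object, one that implements __next__() and signals the end of iteration by raising StopIteration.
The __next__() method (next() in Python 2) returns the next value of the iteration.
The following iterator (a counter) returns numbers, starting at 1 and increasing by 1 each time: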
- class Counter:
-     def __iter__(self):
-         self.a = 1
-         return self
-     def __next__(self):
-         x = self.a
-         self.a += 1
-         return x
- myclass = Counter()
- myiter = iter(myclass)
- print(next(myiter))
- print(next(myiter))
- print(next(myiter))
- print(next(myiter))
- print(next(myiter))
- # Output (this Counter never raises StopIteration, so it could keep counting forever):
- 1
- 2
- 3
- 4
- 5
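Because this Counter never raises StopIteration, a plain for loop over it would never end. One way to consume only a bounded prefix is itertools.islice from the standard library; the class is repeated here so the block runs on its own.

```python
from itertools import islice

class Counter:
    """Infinite counter: 1, 2, 3, ... (never raises StopIteration)."""
    def __iter__(self):
        self.a = 1
        return self

    def __next__(self):
        x = self.a
        self.a += 1
        return x

# islice stops after 5 items, so looping over the infinite iterator is safe
first_five = list(islice(Counter(), 5))
print(first_five)  # [1, 2, 3, 4, 5]
```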
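The StopIteration exception signals the end of iteration and prevents infinite loops: in __next__() we can raise StopIteration once a given number of iterations has been reached. Built-in iterators work the same way; a string iterator, for example, raises it after the last character: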
- >>> str1 = "Python"
- >>> strObj = str1.__iter__()
- >>> strObj.__next__()
- 'P'
- >>> strObj.__next__()
- 'y'
- >>> strObj.__next__()
- 't'
- >>> strObj.__next__()
- 'h'
- >>> strObj.__next__()
- 'o'
- >>> strObj.__next__()
- 'n'
- >>> strObj.__next__()
- Traceback (most recent call last):
- File "<pyshell#33>", line 1, in <module>
- strObj.__next__()
- StopIteration
- >>>
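Following the paragraph on StopIteration, here is a minimal sketch of a counter that stops by itself. The class name BoundedCounter and its limit parameter are hypothetical, chosen for illustration.

```python
class BoundedCounter:
    """Counts from 1 up to `limit` (inclusive), then signals exhaustion."""
    def __init__(self, limit):
        self.limit = limit

    def __iter__(self):
        self.a = 1
        return self

    def __next__(self):
        if self.a > self.limit:
            raise StopIteration   # tells for loops and next() that iteration is over
        x = self.a
        self.a += 1
        return x

values = [n for n in BoundedCounter(5)]
print(values)  # [1, 2, 3, 4, 5]
```

The for loop (here inside the list comprehension) catches the StopIteration itself, which is why no try/except is needed when consuming the counter this way.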
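So how can you tell whether an object is iterable?
Check whether it has an __iter__ method:
Or check its type with Iterable and Iterator from collections.abc (since Python 3.10 they can no longer be imported from collections directly):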
- >>> tup = (1,2,3)
- >>> type(tup)
- <class 'tuple'>
- >>> dir(tup)  # with an argument, dir() returns the list of that object's attributes and methods
- ['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__',
- '__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__',
- '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'count', 'index']
- >>> print('__iter__' in dir(tup))
- True
- >>> from collections.abc import Iterable, Iterator
- >>> dic = {1:'dict', 2:'str', 3:'list', 4:'tuple', 5:'set', 6:'range()', 7:'file handler'}
- >>> isinstance(dic, Iterable)
- True
- >>> isinstance(dic, Iterator)
- False
- >>>
- >>> ran = range(6)
- >>> type(ran)
- <class 'range'>
- >>> isinstance(ran, Iterable)
- True
- >>> isinstance(ran, Iterator)
- False
- >>>
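Checking for __iter__ (directly, or through isinstance(obj, Iterable)) misses one case: a class that only defines __getitem__ with integer indexes is still accepted by iter() through the older sequence protocol. The Python documentation notes that calling iter(obj) is the only reliable test. A sketch; the names OldStyleSeq and is_iterable are hypothetical.

```python
from collections.abc import Iterable, Iterator

class OldStyleSeq:
    """Defines only __getitem__; iter() falls back to indexes 0, 1, 2, ..."""
    def __getitem__(self, index):
        if index >= 3:
            raise IndexError   # ends the fallback iteration
        return index * 10

def is_iterable(obj):
    """Return True if iter(obj) succeeds."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True

seq = OldStyleSeq()
print(isinstance(seq, Iterable))           # False: no __iter__ method
print(is_iterable(seq))                    # True: iter() uses __getitem__
print(list(seq))                           # [0, 10, 20]
print(isinstance(iter([1, 2]), Iterator))  # True
```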
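Generators
In Python, a function that uses yield is called a generator (generator function).
Unlike an ordinary function, a generator function returns an iterator and can only be used for iteration; put simply, a generator is an iterator.
While a generator runs, each time it reaches yield it pauses, saves all of its current execution state and returns the value of yield; the next call to next() resumes execution from that point.
Calling a generator function returns an iterator object (a generator object); the function body does not start running until the first next().
yield vs return:
After return, the function's state is discarded; yield saves the function's execution state, and when execution resumes the function continues from that saved state.
return ends the function; yield does not end the generator function.
Both hand back a value: return gives it to the caller of the function, while yield gives it to next() (or to whatever drives the generator, such as a for loop).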
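To make the yield vs return contrast concrete, here is a small sketch: the function that uses return builds the whole list before handing it back, while the generator's body only advances as far as each next() call asks. The log list is a hypothetical instrument for observing when each body actually runs.

```python
log = []

def squares_list(n):
    result = []
    for i in range(n):
        log.append(f'list computes {i}')
        result.append(i * i)
    return result            # one value; the function's state is discarded

def squares_gen(n):
    for i in range(n):
        log.append(f'gen computes {i}')
        yield i * i          # pause here; local state (i) is kept

squares_list(3)
print(log)   # ['list computes 0', 'list computes 1', 'list computes 2']

log.clear()
g = squares_gen(3)           # creating the generator runs none of its body
print(log)                   # []
print(next(g))               # 0
print(log)                   # ['gen computes 0']
```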
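The following example implements the Fibonacci sequence with yield:
- >>> def fib(max):  # generator function: Fibonacci
-         a, b, n = 0, 1, 0
-         while n < max:
-             yield b  # use yield
-             a, b = b, a + b
-             n = n + 1
- >>> f = fib(6)  # calling fib(6) does not run the body of fib; it returns a generator object (an iterator)
- >>> f  # the interpreter shows it as a generator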
- <generator object fib at 0x000001C6CB627780>
- >>>
- >>> for n in fib(5):
-         print(n)
- 1
- 1
- 2
- 3
- 5
- >>>
- >>> f = fib(5)
- >>> next(f)  # next() takes a value from the generator and drives its execution forward
- 1
- >>> next(f)
- 1
- >>> next(f)
- 2
- >>> next(f)
- 3
- >>> next(f)
- 5
- >>> next(f)  # calling next(f) when no yield is left raises StopIteration
- Traceback (most recent call last):
- File "<pyshell#37>", line 1, in <module>
- next(f)
- StopIteration
- >>>
- >>> fwrong = fib(6)
- >>> fwrong.next()  # Python 2 syntax; raises an error in Python 3
- Traceback (most recent call last):
- File "<pyshell#40>", line 1, in <module>
- fwrong.next()  # Python 2 syntax; raises an error in Python 3
- AttributeError: 'generator' object has no attribute 'next'
- >>>
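In Python 3 the Python 2 method next() was renamed to __next__(), and the portable spelling is the built-in next(). A sketch repeating the fib generator from above (its parameter renamed to limit here so the built-in max is not shadowed):

```python
def fib(limit):
    """Yield the first `limit` Fibonacci numbers: 1, 1, 2, 3, 5, ..."""
    a, b, n = 0, 1, 0
    while n < limit:
        yield b
        a, b = b, a + b
        n = n + 1

f = fib(6)
print(next(f))        # 1  (preferred: the built-in next())
print(f.__next__())   # 1  (what Python 2's f.next() became)
print(list(f))        # [2, 3, 5, 8]  (the values that are left)
```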
send() sends data into a generator. It works like next(), except that while resuming the generator it also passes a value in, which becomes the result of the paused yield expression.
- >>> import numbers
- >>> def gen_sum():
-         total = 0
-         while True:
-             num = yield
-             if isinstance(num, numbers.Integral):
-                 total += num
-                 print('total:', total)
-             elif num is None:
-                 break
-         return total
- >>> g = gen_sum()
- >>> g
- <generator object gen_sum at 0x0000026A6703D3B8>
- >>> g.send(None)  # same as next(g): primes the generator up to its first yield
- >>> g.send(2)
- total: 2
- >>> g.send(6)
- total: 8
- >>> g.send(12)
- total: 20
- >>> g.send(None)  # None makes the generator break out and return total, carried by StopIteration
- Traceback (most recent call last):
- File "<pyshell#40>", line 1, in <module>
- g.send(None)
- StopIteration: 20
- >>>
- >>> try:
-         g.send(None)  # the generator has already finished, so this StopIteration carries no value
-     except StopIteration as e:
-         print(e.value)
- None
- >>>
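As the session shows, the total 20 travels on the StopIteration raised by the send() that ends the generator; once the generator is finished, further sends raise a bare StopIteration whose value is None. A sketch that captures the total at the right moment, using a trimmed copy of gen_sum without the print; the helper name run_sum is hypothetical.

```python
import numbers

def gen_sum():
    total = 0
    while True:
        num = yield
        if isinstance(num, numbers.Integral):
            total += num
        elif num is None:
            break
    return total              # becomes StopIteration.value

def run_sum(values):
    g = gen_sum()
    g.send(None)              # prime: run up to the first yield
    for v in values:
        g.send(v)
    try:
        g.send(None)          # ask the generator to finish
    except StopIteration as e:
        return e.value        # the value of `return total`

print(run_sum([2, 6, 12]))    # 20
```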
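The yield from keyword
yield from iterable yields every item of the iterable in turn, as if the generator looped over it and yielded each item itself; when the iterable is another generator, yield from delegates to it directly.
- >>> def func():
-         lst = ['str', 'tuple', 'list', 'dict', 'set']
-         yield lst
- >>> gen = func()
- >>> next(gen)
- ['str', 'tuple', 'list', 'dict', 'set']
- >>> for i in gen:
-         print(i)  # prints nothing: the single yield has already been consumed
- >>> # yield from yields the items of the iterable one at a time
- >>> def func2():
-         lst = ['str', 'tuple', 'list', 'dict', 'set']
-         yield from lst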
- >>> gen2 = func2()
- >>> next(gen2)
- 'str'
- >>> next(gen2)
- 'tuple'
- >>> for i in gen2:
-         print(i)
- list
- dict
- set
- >>>
- >>> lst = ['H','e','l']
- >>> dic = {'l':'vvvvv','o':'eeeee'}
- >>> str1 = 'Python'
- >>>
- >>> def yield_gen():
-         for i in lst:
-             yield i
-         for j in dic:
-             yield j
-         for k in str1:
-             yield k
- >>> for item in yield_gen():
-         print(item, end='')
- HelloPython
- >>>
- >>> l = ['H','e','l']
- >>> d = {'l':'xxxxx','o':'ooooo'}
- >>> s = 'Java'
- >>>
- >>> def yield_from_gen():
-         yield from l
-         yield from d
-         yield from s
- >>> for item in yield_from_gen():
-         print(item, end='')
- HelloJava
- >>>
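Beyond flattening iterables, yield from also forwards send() calls into the sub-generator, and the yield from expression evaluates to the sub-generator's return value. A minimal sketch; the names averager and delegator are hypothetical.

```python
def averager():
    """Sub-generator: receives numbers via send(); None stops it."""
    total, count = 0, 0
    while True:
        value = yield
        if value is None:
            break
        total += value
        count += 1
    return total / count if count else 0.0

def delegator(results):
    # send() calls on the delegator reach averager directly;
    # `yield from` evaluates to averager's return value
    result = yield from averager()
    results.append(result)

results = []
d = delegator(results)
next(d)                # prime: runs into averager's first yield
for x in (10, 20, 30):
    d.send(x)
try:
    d.send(None)       # averager returns 20.0, then the delegator finishes
except StopIteration:
    pass
print(results)         # [20.0]
```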
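Why use generators
They are easy to use, need less code and use memory more efficiently. For example:
a list allocates memory for all of its elements when it is created,
whereas a generator produces values only when they are needed; it behaves more like a record that stands for a (possibly infinite) stream, somewhat like a database cursor that handles one row at a time.
If the data you need to read is far larger than available memory, yet every item in the stream must be processed, a generator is a good choice:
for example, a generator can yield its current processing state, and because it keeps that state, the next call simply continues from there.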
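To illustrate the memory claim, here is a sketch comparing a fully built list with a generator over the same values. Exact byte counts from sys.getsizeof vary by Python version and platform, so only the relationship matters. The second part treats an endless generator as a stream and processes only the part that is needed; the function names are hypothetical.

```python
import sys
from itertools import count, islice

def squares(n):
    """Yield i * i for i in range(n), one value at a time."""
    for i in range(n):
        yield i * i

n = 100_000
as_list = [i * i for i in range(n)]   # every value is stored up front
as_gen = squares(n)                    # nothing is computed yet

list_size = sys.getsizeof(as_list)     # grows with n
gen_size = sys.getsizeof(as_gen)       # small and independent of n
print(list_size > gen_size)            # True
print(sum(as_gen) == sum(as_list))     # True: same values, produced lazily

def even_squares():
    """An endless stream of even squares; no list is ever built."""
    for i in count():
        if i % 2 == 0:
            yield i * i

print(list(islice(even_squares(), 5)))  # [0, 4, 16, 36, 64]
```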
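Coroutines
According to the definition given by Wikipedia https://www.wikipedia.org/, "coroutines https://en.wikipedia.org/wiki/Coroutine are computer program components that generalize subroutines for non-preemptive multitasking, by allowing multiple entry points for suspending and resuming execution at certain locations". Technically speaking, "a coroutine is a function whose execution you can pause". If you think of it as "just like a generator", you have the right idea.
Implementing a coroutine with yield
- # two tasks taking turns via yield (cooperative switching, not real asynchronous I/O)
- def consumer():
-     '''Task 1: receive data and process it'''
-     while True:
-         x = yield
- def producer():
-     '''Task 2: produce data'''
-     g = consumer()
-     next(g)  # prime the consumer up to its first yield
-     for i in range(10000000):
-         g.send(i)
- producer()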
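The producer/consumer pair above switches control explicitly with send(). Extending the idea, here is a sketch of non-preemptive multitasking: a tiny round-robin scheduler (hypothetical, not part of any library) that advances several generators in turn, each yield being a voluntary hand-off of control.

```python
from collections import deque

def task(name, steps, trace):
    for i in range(steps):
        trace.append(f'{name}{i}')
        yield              # hand control back to the scheduler

def round_robin(*gens):
    ready = deque(gens)
    while ready:
        gen = ready.popleft()
        try:
            next(gen)      # run the task until its next yield
        except StopIteration:
            continue       # the task has finished; drop it
        ready.append(gen)  # not finished: queue it again

trace = []
round_robin(task('A', 3, trace), task('B', 2, trace))
print(trace)  # ['A0', 'B0', 'A1', 'B1', 'A2']
```

No task is ever interrupted from outside: each one runs until it reaches a yield, which is exactly the "non-preemptive" part of the definition quoted above.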
A coroutine implemented with yield from
- import datetime
- import heapq  # heap queue module
- import time
- class Task:
-     def __init__(self, wait_until, coro):
-         self.coro = coro
-         self.waiting_until = wait_until
-     def __eq__(self, other):
-         return self.waiting_until == other.waiting_until
-     def __lt__(self, other):
-         return self.waiting_until < other.waiting_until
- class SleepingLoop:
-     def __init__(self, *coros):
-         self._new = coros
-         self._waiting = []
-     def run_until_complete(self):
-         # start every coroutine and queue it by the time it wants to wake up
-         for coro in self._new:
-             wait_for = coro.send(None)
-             heapq.heappush(self._waiting, Task(wait_for, coro))
-         # keep running until no coroutine is waiting
-         while self._waiting:
-             now = datetime.datetime.now()
-             # take the coroutine that wakes up soonest
-             task = heapq.heappop(self._waiting)
-             if now < task.waiting_until:
-                 # too early: block until its wake-up time
-                 delta = task.waiting_until - now
-                 time.sleep(delta.total_seconds())
-                 now = datetime.datetime.now()
-             try:
-                 print('*'*50)
-                 # resume the coroutine, sending it the current time
-                 wait_until = task.coro.send(now)
-                 print('-'*50)
-                 heapq.heappush(self._waiting, Task(wait_until, task.coro))
-             except StopIteration:
-                 # the coroutine has finished
-                 pass
- def sleep(seconds):
-     now = datetime.datetime.now()
-     wait_until = now + datetime.timedelta(seconds=seconds)
-     print('before yield wait_until')
-     actual = yield wait_until  # yields a datetime: the time to resume at
-     print('after yield wait_until')
-     return actual - now
- def countdown(label, length, *, delay=0):
-     print(label, 'waiting', delay, 'seconds before starting countdown')
-     delta = yield from sleep(delay)
-     print(label, 'starting after waiting', delta)
-     while length:
-         print(label, 'T-minus', length)
-         waited = yield from sleep(1)
-         length -= 1
-     print(label, 'lift-off!')
- def main():
-     loop = SleepingLoop(countdown('A', 5), countdown('B', 3, delay=2),
-                         countdown('C', 4, delay=1))
-     start = datetime.datetime.now()
-     loop.run_until_complete()
-     print('Total elapsed time is', datetime.datetime.now() - start)
- if __name__ == '__main__':
-     main()
Output:
- A waiting 0 seconds before starting countdown
- before yield wait_until
- B waiting 2 seconds before starting countdown
- before yield wait_until
- C waiting 1 seconds before starting countdown
- before yield wait_until
- **************************************************
- after yield wait_until
- A starting after waiting 0:00:00
- A T-minus 5
- before yield wait_until
- --------------------------------------------------
- **************************************************
- after yield wait_until
- C starting after waiting 0:00:01.001511
- C T-minus 4
- before yield wait_until
- --------------------------------------------------
- **************************************************
- after yield wait_until
- A T-minus 4
- before yield wait_until
- --------------------------------------------------
- **************************************************
- after yield wait_until
- B starting after waiting 0:00:02.000894
- B T-minus 3
- before yield wait_until
- --------------------------------------------------
- **************************************************
- after yield wait_until
- C T-minus 3
- before yield wait_until
- --------------------------------------------------
- **************************************************
- after yield wait_until
- A T-minus 3
- before yield wait_until
- --------------------------------------------------
- **************************************************
- after yield wait_until
- B T-minus 2
- before yield wait_until
- --------------------------------------------------
- **************************************************
- after yield wait_until
- C T-minus 2
- before yield wait_until
- --------------------------------------------------
- **************************************************
- after yield wait_until
- A T-minus 2
- before yield wait_until
- --------------------------------------------------
- **************************************************
- after yield wait_until
- B T-minus 1
- before yield wait_until
- --------------------------------------------------
- **************************************************
- after yield wait_until
- C T-minus 1
- before yield wait_until
- --------------------------------------------------
- **************************************************
- after yield wait_until
- A T-minus 1
- before yield wait_until
- --------------------------------------------------
- **************************************************
- after yield wait_until
- B lift-off!
- **************************************************
- after yield wait_until
- C lift-off!
- **************************************************
- after yield wait_until
- A lift-off!
- Total elapsed time is 0:00:05.005168
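The asyncio module
asyncio is a standard-library module introduced in Python 3.4 with built-in support for asynchronous I/O.
The @asyncio.coroutine decorator provided by asyncio marks a generator as a coroutine; inside it, yield from calls another coroutine to perform an asynchronous operation. This decorator was deprecated in Python 3.8 and removed in Python 3.11, so the next example runs only on Python 3.10 or earlier.
The asyncio programming model is a message loop: we get a reference to an EventLoop from the asyncio module and hand the coroutines to run to that EventLoop, which gives us asynchronous I/O.
coroutine+yield from
- import asyncio
- @asyncio.coroutine
- def hello():
-     print("Nice to learn asyncio.coroutine!")
-     # asynchronous call to asyncio.sleep(1):
-     r = yield from asyncio.sleep(1)
-     print("Nice to learn asyncio.coroutine again !")
- # get the EventLoop:
- loop = asyncio.get_event_loop()
- # run the coroutine
- loop.run_until_complete(hello())
- loop.close()
Output:
- Nice to learn asyncio.coroutine!
- Nice to learn asyncio.coroutine again !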
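To simplify asynchronous I/O and mark it more clearly, Python 3.5 introduced the new keywords async and await, which make coroutine code more concise and readable.
Note that async and await are new syntax for coroutines; switching to them takes two simple substitutions:
replace the @asyncio.coroutine decorator with async def;
replace yield from with await.
async+await
Inside a coroutine function, await suspends the current coroutine and waits until another coroutine finishes and returns its result: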
- import asyncio
- async def hello():
-     print("Nice to learn asyncio.coroutine!")
-     # asynchronous call to asyncio.sleep(1):
-     await asyncio.sleep(1)
-     print("Nice to learn asyncio.coroutine again !")
- # create an EventLoop (calling asyncio.get_event_loop() with no running loop is deprecated in recent Python versions)
- loop = asyncio.new_event_loop()
- # run the coroutine
- loop.run_until_complete(hello())
- loop.close()
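Since Python 3.7, asyncio.run() creates a new event loop, runs one coroutine to completion and closes the loop, so the create/run/close steps above collapse into a single call. A minimal sketch (the printed strings are illustrative):

```python
import asyncio

async def hello():
    print("Nice to learn async/await!")
    await asyncio.sleep(1)          # suspends hello() without blocking the thread
    print("Nice to learn async/await again!")
    return "done"

result = asyncio.run(hello())       # new loop, run to completion, close
print(result)                       # done
```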
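Running multiple tasks
- import threading
- import asyncio
- async def hello():
-     # threading.current_thread() replaces currentThread(), deprecated since Python 3.10
-     print('Hello Python! (%s)' % threading.current_thread())
-     await asyncio.sleep(1)
-     print('Hello Python again! (%s)' % threading.current_thread())
- loop = asyncio.new_event_loop()
- # since Python 3.11, asyncio.wait() accepts tasks only, not bare coroutines
- tasks = [loop.create_task(hello()), loop.create_task(hello())]
- loop.run_until_complete(asyncio.wait(tasks))
- loop.close()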
Output:
- Hello Python! (<_MainThread(MainThread, started 4536)>)
- Hello Python! (<_MainThread(MainThread, started 4536)>)
- Hello Python again! (<_MainThread(MainThread, started 4536)>)
- Hello Python again! (<_MainThread(MainThread, started 4536)>)
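The interleaved output shows that while one hello() is sleeping, the other one runs on the same thread. The sketch below measures this, written with asyncio.run() and asyncio.create_task() (both available since Python 3.7) instead of an explicit loop: two 1-second sleeps finish in roughly 1 second in total, not 2. Exact timings depend on the machine.

```python
import asyncio
import time

async def hello(delay):
    await asyncio.sleep(delay)     # the two sleeps overlap on one thread

async def main():
    start = time.perf_counter()
    t1 = asyncio.create_task(hello(1))   # scheduled to run concurrently
    t2 = asyncio.create_task(hello(1))
    await t1
    await t2
    return time.perf_counter() - start

elapsed = asyncio.run(main())
print(f'{elapsed:.1f} s')          # about 1.0 s rather than 2.0 s
```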
Getting a return value
- import threading
- import asyncio
- async def hello():
-     print('Hello Python! (%s)' % threading.current_thread())
-     await asyncio.sleep(1)
-     print('Hello Python again! (%s)' % threading.current_thread())
-     return "It's done"
- loop = asyncio.new_event_loop()
- task = loop.create_task(hello())
- loop.run_until_complete(task)
- ret = task.result()  # the coroutine's return value
- print(ret)
- loop.close()
Output:
- Hello Python! (<_MainThread(MainThread, started 6136)>)
- Hello Python again! (<_MainThread(MainThread, started 6136)>)
- It's done
Running multiple tasks and getting their return values
- import threading
- import asyncio
- async def hello(seq):
-     print('Hello Python! (%s)' % threading.current_thread())
-     await asyncio.sleep(1)
-     print('Hello Python again! (%s)' % threading.current_thread())
-     return "It's done", seq
- loop = asyncio.new_event_loop()
- task1 = loop.create_task(hello(2))
- task2 = loop.create_task(hello(1))
- task_list = [task1, task2]
- tasks = asyncio.wait(task_list)
- loop.run_until_complete(tasks)
- for t in task_list:
-     print(t.result())
- loop.close()
Output:
- Hello Python! (<_MainThread(MainThread, started 12956)>)
- Hello Python! (<_MainThread(MainThread, started 12956)>)
- Hello Python again! (<_MainThread(MainThread, started 12956)>)
- Hello Python again! (<_MainThread(MainThread, started 12956)>)
- ("It's done", 2)
- ("It's done", 1)
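With asyncio.run(), the same result can be collected with asyncio.gather(), which runs the awaitables concurrently and returns their results as a list in argument order, regardless of which finishes first. A sketch; the shorter sleeps (0.1 s per seq unit) are an assumption to make the finishing order differ from the argument order.

```python
import asyncio

async def hello(seq):
    await asyncio.sleep(0.1 * seq)   # the larger seq, the later it finishes
    return "It's done", seq

async def main():
    # hello(1) finishes first, but gather keeps argument order
    return await asyncio.gather(hello(2), hello(1))

results = asyncio.run(main())
print(results)   # [("It's done", 2), ("It's done", 1)]
```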
Source: http://www.bubuko.com/infodetail-3102071.html