Generators in Python

Reposted from: https://www.cnblogs.com/deeper/p/7565571.html

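Once you understand iterators, generators become much simpler, because a generator is really a special kind of iterator, and a more elegant one. You no longer need to write __iter__() and __next__() methods as you do for a hand-written iterator class; the yield keyword is all it takes. Every generator is an iterator (the converse does not hold), so every generator produces its values lazily.

Syntactically, a generator function is a function whose body contains the yield keyword.

Calling a generator function returns a generator object. This object is a special iterator and has both __iter__() and __next__() methods.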
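To make the "every generator is an iterator, but not the other way round" point concrete, here is a minimal sketch. The function name count_up_to is made up for illustration; the checks use the abstract base classes from the standard collections.abc module.

```python
from collections.abc import Generator, Iterator

def count_up_to(limit):
    """Hypothetical generator function: yields 1, 2, ..., limit."""
    n = 1
    while n <= limit:
        yield n
        n += 1

gen = count_up_to(3)

# A generator object satisfies the iterator protocol.
is_iterator = isinstance(gen, Iterator)
is_generator = isinstance(gen, Generator)
# Like any iterator, iter() on it returns the object itself.
returns_itself = iter(gen) is gen
# A plain list iterator is an iterator, but it is not a generator.
list_iter_is_generator = isinstance(iter([1, 2, 3]), Generator)

print(is_iterator, is_generator, returns_itself, list_iter_is_generator)
```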

Let's start with an example:

>>> def generator_winter():
...   i = 1
...   while i <= 3:
...     yield i
...     i += 1
...
>>> generator_winter
<function generator_winter at 0x000000000323B9D8>
>>> generator_iter = generator_winter()
>>> generator_iter
<generator object generator_winter at 0x0000000002D9CAF0>
>>>
>>> generator_iter.__next__()
1
>>> generator_iter.__next__()
2
>>> generator_iter.__next__()
3
>>> generator_iter.__next__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>

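Here is what the code above does:

First, we define a function generator_winter that contains the yield keyword; this makes it a generator function.
Next, we call the generator function and assign the return value to generator_iter, which is a generator object. __Note that when generator_iter = generator_winter() runs, the code in the function body is not executed; it only runs when next is called, explicitly or implicitly__.
A generator object is an iterator, so we can call its __next__ method to get one value at a time. Each value is produced by yield, and once the last element has been produced, the next call raises StopIteration.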
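The claim that the body does not run until next is called can be checked directly. The sketch below is only an illustration; lazy_numbers and the log list are hypothetical names used to record when each part of the body executes.

```python
def lazy_numbers(log):
    """Hypothetical generator that records when its body runs."""
    log.append("body started")
    yield 1
    log.append("resumed after first yield")
    yield 2

log = []
gen = lazy_numbers(log)
log_after_call = list(log)      # calling the function ran nothing

first = next(gen)               # runs the body up to the first yield
log_after_first = list(log)

second = next(gen)              # resumes right after the first yield
log_after_second = list(log)

print(log_after_call, log_after_first, log_after_second)
```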

Since a generator object is an iterator, we can iterate over it with a for loop:

>>> def generator_winter():
...   i = 1
...   while i <= 3:
...     yield i
...     i += 1
...
>>>
>>> for item in generator_winter():
...   print(item)
...
1
2
3
>>>

Note that a generator hands back its values with yield rather than return. So what is special about yield?

yield

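As we know, an ordinary function call returns only once: after return, that call is over.

A generator function, however, can pause. yield hands back an intermediate value, and when the generator object's __next__() method is called again, the function resumes from where it paused, until StopIteration is finally raised.

In the example above, when execution reaches yield i, the function hands back the value of i, which is then printed. On the next loop iteration __next__() is called again, control returns to the generator function, and execution continues from the statement after yield i.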

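An excerpt from Core Python Programming:

Another, even more powerful aspect of generators is the concept of coroutines. A coroutine is an independent function call that can run, be paused or suspended, and be continued or resumed from where it left off. There is also communication between the caller and the (called) coroutine. For example, when a coroutine pauses, we can still receive an intermediate return value from it, and when calling back into it, we can pass in additional or changed arguments, yet still pick up where we last left off with all state intact. Coroutines that are suspended, hand back intermediate values, and are resumed multiple times are called generators, and that is exactly what Python's generators do. These enhancements bring generators closer to full coroutines, because values (and exceptions) can now be passed back into a resumed function. Likewise, while waiting on another generator, a generator can now return control: before the called generator is able to suspend (and hand back a result), the calling generator can return a result itself instead of blocking until that result arrives.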

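When is StopIteration raised?

StopIteration is raised in two situations:

If there is no return, StopIteration is raised when execution runs off the end of the function body;

If a return is executed along the way, StopIteration is raised immediately and iteration ends. If return is given a value, that value is carried by the StopIteration exception (as its value attribute); it is not handed to the caller as an ordinary return value.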

>>> def generator_winter():
...   yield 'hello world'
...   return 'again'
...
>>>
>>> winter = generator_winter()
>>> winter.__next__()
'hello world'
>>> winter.__next__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration: again
>>>
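The returned value is not lost; it can be read back in two standard ways. The sketch below assumes a small generator named greet (a renamed copy of the example above) and a hypothetical wrapper named outer: one way is the value attribute of the caught StopIteration, the other is a yield from expression.

```python
def greet():
    yield 'hello world'
    return 'again'

# Way 1: catch StopIteration and read its value attribute.
gen = greet()
first = next(gen)
try:
    next(gen)
except StopIteration as exc:
    returned = exc.value

# Way 2: "yield from" evaluates to the inner generator's return value.
def outer():
    result = yield from greet()
    yield 'inner returned: ' + result

collected = list(outer())
print(first, returned, collected)
```

Note that a for loop swallows the StopIteration silently, so the return value is never visible to code that only iterates.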

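What generators are good for

So what are generators actually for? They are one of Python's major features and extremely powerful: because they are lazy, they can save a great deal of memory when processing large amounts of data.

Suppose you need to iterate over a huge collection, for example one million numbers that follow some pattern. Storing them in a list takes a lot of memory, and if you only touch the first few elements, the memory held by the rest is wasted. A generator does not build the full list; it produces one value per loop iteration, which saves a great deal of memory.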
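As a rough illustration of the memory argument, the sketch below compares the size of a one-million-element list with the size of an equivalent generator object. The exact byte counts depend on the Python build, and sys.getsizeof only measures the container itself, so treat the numbers as indicative rather than precise.

```python
import sys
from itertools import islice

N = 1_000_000

as_list = [x * 2 for x in range(N)]   # all values exist at once
as_gen = (x * 2 for x in range(N))    # values are produced on demand

list_bytes = sys.getsizeof(as_list)   # size of the list object only
gen_bytes = sys.getsizeof(as_gen)     # small and independent of N

# Taking only the first few values never computes the rest.
first_five = list(islice(as_gen, 5))

print(list_bytes, gen_bytes, first_five)
```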

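Before the example, let's introduce generator expressions. They look like list comprehensions with the [] replaced by (), and evaluating one creates a generator.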

>>> gen = (x for x in range(10))
>>> gen
<generator object <genexpr> at 0x0000000002A923B8>
>>>
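A generator expression can also carry a trailing if filter, and when it is the only argument to a function call the extra parentheses can be dropped. A minimal sketch, with variable names chosen only for illustration:

```python
# Squares of the even numbers below 10, produced lazily.
even_squares = (x * x for x in range(10) if x % 2 == 0)
as_list = list(even_squares)

# As the sole argument of a call, no extra parentheses are needed.
total = sum(x * x for x in range(10) if x % 2 == 0)

print(as_list, total)
```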

The syntax of a generator expression is: (expr for iter_var in iterable if cond_expr)

Implementing the Fibonacci sequence with a generator

def fib(n):
    a, b = 0, 1
    while b <= n:
        yield b
        a, b = b, a+b

f = fib(10)
for item in f:
    print(item)
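Because values are produced on demand, a generator does not even need an upper bound. The sketch below is a variation on fib, not part of the original post: fib_forever is a hypothetical unbounded version, and itertools.islice takes just the first ten values from it.

```python
from itertools import islice

def fib(n):
    """The bounded version from above: Fibonacci numbers up to n."""
    a, b = 0, 1
    while b <= n:
        yield b
        a, b = b, a + b

def fib_forever():
    """Hypothetical unbounded variant: never raises StopIteration."""
    a, b = 0, 1
    while True:
        yield b
        a, b = b, a + b

up_to_ten = list(fib(10))
first_ten = list(islice(fib_forever(), 10))
print(up_to_ten, first_ten)
```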

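Generator methods

Let's look at a mock class that outlines the interface of the generator type (the real implementation lives inside the interpreter):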

class __generator(object):
    '''A mock class representing the generator function type.'''
    def __init__(self):
        self.gi_code = None
        self.gi_frame = None
        self.gi_running = 0

    def __iter__(self):
        '''Defined to support iteration over container.'''
        pass

    def __next__(self):
        '''Return the next item from the container.'''
        pass

    def close(self):
        '''Raises new GeneratorExit exception inside the generator to terminate the iteration.'''
        pass

    def send(self, value):
        '''Resumes the generator and "sends" a value that becomes the result of the current yield-expression.'''
        pass

    def throw(self, type, value=None, traceback=None):
        '''Used to raise an exception inside the generator.'''
        pass

First, we can see that generators come with the __iter__ and __next__ magic methods built in.
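The attributes in the mock class (gi_frame, gi_running) track where a generator is in its life cycle. A convenient way to observe this is inspect.getgeneratorstate from the standard library. The sketch below uses a hypothetical one_value generator and is meant only as an illustration.

```python
import inspect

def one_value():
    yield 'only value'

gen = one_value()
state_created = inspect.getgeneratorstate(gen)    # not started yet

next(gen)
state_suspended = inspect.getgeneratorstate(gen)  # paused at a yield

gen.close()
state_closed = inspect.getgeneratorstate(gen)     # finished
frame_after_close = gen.gi_frame                  # frame is released

print(state_created, state_suspended, state_closed, frame_after_close)
```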

send

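The most distinctive feature of a generator function is that it can accept a value passed in from outside and compute a result based on it. This is the hardest part of generators to understand and also the most important: it is what coroutines are built on.

Consider an example of a cat eating fish (in the code, '鱼肉' means fish, '好开心' means "so happy", and '不开心，人家要吃鱼肉啦' means "not happy, I want fish"):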

def cat():
    print('我是一只hello kitty')
    while True:
        food = yield
        if food == '鱼肉':
            yield '好开心'
        else:
            yield '不开心，人家要吃鱼肉啦'

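In the middle there is the assignment food = yield. The send method can pass a value in to food. Let's try it:

Case 1)

miao = cat()    # only returns a generator object; the body of cat does not run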
print(''.center(50,'-'))
print(miao.send('鱼肉'))

Result:

Traceback (most recent call last):
--------------------------------------------------
  File "C:/Users//Desktop/Python/cnblogs/subModule.py", line 67, in <module>
    print(miao.send('鱼肉'))
TypeError: can't send non-None value to a just-started generator

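Two things stand out:

miao = cat() only returns a generator object; the body of cat does not run
can't send non-None value to a just-started generator: you cannot send a value to a generator that has not yet been advanced to its first yield

Let's fix that.

Case 2)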

miao = cat()
miao.__next__()
print(miao.send('鱼肉'))

Result:

我是一只hello kitty
好开心

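That works. So what exactly does send() do? Its docstring is clear: '''Resumes the generator and "sends" a value that becomes the result of the current yield-expression.''' So send does two things, in order:

1. It returns to the point where the generator is suspended and resumes execution
2. It makes the argument of send(arg) the value of the paused yield expression, which is assigned to a variable if there is one; if nothing receives the value, the generator simply resumes

However, I would say send does a third thing as well:
3. It also does the job of __next__(): it runs the generator to the next yield, suspends it, and returns the yielded value. That is why print(miao.send('鱼肉')) shows '好开心'. In fact, __next__() is equivalent to send(None).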
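The equivalence between __next__() and send(None) can be seen with a tiny echo generator. This is a sketch for illustration; echo is a hypothetical name, and the generator simply yields back whatever it last received.

```python
def echo():
    """Hypothetical generator: yields back the value it last received."""
    received = None
    while True:
        received = yield received

gen_a = echo()
gen_b = echo()

# Both calls start the generator and run it to the first yield.
start_a = next(gen_a)
start_b = gen_b.send(None)

# send() resumes, delivers a value, and returns the next yielded value.
reply = gen_a.send('fish')

# next() delivers None, so the paused yield expression evaluates to None.
after_next = next(gen_a)

print(start_a, start_b, reply, after_next)
```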

So when we try this:

def cat():
    print('我是一只hello kitty')
    while True:
        food = yield
        if food == '鱼肉':
            yield '好开心'
        else:
            yield '不开心，人家要吃鱼肉啦'

miao = cat()
print(miao.__next__())
print(miao.send('鱼肉'))
print(miao.send('骨头'))
print(miao.send('鸡肉'))

we get this result:

我是一只hello kitty
None
好开心
None
不开心，人家要吃鱼肉啦

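Let's go through it step by step:

print(miao.__next__()) starts executing the body of cat(), which prints '我是一只hello kitty', then suspends at food = yield and returns None, so None is printed
print(miao.send('鱼肉')) returns to food = yield and assigns '鱼肉' to food; the generator resumes and runs until yield '好开心', where it suspends and returns '好开心', which is printed
print(miao.send('骨头')) returns to yield '好开心'; this time no variable receives the argument '骨头'; the generator resumes and runs until food = yield, where it suspends and returns None, so None is printed
print(miao.send('鸡肉')) returns to food = yield and assigns '鸡肉' to food; the generator resumes and runs until yield '不开心，人家要吃鱼肉啦', where it suspends and returns '不开心，人家要吃鱼肉啦', which is printed

All done. Now let's tidy up the code so that every send gets a reply: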

def cat():
    msg = '我是一只hello kitty'
    while True:
        food = yield msg
        if food == '鱼肉':
            msg = '好开心'
        else:
            msg = '不开心，人家要吃鱼啦'

miao = cat()
print(miao.__next__())
print(miao.send('鱼肉'))
print(miao.send('鸡肉'))

Let's look at a more practical example, a counter:

def counter(start_at = 0):
    count = start_at
    while True:
        val = (yield count)
        if val is not None:
            count = val
        else:
            count += 1

count = counter(5)
print(count.__next__())
print(count.__next__())
print(count.send(0))
print(count.__next__())
print(count.__next__())

Result:

5
6
0
1
2

close

From the docstring: '''Raises new GeneratorExit exception inside the generator to terminate the iteration.'''

close() shuts the generator down manually; any later call to __next__() raises StopIteration.

>>> def gene():
...   while True:
...     yield 'ok'
...
>>> g = gene()
>>> g.__next__()
'ok'
>>> g.__next__()
'ok'
>>> g.close()
>>> g.__next__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>

After close(), calling __next__() again raises StopIteration.
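Because close() raises GeneratorExit at the paused yield, a generator can use try/finally to release resources when it is closed. The sketch below is an illustration only; reader and the log list are hypothetical stand-ins for a real resource.

```python
def reader(log):
    """Hypothetical generator that cleans up when it is closed."""
    log.append('opened')
    try:
        while True:
            yield 'ok'
    finally:
        # Runs when close() raises GeneratorExit at the paused yield.
        log.append('cleaned up')

log = []
g = reader(log)
first = next(g)
g.close()
g.close()            # closing an already closed generator does nothing

try:
    next(g)
    exhausted = False
except StopIteration:
    exhausted = True

print(first, log, exhausted)
```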

throw

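throw() raises an exception inside the generator at the point where it is paused. If the generator does not catch it, the exception propagates back to the caller and the generator is finished. If the generator catches it, execution continues to the next yield, whose value becomes the return value of throw(); if the generator finishes without reaching another yield, throw() raises StopIteration.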

>>> def gene():
...   while True:
...     try:
...       yield 'normal value'
...     except ValueError:
...       yield 'we got ValueError here'
...     except TypeError:
...       break
...
>>> g = gene()
>>> print(g.__next__())
normal value
>>> print(g.__next__())
normal value
>>> print(g.throw(ValueError))
we got ValueError here
>>> print(g.__next__())
normal value
>>> print(g.throw(TypeError))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>
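The example above covers the cases where the generator catches the exception. For completeness, here is a sketch of the remaining case, where the generator does not handle it. The name fragile is hypothetical.

```python
def fragile():
    """Hypothetical generator with no exception handling."""
    yield 1
    yield 2

g = fragile()
first = next(g)

try:
    g.throw(KeyError('boom'))   # raised at the paused "yield 1"
    caught = None
except KeyError as exc:
    caught = exc                # the exception comes back to the caller

# The generator is finished: nothing more can be produced.
remaining = list(g)

print(first, repr(caught), remaining)
```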

Using yield to get asynchronous, concurrent-looking behaviour in a single thread

def consumer(name):
    print('%s准备吃包子了' % name)
    while True:
        baozi_name = yield
        print('[%s]来了，被[%s]吃了'% (baozi_name, name))

def producer(*name):
    c1 = consumer(name[0])
    c2 = consumer(name[1])
    c1.__next__()
    c2.__next__()
    for times in range(5):
        print('做了两个包子')
        c1.send('豆沙包%s'%times)
        c2.send('菜包%s'%times)

producer('winter', 'elly')

Output:

winter准备吃包子了
elly准备吃包子了
做了两个包子
[豆沙包0]来了，被[winter]吃了
[菜包0]来了，被[elly]吃了
做了两个包子
[豆沙包1]来了，被[winter]吃了
[菜包1]来了，被[elly]吃了
做了两个包子
[豆沙包2]来了，被[winter]吃了
[菜包2]来了，被[elly]吃了
做了两个包子
[豆沙包3]来了，被[winter]吃了
[菜包3]来了，被[elly]吃了
做了两个包子
[豆沙包4]来了，被[winter]吃了
[菜包4]来了，被[elly]吃了

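Two independent generators are created and fed in turn by the producer, which is a neat trick. (In the output, each consumer first announces that it is ready to eat buns, and each later line says that a bun arrived and was eaten by the named consumer.)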
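The same idea can be pushed one step further into a tiny cooperative scheduler. The sketch below is not from the original post: task and run_round_robin are hypothetical names, and each bare yield simply hands control back so the scheduler can run the next task.

```python
from collections import deque

def task(name, steps, log):
    """Hypothetical task: does one unit of work, then yields control."""
    for step in range(steps):
        log.append(f'{name} step {step}')
        yield

def run_round_robin(tasks):
    """Run each task until its next yield, cycling until all finish."""
    queue = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            next(current)
        except StopIteration:
            continue            # task finished: do not requeue it
        queue.append(current)   # task paused: give the others a turn

log = []
run_round_robin([task('A', 2, log), task('B', 3, log)])
print(log)
```

Everything runs in one thread; the interleaving comes purely from each task giving up control at yield.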

A few more small examples:

Building range with a generator

def my_range(n):  # renamed so that the built-in range is not shadowed
    count = 0
    while count < n:
        yield count
        count += 1

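Watching a file for new input with a generator

import time

def fileTail(filename):
    with open(filename) as f:
        while True:
            tail = f.readline()
            if tail:   # a new line has been appended
                yield tail
            else:      # nothing new yet: wait briefly before polling again
                time.sleep(0.1)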

Computing a running average

def averager(start_with = 0):
    count = 0
    aver = start_with
    total = start_with
    while True:
        val = yield aver
        total += val
        count += 1
        aver = total/count

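One drawback: the generator has to be advanced once with __next__() or next() before values can be sent to it. Priming it automatically solves this.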
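To see the drawback in action, the sketch below repeats the averager from above and drives it by hand. The variable names outside the generator are chosen for illustration.

```python
def averager(start_with=0):
    count = 0
    aver = start_with
    total = start_with
    while True:
        val = yield aver
        total += val
        count += 1
        aver = total / count

avg = averager()

# Sending a value before the first yield is reached is an error.
try:
    avg.send(10)
    needs_priming = False
except TypeError:
    needs_priming = True

primed_value = next(avg)                  # advance to the first yield
results = [avg.send(10), avg.send(20)]    # averages after each value

print(needs_priming, primed_value, results)
```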

A pre-primed running average

def init(f):
    def wrapper(start_with = 0):
        g_aver = f(start_with)
        g_aver.__next__()
        return g_aver
    return wrapper

@init
def averager(start_with = 0):
    count = 0
    aver = start_with
    total = start_with
    while True:
        val = yield aver
        total += val
        count += 1
        aver = total/count
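The init decorator above only accepts a single start_with argument. A more general variant, sketched here under the assumption that any generator function should be supported, forwards arbitrary arguments and uses functools.wraps to keep the wrapped function's name. The decorator name primed is hypothetical.

```python
from functools import wraps

def primed(gen_func):
    """Hypothetical decorator: advance a new generator to its first yield."""
    @wraps(gen_func)
    def wrapper(*args, **kwargs):
        gen = gen_func(*args, **kwargs)
        next(gen)               # prime it so send() works immediately
        return gen
    return wrapper

@primed
def averager(start_with=0):
    count = 0
    aver = start_with
    total = start_with
    while True:
        val = yield aver
        total += val
        count += 1
        aver = total / count

avg = averager()
first = avg.send(10)    # no explicit next() needed
second = avg.send(30)
print(averager.__name__, first, second)
```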

Finding the length of the longest line in a file

The most traditional way:

def longestLine(filename):
    with open(filename, 'r', encoding='utf-8') as f:
        alllines = [len(x.strip()) for x in f]
        return max(alllines)

With a generator expression:

def longestLine(filename):
    return max(len(x.strip()) for x in open(filename))

Iterating over multiple generators

>>> g = (i for i in range(5))
>>> for j in g:
...   print(j)
...
0
1
2
3
4
>>> for j in g:
...   print(j)
...
>>>

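The first for j in g loop calls g.__next__() once per iteration until the generator is exhausted and raises StopIteration, which the loop catches. The second loop therefore prints nothing: an exhausted generator cannot be restarted.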
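If the values are needed more than once, the usual options are to create a fresh generator each time or to split one with itertools.tee. A minimal sketch, where squares is a hypothetical factory function:

```python
from itertools import tee

def squares():
    """Hypothetical factory: returns a fresh generator on every call."""
    return (i * i for i in range(5))

g = squares()
first_pass = list(g)
second_pass = list(g)          # the same object is already exhausted

fresh_pass = list(squares())   # option 1: build a new generator

a, b = tee(squares())          # option 2: two independent iterators
from_a, from_b = list(a), list(b)

print(first_pass, second_pass, fresh_pass, from_a, from_b)
```

tee buffers values internally, so for large inputs consumed unevenly it can use as much memory as a list.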

Pay attention to the output below:

>>> g = (i for i in range(4))
>>> g1 = (x for x in g)
>>> g2 = (y for y in g1)
>>>
>>> print(list(g1))
[0, 1, 2, 3]
>>> print(list(g2))
[]
>>>

Why is print(list(g2)) empty? Let's work through it step by step to avoid confusion.

Consider the following code:

def g():
    print('1.1')
    for i in range(2):
        print('1.2')
        yield i
        print('1.3')

def g1():
    print('2.1')
    for x in s:
        print('2.2')
        yield x
        print('2.3')

def g2():
    print('3.1')
    for y in s1:
        print('3.2')
        yield y
        print('3.3')

s = g()
s1 = g1()
s2 = g2()
print('start first list')
print(list(s1))
print('start second list')
print(list(s2))

Result:

start first list
2.1
1.1
1.2
2.2
2.3
1.3
1.2
2.2
2.3
1.3
[0, 1]
start second list
3.1
[]

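Note that after the 11th line of the output (the second 1.3), g raises StopIteration, which is caught by for x in s, so s.__next__() can no longer produce values. Likewise g1 then raises StopIteration, which is caught by list, so s1.__next__() can no longer produce values either, and [0, 1] is printed.

When print(list(s2)) runs, it calls s2.__next__(), which prints 3.1 and reaches for y in s1 on line 17 of the code. But s1 is already exhausted, so the loop ends immediately, s2 raises StopIteration, and the result is [].

Here is another interesting output: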

def add(n,i):
    return n+i

g = (i for i in range(4))

for n in [1,10]:
    g = (add(n,i) for i in g)

print(list(g))

The output is: [20, 21, 22, 23]

The code above is roughly equivalent to:

def add(n,i):
    return n+i

g = (i for i in range(4))

def g1():
    for i in g:
        yield add(n,i)

def g2():
    for i in s1:
        yield add(n,i)

n = 1
s1 = g1()
n = 10
s2 = g2()
print(list(s2))

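In the end both generators use n = 10. Generator bodies are evaluated lazily, so n is looked up only when list() consumes them, by which time n is already 10. Each element is therefore add(10, add(10, i)) = 20 + i.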
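If each stage should keep the value of n it was created with, one common approach is to pass n through a function argument so that it is bound at creation time. The sketch below contrasts the two behaviours; add_to_each is a hypothetical helper, not part of the original code.

```python
def add(n, i):
    return n + i

def add_to_each(n, source):
    # n is a parameter here, so each generator keeps its own value of n.
    return (add(n, i) for i in source)

# Late binding, as in the original loop: both stages see the final n.
g = (i for i in range(4))
for n in [1, 10]:
    g = (add(n, i) for i in g)
late_bound = list(g)

# Binding n per stage through the helper's argument.
g = (i for i in range(4))
for n in [1, 10]:
    g = add_to_each(n, g)
early_bound = list(g)

print(late_bound, early_bound)
```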