What does the yield keyword do in Python? What is it used for?
I'm trying to understand this code:
def _get_child_candidates(self, distance, min_dist, max_dist):
    if self._leftchild and distance - max_dist < self._median:
        yield self._leftchild
    if self._rightchild and distance + max_dist >= self._median:
        yield self._rightchild
And this is the caller:
result, candidates = [], [self]
while candidates:
    node = candidates.pop()
    distance = node._get_dist(obj)
    if distance <= max_dist and distance >= min_dist:
        result.extend(node._values)
    candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))
return result
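What happens when the method _get_child_candidates is called? Is a list returned? A single element? When do subsequent calls stop?
To understand what yield does, you must understand what generators are. And before you can understand generators, you must understand iterables.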
Iterables
When you create a list, you can read its items one by one. Reading its items one by one is called iteration:
>>> mylist = [1, 2, 3]
>>> for i in mylist:
...     print(i)
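mylist is an iterable. Everything you can use "for... in..." on is an iterable: lists, strings, files, and so on.
These iterables are convenient, but they store all their values in memory, which becomes a problem when there are many values.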
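To make "iteration" concrete, here is a minimal sketch of roughly what a for loop does behind the scenes, using the built-ins iter() and next(). The variable names are chosen for illustration only.

```python
mylist = [1, 2, 3]

it = iter(mylist)      # ask the iterable for an iterator
first = next(it)       # 1
second = next(it)      # 2
third = next(it)       # 3

# Once the values run out, next() raises StopIteration,
# which is the signal a for loop uses to stop.
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True

# Strings are iterable too: iterating yields one character at a time.
letters = [ch for ch in "abc"]
```

A list can be iterated again simply by calling iter() on it a second time, because the values are still stored in memory.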
Generators
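Generators are iterators, a kind of iterable you can only iterate over once. Generators do not store all their values in memory; they generate the values on the fly: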
>>> mygenerator = (x*x for x in range(3))
>>> for i in mygenerator:
...     print(i)
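The result is the same; note the use of () instead of []. However, you cannot run for i in mygenerator a second time, because a generator can only be used once: it computes 0, forgets it, computes 1, and finishes with 4, one value at a time.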
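A short sketch of both properties described above: a generator is empty on the second pass, and the generator object stays small no matter how many values it will produce. The size of 100,000 is an arbitrary choice for illustration, and sys.getsizeof reports only the size of the container object itself, not of the items it refers to.

```python
import sys

mygenerator = (x * x for x in range(3))
first_pass = list(mygenerator)     # consumes every value
second_pass = list(mygenerator)    # nothing left to produce

# Same values, two strategies: store them all, or produce them on demand.
big_list = [x * x for x in range(100_000)]
big_gen = (x * x for x in range(100_000))
list_is_bigger = sys.getsizeof(big_list) > sys.getsizeof(big_gen)

print(first_pass, second_pass, list_is_bigger)
```

If you need the values more than once, build a list from the generator or create a new generator.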
yield
yield is a keyword used like return, except that the function returns a generator.
>>> def create_generator():
...     mylist = range(3)
...     for i in mylist:
...         yield i*i
...
>>> mygenerator = create_generator() # create a generator
>>> print(mygenerator) # mygenerator is an object!
<generator object create_generator at 0xb7555c34>
>>> for i in mygenerator:
...     print(i)
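This example is trivial, but the pattern is handy when you know a function will return a huge set of values that you only need to read once.
To master yield, you must understand that when you call the function, the code in the function body does not run. The function only returns the generator object. The code then resumes from where it left off each time for uses the generator.
Now the hard part:
The first time for calls the generator object created from your function, it runs the code in your function from the beginning until it hits yield, then returns the first value of the loop. Each subsequent call runs another iteration of the loop you wrote in the function and returns the next value. This continues until the generator is considered empty, which happens when the function runs without hitting yield. That can be because the loop has come to an end, or because an "if/else" condition is no longer satisfied.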
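The execution order described above can be made visible with a small trace. This is only a sketch: the events list is a hypothetical log added for illustration, and next() stands in for what the for loop does on each pass.

```python
events = []  # hypothetical log, used only to record when the body runs

def create_generator():
    events.append("body started")
    for i in range(3):
        events.append(f"about to yield {i * i}")
        yield i * i
    events.append("body finished")

gen = create_generator()
events_after_call = list(events)     # still empty: calling the function ran no body code

first = next(gen)                    # runs from the top until the first yield
events_after_first = list(events)

rest = list(gen)                     # resumes after the yield and runs to the end
```

After the call, events_after_call is empty; after the first next(), only the lines up to the first yield have run; draining the generator runs the remaining iterations and then the code after the loop.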
Your code explained
Generator:
# Here you create the method of the node object that will return the generator
def _get_child_candidates(self, distance, min_dist, max_dist):
    # Here is the code that will be called each time you use the generator object:
    # If there is still a child of the node object on its left
    # AND if the distance is ok, return the next child
    if self._leftchild and distance - max_dist < self._median:
        yield self._leftchild
    # If there is still a child of the node object on its right
    # AND if the distance is ok, return the next child
    if self._rightchild and distance + max_dist >= self._median:
        yield self._rightchild
    # If the function arrives here, the generator will be considered empty
    # there is no more than two values: the left and the right children
Caller:
# Create an empty list and a list with the current object reference
result, candidates = list(), [self]
# Loop on candidates (they contain only one element at the beginning)
while candidates:
    # Get the last candidate and remove it from the list
    node = candidates.pop()
    # Get the distance between obj and the candidate
    distance = node._get_dist(obj)
    # If distance is ok, then you can fill the result
    if distance <= max_dist and distance >= min_dist:
        result.extend(node._values)
    # Add the children of the candidate in the candidate's list
    # so the loop will keep running until it will have looked
    # at all the children of the children of the children, etc. of the candidate
    candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))
return result
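This code contains several smart parts:
- The loop iterates over a list, but the list grows while the loop is running. It is a concise way to walk through all this nested data, even if it is a bit dangerous because you can end up in an infinite loop. Here, candidates.extend(node._get_child_candidates(distance, min_dist, max_dist)) exhausts all the values of the generator, but while keeps creating new generator objects, which produce different values from the previous ones because they are not applied to the same node.
- The extend() method is a list method that expects an iterable and adds its values to the list.
Usually we pass a list to it: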
>>> a = [1, 2]
>>> b = [3, 4]
>>> a.extend(b)
>>> print(a)
[1, 2, 3, 4]
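But in the code above it gets a generator, which is good because:
- You don't need to read the values twice.
- There may be a lot of children, and you don't want them all stored in memory.
It works because Python does not care whether the argument of a method is a list. Python expects iterables, so it works with strings, lists, tuples, and generators.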
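The original snippet cannot run on its own because the node class is not shown. As a rough, self-contained sketch of the same pattern, the example below uses a hypothetical Node class with a children() generator and drops the distance checks entirely; it only illustrates how extend() drains a fresh generator on each pass of the while loop.

```python
class Node:
    """Hypothetical stand-in for the tree node in the question."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

    def children(self):
        # Yields at most two values, then the generator is exhausted.
        if self.left is not None:
            yield self.left
        if self.right is not None:
            yield self.right


def collect_values(root):
    result, candidates = [], [root]
    while candidates:
        node = candidates.pop()          # take the last candidate
        result.append(node.value)
        # extend() accepts any iterable, so it consumes the generator directly
        candidates.extend(node.children())
    return result


tree = Node(1, Node(2, Node(4)), Node(3))
print(collect_values(tree))  # [1, 3, 2, 4]
```

The loop ends because every call to children() eventually yields nothing for leaf nodes, so candidates shrinks to an empty list.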
Now let's look at some advanced uses of generators:
Controlling generator exhaustion
>>> class Bank(): # Let's create a bank, building ATMs
...     crisis = False
...     def create_atm(self):
...         while not self.crisis:
...             yield "$100"
...
>>> hsbc = Bank() # When everything's ok the ATM gives you as much as you want
>>> corner_street_atm = hsbc.create_atm()
>>> print(next(corner_street_atm))
$100
>>> print(next(corner_street_atm))
$100
>>> print([next(corner_street_atm) for cash in range(5)])
['$100', '$100', '$100', '$100', '$100']
>>> hsbc.crisis = True # Crisis is coming, no more money!
>>> print(next(corner_street_atm))
Traceback (most recent call last):
  ...
StopIteration
>>> wall_street_atm = hsbc.create_atm() # It's even true for new ATMs
>>> print(next(wall_street_atm))
Traceback (most recent call last):
  ...
StopIteration
>>> hsbc.crisis = False # The trouble is, even post-crisis the ATM remains empty
>>> print(next(corner_street_atm))
Traceback (most recent call last):
  ...
StopIteration
>>> brand_new_atm = hsbc.create_atm() # Build a new one to get back in business
>>> for cash in brand_new_atm:
...     print(cash)
$100
$100
$100
$100
$100
$100
$100
$100
$100
...
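Note: the built-in next(corner_street_atm) works in both Python 2.6+ and Python 3. The method form is corner_street_atm.next() in Python 2 and corner_street_atm.__next__() in Python 3.
This can be useful for various things, such as controlling access to a resource.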
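When an exhausted generator should not raise, the built-in next() accepts a second argument that is returned instead of raising StopIteration. The sketch below reuses the ATM idea with a hypothetical dict in place of the Bank class, and "out of service" is an arbitrary placeholder value.

```python
def create_atm(bank):
    # `bank` is a hypothetical dict standing in for the Bank object
    while not bank["crisis"]:
        yield "$100"

bank = {"crisis": False}
atm = create_atm(bank)

first = next(atm, "out of service")    # the generator yields normally
bank["crisis"] = True
second = next(atm, "out of service")   # loop condition fails, default is returned
bank["crisis"] = False
third = next(atm, "out of service")    # a finished generator never restarts
```

The caller gets a fallback value rather than an exception, which can be convenient when running out of values is an expected outcome.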
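Itertools, your best friend
The itertools module contains special functions for manipulating iterables. Ever wish to duplicate a generator? Chain two generators? Group values in a nested list with a one-liner?
Then just import itertools.
Let's look at the possible orders of arrival for a four-horse race: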
>>> import itertools
>>> horses = [1, 2, 3, 4]
>>> races = itertools.permutations(horses)
>>> print(races)
<itertools.permutations object at 0xb754f1dc>
>>> print(list(itertools.permutations(horses)))
[(1, 2, 3, 4),
(1, 2, 4, 3),
(1, 3, 2, 4),
(1, 3, 4, 2),
(1, 4, 2, 3),
(1, 4, 3, 2),
(2, 1, 3, 4),
(2, 1, 4, 3),
(2, 3, 1, 4),
(2, 3, 4, 1),
(2, 4, 1, 3),
(2, 4, 3, 1),
(3, 1, 2, 4),
(3, 1, 4, 2),
(3, 2, 1, 4),
(3, 2, 4, 1),
(3, 4, 1, 2),
(3, 4, 2, 1),
(4, 1, 2, 3),
(4, 1, 3, 2),
(4, 2, 1, 3),
(4, 2, 3, 1),
(4, 3, 1, 2),
(4, 3, 2, 1)]
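The other tasks mentioned above (duplicating a generator, chaining two generators, working through a nested list in one line) might look like the sketch below. The evens() and odds() generators are made up for illustration, and flattening with chain.from_iterable is just one reading of "grouping values in a nested list".

```python
import itertools

def evens():
    yield from (0, 2, 4)

def odds():
    yield from (1, 3, 5)

# Chain two generators into a single stream of values.
chained = list(itertools.chain(evens(), odds()))

# Duplicate a generator: tee() returns independent iterators over the same values.
copy_a, copy_b = itertools.tee(evens())
a_values = list(copy_a)
b_values = list(copy_b)

# Flatten a nested list in one line.
nested = [[1, 2], [3], [4, 5]]
flat = list(itertools.chain.from_iterable(nested))

print(chained, a_values, b_values, flat)
```

After calling tee(), the original generator should not be used directly, since advancing it would make the copies miss values.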
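Shortcut to understanding yield
When you see a function with yield statements, apply this simple trick to understand what will happen:
- Insert the line result = [] at the start of the function.
- Replace each yield expr with result.append(expr).
- Insert the line return result at the bottom of the function.
This trick gives you an idea of the logic behind the function, but what actually happens with yield is significantly different from what happens in the list-based approach. In many cases the yield approach is a lot more memory efficient, and often faster too. In other cases the trick gets you stuck in an infinite loop, even though the original function works just fine.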
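Applied to a small function, the three steps above give the following pair. The function names are invented for this sketch. The last function shows a case where the trick breaks down: rewritten with a list, its loop would never reach return result, while the generator version works fine because the caller decides how many values to take.

```python
import itertools

def squares(n):
    for i in range(n):
        yield i * i

# The same function after applying the trick.
def squares_as_list(n):
    result = []
    for i in range(n):
        result.append(i * i)
    return result

same = list(squares(4)) == squares_as_list(4)

# An endless generator: fine with yield, an infinite loop with the list rewrite.
def naturals():
    n = 0
    while True:
        yield n
        n += 1

# islice() takes only the first five values, so the generator is never run to the end.
first_five = list(itertools.islice(naturals(), 5))
```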