Question
I'm thoroughly puzzled. I have a block of HTML that I scraped out of a larger table. It looks about like this:
<td align="left" class="page">Number:\xc2\xa0<a class="topmenu" href="http://www.example.com/whatever.asp?search=724461">724461</a> Date:\xc2\xa01/1/1999 Amount:\xc2\xa0$2.50 <br/>Person:<br/><a class="topmenu" href="http://www.example.com/whatever.asp?search=LAST&searchfn=FIRST">LAST,\xc2\xa0FIRST </a> </td>
(Actually, it looked worse, but I regexed out a lot of line breaks)
I need to get the lines out, and break up the Date/Amount line. It seemed like the place to start was to find the children of that block of HTML. The block is a string because that's how regex gave it back to me. So I did:
text_soup = BeautifulSoup(text)
text_children = text_soup.find('td').childGenerator()
I can iterate through it with
for i, each in enumerate(text_soup.find('td').childGenerator()):
    print type(each)
    print i, ":", each
but not with
for i, each in enumerate(text_children):
    ...etc
These ought to be the same. So I'm confused.
Recommended answer
gnibbler is correct in explaining that a generator can only be consumed once. To expound a little further:
According to the docs, an iterator is an object representing a stream of data. Since you have already consumed the stream (i.e., you reached its end), iterating over it again will not yield any data. I had the same problem before, but Karl Knechtel's comment (http://stackoverflow.com/questions/10103107/python-sum-not-working-in-list-comprehension-syntax-if-the-source-is-file/10103584#10103584) cleared things up for me. Hope my explanation is clear.
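To make the point concrete, here is a minimal sketch of the behaviour. It does not use BeautifulSoup: `children()` is a hypothetical stand-in for `text_soup.find('td').childGenerator()`, and the yielded strings are made up for illustration. Each call to the function builds a fresh generator, which is presumably why the loop that calls `childGenerator()` inline works, while a generator stored in a variable and already iterated once yields nothing the second time.

```python
def children():
    """Hypothetical stand-in for text_soup.find('td').childGenerator()."""
    yield "Number:"
    yield "Date:"
    yield "Amount:"

# Store the generator once, then iterate over it twice.
gen = children()
first_pass = [child for child in gen]   # consumes the whole stream
second_pass = [child for child in gen]  # generator is exhausted: nothing left

# One way around it: materialize the children into a list,
# which can be iterated as many times as needed.
kids = list(children())
pass_a = list(enumerate(kids))
pass_b = list(enumerate(kids))

print(first_pass)
print(second_pass)
print(pass_a == pass_b)
```

If the children are only needed once, calling the generator function again at the point of use (as the first loop in the question does) works just as well as building a list.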