I have markup like the following. I'm trying to get only the h1's own text, here 'The Wire', but I'm getting all of the text inside the h1.

<h1 id="aiv-content-title" class="js-hide-on-play">
The Wire
    <span class="num-of-seasons">5 Seasons</span>
    <span class="release-year">2002</span>
</h1>

The output I get is Wire5 Seasons2002, using this code:
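# elm is assumed to be the parsed BeautifulSoup document
heading = elm.find('h1', id='aiv-content-title')
print(heading)
seasons = elm.find('span', {'class': 'num-of-seasons'})

# find() returns the None object, not the string 'None', when nothing matches
if seasons is None:
    print('1')
else:
    print(seasons.text)

release_year = elm.find('span', {'class': 'release-year'})
print(release_year.text)
print('')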

When I tried this code, I got:
The Wire5 Seasons20025 Seasons2002
I was expecting something like this:
The Wire5 Seasons2002

Best answer

You can do the following:

# elm is assumed to be the parsed BeautifulSoup document from the question
h1_element = elm.find('h1', {'id': 'aiv-content-title'})
num_seasons = h1_element.find('span', {'class': 'num-of-seasons'}).getText().strip()
release_year = h1_element.find('span', {'class': 'release-year'}).getText().strip()

# Read the span values first, then remove the spans from the h1
# so that only the h1's own text is left
while h1_element.find('span'):
    h1_element.find('span').extract()

print(h1_element.getText().strip())
print(num_seasons)
print(release_year)
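
The approach above modifies the parsed tree by extracting the spans. If you would rather leave the tree untouched, one possible alternative is to collect only the strings that are direct children of the h1. The following is a minimal sketch, assuming Python 3 and a reasonably recent bs4 (the `string=` argument; older releases call it `text=`). The function name `extract_title_info` and the inline `html` sample are made up for illustration, and the fallback value '1' mirrors the question's code.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
html = """
<h1 id="aiv-content-title" class="js-hide-on-play">
The Wire
    <span class="num-of-seasons">5 Seasons</span>
    <span class="release-year">2002</span>
</h1>
"""


def extract_title_info(markup):
    """Return (title, seasons, release_year) without modifying the tree.

    Hypothetical helper, not part of BeautifulSoup.
    """
    soup = BeautifulSoup(markup, "html.parser")
    h1 = soup.find("h1", id="aiv-content-title")

    # recursive=False limits the search to direct children of the h1,
    # so text inside the nested spans is skipped.
    own_strings = h1.find_all(string=True, recursive=False)
    title = "".join(own_strings).strip()

    seasons_tag = h1.find("span", class_="num-of-seasons")
    # find() returns None when nothing matches; fall back to '1'
    # as the question's code does.
    seasons = seasons_tag.get_text(strip=True) if seasons_tag is not None else "1"

    year_tag = h1.find("span", class_="release-year")
    release_year = year_tag.get_text(strip=True) if year_tag is not None else None

    return title, seasons, release_year


title, seasons, release_year = extract_title_info(html)
print(title)
print(seasons)
print(release_year)
```

Because nothing is removed from the tree, the h1 can still be queried afterwards, which may matter if the same document is processed further.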

10-06 06:18