I am trying to extract data from a website and have been searching for a solution for several weeks. This is what I am trying:
import requests
from bs4 import BeautifulSoup as Soup
req = requests.get('http://www.rushmore.tv/schedule')
soup = Soup(req.text, "html.parser")
soup.find('home-section-wrap center', id="section-home")
print soup.find
But it returns something related to Steam, which seems completely random, since nothing I am doing has anything to do with Steam.
<bound method BeautifulSoup.find of \n<td class="listtable_1" height="16">\n<a href="http://steamcommunity.com/profiles/76561198134729239" target="_blank">\n 76561198134729239\n </a>\n</td>>
What I want to do is scrape a div by its ID and print its contents. I am very new to this. Cheers.
Best answer
Use this:
import requests
from bs4 import BeautifulSoup
r = requests.get('http://www.rushmore.tv/schedule')
soup = BeautifulSoup(r.text, "html.parser")
# Find the <ul> with id="myUL" and print the text of each <li> inside it
for row in soup.find('ul', id='myUL').find_all('li'):
    print(row.text)
Partial output:
10:30 - 13:30 Olympics: Women's Curling, Canada vs China (CA Coverage) - Channel 21
10:30 - 11:30 Olympics: Freestyle, Men's Half Pipe (US Coverage) - Channel 34
11:30 - 14:45 Olympics: BBC Coverage - Channel 92
11:30 - 19:30 Olympics: BBC Red Button Coverage - Channel 103
11:30 - 13:30 Olympics: Women's Curling, Great Britain vs Japan - Channel 105
13:00 - 15:30 Olympics: Men's Ice Hockey: Slovenia vs Norway - Channel 11
13:30 - 15:30 Olympics: Men's Ice Hockey: Slovenia vs Norway (JIP) - Channel 21
13:30 - 21:30 Olympics: DE Coverage - Channel 88
14:45 - 18:30 Olympics: BBC Coverage - Channel 91
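As for why the code in the question did not work: the first argument to `find` is a tag name (such as `div`), not a class string, and `print soup.find` prints the bound method itself rather than the result of calling it. A minimal sketch of looking up a div by its ID is shown below. It runs against a small inline HTML sample rather than the live page, so `SAMPLE_HTML`, `div_text_by_id`, and `list_items` are hypothetical names used only for illustration, and the markup is an assumption about how the page might be structured, not a copy of it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (an assumption, not the actual site HTML)
SAMPLE_HTML = """
<div class="home-section-wrap center" id="section-home">
  <ul id="myUL">
    <li>10:30 - 11:30 Event A - Channel 1</li>
    <li>11:30 - 12:30 Event B - Channel 2</li>
  </ul>
</div>
"""


def div_text_by_id(html, div_id):
    """Return the text inside the <div> with the given id, or None if it is missing."""
    soup = BeautifulSoup(html, "html.parser")
    # The first argument is the tag name; the id is passed as a keyword argument
    div = soup.find("div", id=div_id)
    if div is None:
        return None
    # Join the text fragments with a single space and strip surrounding whitespace
    return div.get_text(" ", strip=True)


def list_items(html, ul_id):
    """Return the text of every <li> inside the <ul> with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    ul = soup.find("ul", id=ul_id)
    if ul is None:
        return []
    return [li.get_text(strip=True) for li in ul.find_all("li")]


if __name__ == "__main__":
    # Call the functions and print their results, not the methods themselves
    print(div_text_by_id(SAMPLE_HTML, "section-home"))
    for item in list_items(SAMPLE_HTML, "myUL"):
        print(item)
```

To try this against the real page, pass `requests.get(url).text` in place of `SAMPLE_HTML`. Checking for `None` before reading the text avoids an `AttributeError` when the element is not present in the HTML that was actually returned.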