Problem description
I'm a beginner at Python and am trying to write a program that scrapes the football/soccer fixtures from skysports.com and sends them to my phone by SMS through Twilio. I've left out the SMS code because I have that figured out, so here's the web scraping code I'm stuck on so far:
import requests
from bs4 import BeautifulSoup
URL = "https://www.skysports.com/football-fixtures"
page = requests.get(URL)
results = BeautifulSoup(page.content, "html.parser")
d = defaultdict(list)
comp = results.find('h5', {"class": "fixres__header3"})
team1 = results.find('span', {"class": "matches__item-col matches__participant matches__participant--side1"})
date = results.find('span', {"class": "matches__date"})
team2 = results.find('span', {"class": "matches__item-col matches__participant matches__participant--side2"})
for ind in range(len(d)):
    d['comp'].append(comp[ind].text)
    d['team1'].append(team1[ind].text)
    d['date'].append(date[ind].text)
    d['team2'].append(team2[ind].text)
The code below should do the trick for you:
from bs4 import BeautifulSoup
import requests

a = requests.get('https://www.skysports.com/football-fixtures')
soup = BeautifulSoup(a.text, features="html.parser")

teams = []
for date in soup.find_all(class_="fixres__header2"):  # searching in that date
    for i in soup.find_all(class_="swap-text--bp30")[1:]:  # skips the first one because that's a heading
        teams.append(i.text)

date = soup.find(class_="fixres__header2").text
print(date)

teams = [i.strip('\n') for i in teams]
for x in range(0, len(teams), 2):
    print(teams[x] + " vs " + teams[x+1])
Let me explain what I have done. All the team names have this class name: swap-text--bp30
So we can use find_all to extract every element with that class.
Once we have the results, we create an empty list, "teams = []", and append each result to it in a for loop with "teams.append(i.text)". ".text" returns the element's text without the HTML tags.
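To try the find_all and .text steps without hitting the site, here is a minimal sketch that runs on a small inline snippet. The markup and the names in it are made up for illustration and are much simpler than the real page; only the two class names come from the answer.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a simplified stand-in for the fixtures page, not the real HTML.
SAMPLE_HTML = """
<h4 class="fixres__header2">Some date</h4>
<span class="swap-text--bp30">Heading</span>
<span class="swap-text--bp30">
Team A
</span>
<span class="swap-text--bp30">
Team B
</span>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all returns a list of every matching tag; find returns only the first match.
tags = soup.find_all(class_="swap-text--bp30")

# Skip the first match (a heading in this sample) and keep only each tag's text,
# with the surrounding newlines stripped.
names = [tag.text.strip("\n") for tag in tags[1:]]
print(names)  # ['Team A', 'Team B']
```

Note that the result of find_all can be indexed and sliced like a list, while the single tag returned by find cannot be indexed by position.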
Then we strip the "\n" characters from each string in the list and print the strings two at a time. The output is the first date heading followed by one "Team vs Team" line per fixture.
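The two-by-two pairing can also be written without index arithmetic. This is only a sketch: pair_fixtures is a hypothetical helper name, the sample names are placeholders, and it uses strip() with no argument, which removes all surrounding whitespace rather than just "\n".

```python
def pair_fixtures(teams):
    """Pair consecutive names as 'home vs away'; a trailing unpaired name is dropped."""
    cleaned = [name.strip() for name in teams]
    # zip stops at the shorter slice, so an odd-length list cannot raise IndexError.
    return [f"{home} vs {away}" for home, away in zip(cleaned[0::2], cleaned[1::2])]


# Placeholder data shaped like the raw .text values (newlines still attached).
sample = ["\nTeam A\n", "\nTeam B\n", "\nTeam C\n", "\nTeam D\n", "\nTeam E\n"]
for line in pair_fixtures(sample):
    print(line)
```

With the odd-length sample above, "Team E" has no partner and is left out instead of causing an error, which is the main difference from indexing with teams[x+1].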
EDIT: To scrape the league titles, we do pretty much the same thing:
league = []
for date in soup.find_all(class_="fixres__header2"):  # searching in that date
    for i in soup.find_all(class_="fixres__header3"):  # every league heading (nothing is skipped here)
        league.append(i.text)
Strip the newlines from the list and create a second list for the fixtures:
league = [i.strip('\n') for i in league]
final = []
Then add this final bit of code, which prints all the leagues and then each pair of teams:
for x in range(0, len(teams), 2):  # step by 2: each fixture is two consecutive names
    final.append(teams[x] + " vs " + teams[x+1])
for i in league:
    print(i)
for i in final:
    print(i)
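If the goal is to print each league followed by its own fixtures, one possible approach is to walk the headings and team names in document order and group as you go. The sketch below assumes that every league heading appears before its fixtures in the HTML, which I have not verified against the live page; fixtures_by_league is a hypothetical helper and the sample markup is made up.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: assumes each league heading precedes its own fixtures.
SAMPLE_HTML = """
<h5 class="fixres__header3">League X</h5>
<span class="swap-text--bp30">Team A</span>
<span class="swap-text--bp30">Team B</span>
<h5 class="fixres__header3">League Y</h5>
<span class="swap-text--bp30">Team C</span>
<span class="swap-text--bp30">Team D</span>
"""


def fixtures_by_league(html):
    soup = BeautifulSoup(html, "html.parser")
    grouped = {}
    current = None
    # A list passed to class_ matches tags carrying either class, in document order.
    for tag in soup.find_all(class_=["fixres__header3", "swap-text--bp30"]):
        text = tag.text.strip()
        if "fixres__header3" in tag.get("class", []):
            current = text
            grouped[current] = []
        elif current is not None:
            # Names that appear before the first league heading are ignored.
            grouped[current].append(text)
    # Pair consecutive team names within each league.
    return {
        league: [f"{a} vs {b}" for a, b in zip(names[0::2], names[1::2])]
        for league, names in grouped.items()
    }


for league, matches in fixtures_by_league(SAMPLE_HTML).items():
    print(league)
    for match in matches:
        print("  " + match)
```

Because the result is keyed by league name, a league that appears under more than one date would overwrite its earlier entry; keying on (date, league) would be one way around that.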