问题描述
我是网络爬虫的新手.因此,我承担了从以下任务中提取数据的任务:
您可以在预览"标签中检索此URL中的数据,您可以查看所有数据.
如果您对python有很好的了解,也可以使用它来抓取数据
https://doc.scrapy.org/en/latest/intro/overview.html
I am new to webscraping. So I have been given a task to extract data from : Here
I am choosing dataset of "comments". Below is my code for scraping.
import requests
from bs4 import BeautifulSoup
url = 'https://www.kaggle.com/hacker-news/hacker-news'
headers = {'User-Agent' : 'Mozilla/5.0'}
response = requests.get(url, headers = headers)
response.status_code
response.content
soup = BeautifulSoup(response.content, 'html.parser')
soup.find_all('tbody', class_ = 'TableBody-kSbjpE jGqIxa')
When I try to execute the last command it returns : []
.
So, I am stuck here. I know we can get the data from kernel, but just for practice purpose where am I going wrong? Am I choosing wrong class? I want to scrape the data and probably save it to a CSV file or to a No-SQL Database, preferred Cassandra.
you are getting this [] because data you want to scrape is coming from API which loads after you web page load so page you are accessing does not contain that class
you can open you browser console and check in network as given in screenshot there you find data you want to scrape so you have to make request to that URL to get data
you can retrive data in this URL in preview tab you can see all data.
also if you have good knowledge of python you can also use this to scrape data
https://doc.scrapy.org/en/latest/intro/overview.html
这篇关于美丽汤返回空列表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!