Problem description
I need to implement a data scraping task that extracts data from a dynamic graph. The graph updates over time, similar to what you would see on a chart of a company's stock price. I am using the requests and beautifulsoup4 libraries in Python, but I have only figured out how to scrape text and link data. I can't figure out how to get the values of the graph into a CSV file.
The graph in question can be found at http://www.apptrace.com/app/instagram/id389801252/ranks/topfreeapplications/36
Recommended answer
@Oliver W. has already provided a good answer, but using requests avoids having to note the network call and is overall a much nicer package than urllib.
If you want to be a bit more flexible with your code, you can write a function that takes the country name and the start and end dates.
import requests
import pandas as pd
import json


def load_data(country='', start_date='2014-08-09', end_date='2014-11-1'):
    # Build the API URL for the requested country and date range
    base = "http://www.apptrace.com/api/app/389801252/rankings/country/"
    extra = "?country={0}&start_date={1}&end_date={2}&device=iphone&list_type=normal&chart_subtype=iphone"
    addr = base + extra.format(country, start_date, end_date)

    page = requests.get(addr)
    json_data = page.json()  # parse the JSON body of the response
    ranks = json_data['rankings'][0]['ranks']
    ranks = json.dumps(ranks)  # re-serialize the ranks so pandas can read them as JSON
    df = pd.read_json(ranks, orient='records')
    return df
Change the settings on the webpage to see what other values you can use for country (Canada is 'CAN', for example). The empty string is for the USA.
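To see how the country code changes the request, here is a minimal sketch that only builds the URL and never contacts the server. The helper name build_rankings_url is hypothetical; the endpoint and query parameters are copied from load_data above, and it is an assumption that the site still accepts them.

```python
from urllib.parse import urlencode

# Endpoint taken from the load_data function in the answer
BASE_URL = "http://www.apptrace.com/api/app/389801252/rankings/country/"


def build_rankings_url(country="", start_date="2014-08-09", end_date="2014-11-1"):
    """Return the rankings URL for one country (hypothetical helper)."""
    params = {
        "country": country,  # '' is the USA, 'CAN' is Canada
        "start_date": start_date,
        "end_date": end_date,
        "device": "iphone",
        "list_type": "normal",
        "chart_subtype": "iphone",
    }
    return BASE_URL + "?" + urlencode(params)


# Print the URL for the USA (empty string) and for Canada
for code in ["", "CAN"]:
    print(repr(code), build_rankings_url(code))
```

Comparing the printed URLs with the requests the page makes when you switch countries is one way to confirm a code before passing it to load_data.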
The df looks like this:
         date  rank
0  2014-08-09    10
1  2014-08-10    10
2  2014-08-11     9
3  2014-08-12     8
4  2014-08-13     8
5  2014-08-14     7
6  2014-08-15     6
7  2014-08-16     8
With the pandas DataFrame in hand, you can export it to CSV, or combine many DataFrames before you export.
df = load_data()
df.to_csv("file_name.csv")
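Combining several DataFrames before exporting could look roughly like the sketch below. It uses small hand-written frames instead of live load_data calls so that it runs offline. The helper name combine_rankings is hypothetical, the Canadian ranks are made-up placeholder values, and "USA" is used here only as a readable label (the API itself takes the empty string for the USA).

```python
import pandas as pd


def combine_rankings(frames_by_country):
    """Stack per-country rank frames into one frame with a 'country' column."""
    labelled = []
    for country, frame in frames_by_country.items():
        # assign() returns a copy, so the input frames are left untouched
        labelled.append(frame.assign(country=country))
    return pd.concat(labelled, ignore_index=True)


# Stand-ins for load_data() and load_data('CAN'); the CAN ranks are invented
usa = pd.DataFrame({"date": ["2014-08-09", "2014-08-10"], "rank": [10, 10]})
can = pd.DataFrame({"date": ["2014-08-09", "2014-08-10"], "rank": [12, 11]})

combined = combine_rankings({"USA": usa, "CAN": can})

# With no path, to_csv returns the CSV text; pass a file name to write a file
csv_text = combined.to_csv(index=False)
print(csv_text)
```

Passing index=False leaves the row numbers out of the file, which is usually what you want once rows from several countries have been stacked together.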