

This article covers scraping data from a dynamic graph with Python and BeautifulSoup4.


I need to implement a data scraping task that extracts data from a dynamic graph. The graph updates over time, similar to the chart of a company's stock price. I am using the requests and beautifulsoup4 libraries in Python, but I have only figured out how to scrape text and link data. I can't figure out how to get the values of the graph into a CSV file.

The graph in question can be found at -


@Oliver W. has already provided a good answer, but using requests avoids having to note the network call and is overall a much nicer package than urllib.


If you want to be a bit more flexible with your code, you can write a function that takes the country name and the start and end dates.

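import json
from io import StringIO

import pandas as pd
import requests


def load_data(country='', start_date='2014-08-09', end_date='2014-11-01'):
    """Fetch the chart's rank history and return it as a DataFrame."""
    base = ""  # base URL of the chart's JSON endpoint
    extra = "?country={0}&start_date={1}&end_date={2}&device=iphone&list_type=normal&chart_subtype=iphone"
    addr = base + extra.format(country, start_date, end_date)

    page = requests.get(addr)
    page.raise_for_status()  # fail early on an HTTP error status
    json_data = page.json()  # parse the JSON body of the response
    ranks = json_data['rankings'][0]['ranks']
    ranks = json.dumps(ranks)  # re-serialize just the ranks as a JSON string
    # Wrap in StringIO: recent pandas versions deprecate passing a raw JSON string
    df = pd.read_json(StringIO(ranks), orient='records')
    return df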


Change the settings on the web page to see which other values the country parameter accepts (Canada is 'CAN', for example). The empty string is for the USA.


    date        rank
0   2014-08-09  10
1   2014-08-10  10
2   2014-08-11  9
3   2014-08-12  8
4   2014-08-13  8
5   2014-08-14  7
6   2014-08-15  6
7   2014-08-16  8
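
To make the country parameter easier to vary, the query string can be built from a dictionary instead of a format string. The following is only a sketch: `build_query` is a hypothetical helper, and the parameter names are copied from the URL template in the function above. It uses the standard library's `urlencode`; `requests.get` also accepts the same kind of dictionary through its `params` argument.

```python
from urllib.parse import urlencode


def build_query(country="", start_date="2014-08-09", end_date="2014-11-01"):
    """Build the query string for one country (hypothetical helper)."""
    params = {
        "country": country,  # '' is the USA, 'CAN' is Canada
        "start_date": start_date,
        "end_date": end_date,
        "device": "iphone",
        "list_type": "normal",
        "chart_subtype": "iphone",
    }
    return "?" + urlencode(params)


# One query string per country code, keyed by a readable label.
queries = {code or "USA": build_query(code) for code in ["", "CAN"]}
```

Appending one of these strings to the base URL gives the address to request for that country.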

With the pandas DataFrame in hand, you can export it to CSV, or combine several DataFrames before exporting.

df = load_data()
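
As a rough sketch of the combining and exporting step, the example below stacks two frames with `pd.concat` and writes the result with `DataFrame.to_csv`. The `combine_frames` helper is hypothetical, and the two small frames are placeholder data standing in for real `load_data()` results (the Canadian ranks in particular are made up), so the snippet runs without a network call.

```python
import io

import pandas as pd


def combine_frames(frames_by_country):
    """Stack per-country frames into one, tagging each row with its country."""
    tagged = [df.assign(country=name) for name, df in frames_by_country.items()]
    return pd.concat(tagged, ignore_index=True)


# Placeholder data standing in for two load_data() results.
usa = pd.DataFrame({"date": ["2014-08-09", "2014-08-10"], "rank": [10, 10]})
can = pd.DataFrame({"date": ["2014-08-09", "2014-08-10"], "rank": [12, 11]})

combined = combine_frames({"USA": usa, "CAN": can})

# Written to an in-memory buffer here; pass a file path to to_csv to save to disk.
buffer = io.StringIO()
combined.to_csv(buffer, index=False)  # index=False leaves out the row numbers
csv_text = buffer.getvalue()
```

For a single frame, `df.to_csv(path, index=False)` with a path of your choice is enough.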

