Problem description
I am trying to see whether I can read a table of data from WU.com, but pandas raises a ValueError saying that no tables were found. (This is also my first time web scraping.) Another person asked a very similar Stack Overflow question about a WU data table, but that solution is a bit too complicated for me.
import pandas as pd
df_list = pd.read_html('https://www.wunderground.com/history/daily/us/wi/milwaukee/KMKE/date/2013-6-26')
print(df_list)
On the historical-data page for Milwaukee, the table I am trying to retrieve into pandas is the "Daily Observations" table.
Any tips would help, thank you.
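A minimal, self-contained sketch of this failure mode, assuming the table on a dynamic page is inserted by JavaScript after the page loads: pd.read_html() only parses <table> elements that are already present in the markup it receives, so unrendered HTML produces a ValueError while rendered HTML parses. Both HTML strings are made-up stand-ins, not WU's real markup, and the sketch assumes lxml is installed (flavor="lxml" pins that parser so pandas does not try to fall back to bs4/html5lib).

```python
from io import StringIO

import pandas as pd

# Made-up stand-in for what a plain HTTP request returns from a dynamic page:
# an empty placeholder and a script, but no <table> element yet.
raw_html = """
<html><body>
  <div id="observations"></div>
  <script>/* in a browser, a script would build the table here */</script>
</body></html>
"""

# Made-up stand-in for the same page after a browser has run the script.
rendered_html = """
<html><body>
  <table>
    <tr><th>Time</th><th>Temperature</th></tr>
    <tr><td>6:52 PM</td><td>68 F</td></tr>
  </table>
</body></html>
"""


def count_tables(html: str) -> int:
    """Return how many tables pandas parses from html, or 0 if it finds none."""
    try:
        return len(pd.read_html(StringIO(html), flavor="lxml"))
    except ValueError as exc:  # read_html raises ValueError when no <table> is found
        print(f"read_html failed: {exc}")
        return 0


print(count_tables(raw_html))       # 0
print(count_tables(rendered_html))  # 1
```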
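The page is dynamic, which means you'll need to render it first: the table is built by JavaScript after the page loads, so the HTML that pd.read_html() downloads on its own contains no table. You would need to use something like Selenium to render the page; then you can pull the table using pandas .read_html():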
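from io import StringIO
import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
# Selenium 4+: pass the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service('C:/chromedriver_win32/chromedriver.exe'))
driver.get("https://www.wunderground.com/history/daily/us/wi/milwaukee/KMKE/date/2013-6-26")
# wait up to 20 s until the script-generated tables exist (tables[1] needs at least two)
WebDriverWait(driver, 20).until(lambda d: len(d.find_elements(By.TAG_NAME, "table")) > 1)
html = driver.page_source  # HTML after JavaScript has run
tables = pd.read_html(StringIO(html))  # StringIO: passing literal HTML is deprecated in pandas 2.1+
data = tables[1]  # the observations table was the second table on the page
driver.quit()  # quit() ends the session and stops the chromedriver process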
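The hard-coded tables[1] depends on where the table sits on the page. A hedged alternative, assuming the observations table's header still contains the word "Condition" (as in the output below), is read_html's match= parameter, which keeps only tables whose text matches a string or regex. The HTML here is a made-up stand-in for driver.page_source; the two observation rows are copied from the output below and the first table is a placeholder.

```python
from io import StringIO

import pandas as pd

# Made-up stand-in for driver.page_source: a placeholder table followed by
# the observations table we actually want.
page_source = """
<table>
  <tr><th>Summary</th><th>Value</th></tr>
  <tr><td>placeholder</td><td>0</td></tr>
</table>
<table>
  <tr><th>Time</th><th>Temperature</th><th>Condition</th></tr>
  <tr><td>6:52 PM</td><td>68 F</td><td>Mostly Cloudy</td></tr>
  <tr><td>7:52 PM</td><td>69 F</td><td>Mostly Cloudy</td></tr>
</table>
"""

# match= keeps only tables whose text matches the string/regex, so the
# selection no longer depends on the table's position on the page.
tables = pd.read_html(StringIO(page_source), match="Condition")
data = tables[0]
print(data)
```

The trade-off: a position index breaks when the page adds or reorders tables, while a text match breaks only if the header wording changes.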
Output:
print(data)
Time Temperature ... Precip Accum Condition
0 6:52 PM 68 F ... 0.0 in Mostly Cloudy
1 7:52 PM 69 F ... 0.0 in Mostly Cloudy
2 8:52 PM 70 F ... 0.0 in Mostly Cloudy
3 9:52 PM 67 F ... 0.0 in Cloudy
4 10:52 PM 65 F ... 0.0 in Partly Cloudy
5 11:42 PM 66 F ... 0.0 in Mostly Cloudy
6 11:52 PM 68 F ... 0.0 in Mostly Cloudy
7 12:08 AM 68 F ... 0.0 in Cloudy
8 12:52 AM 68 F ... 0.0 in Mostly Cloudy
9 1:52 AM 70 F ... 0.0 in Cloudy
10 2:13 AM 70 F ... 0.0 in Cloudy
11 2:52 AM 71 F ... 0.0 in Cloudy
12 3:52 AM 70 F ... 0.0 in Mostly Cloudy
13 4:19 AM 70 F ... 0.0 in Cloudy
14 4:29 AM 70 F ... 0.0 in Cloudy
15 4:52 AM 70 F ... 0.0 in Cloudy
16 5:25 AM 70 F ... 0.0 in Mostly Cloudy
17 5:52 AM 71 F ... 0.0 in Cloudy
18 6:52 AM 73 F ... 0.0 in Cloudy
19 7:52 AM 74 F ... 0.0 in Cloudy
20 8:52 AM 73 F ... 0.0 in Cloudy
21 9:52 AM 71 F ... 0.0 in Cloudy
22 10:52 AM 71 F ... 0.0 in Cloudy
23 11:52 AM 70 F ... 0.0 in Cloudy
24 12:52 PM 72 F ... 0.0 in Mostly Cloudy
25 1:52 PM 70 F ... 0.0 in Mostly Cloudy
26 2:52 PM 71 F ... 0.0 in Mostly Cloudy
27 3:52 PM 71 F ... 0.0 in Partly Cloudy
28 4:52 PM 68 F ... 0.0 in Mostly Cloudy
29 5:52 PM 66 F ... 0.0 in Mostly Cloudy
[30 rows x 11 columns]
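The "..." in the printed output and the [30 rows x 11 columns] footer mean pandas truncated the display; the hidden columns are still in data. A small sketch using pd.option_context to lift the column limit temporarily; the 30 x 11 frame with placeholder column names is a made-up stand-in for data.

```python
import pandas as pd

# Made-up 30 x 11 frame standing in for `data`; column names are placeholders.
data = pd.DataFrame(
    [[r * 11 + c for c in range(11)] for r in range(30)],
    columns=[f"col{c}" for c in range(11)],
)

print(data.shape)  # (30, 11)

# Inside this block every column is printed; the previous setting is
# restored automatically when the block exits.
with pd.option_context("display.max_columns", None):
    full_view = repr(data)
print(full_view)
```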