Problem description
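I feel like I'm really close, so any help would be appreciated. I'm trying to scrape the index and value data from the table titled "Stock Market Activity" on the NASDAQ web page: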
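from urllib.request import urlopen
from bs4 import BeautifulSoup

def get_index_prices(NASDAQ_URL):
    html = urlopen(NASDAQ_URL).read()
    soup = BeautifulSoup(html, "lxml")
    # Find the index table by its class and print the first two cells of each row
    for row in soup('table', {'class': 'genTable thin'})[0].tbody('tr'):
        tds = row('td')
        print("Index: %s, Value: %s" % (tds[0].text, tds[1].text))

get_index_prices('http://www.nasdaq.com/')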
The error reads:
Solution
This table is rendered by JavaScript. If you look at the page source before the JavaScript runs, you can see that the table looks like this:
<div id="HomeIndexTable" class="genTable thin">
<table id="indexTable" class="floatL marginB5px">
<thead>
<tr>
<th>Index</th>
<th>Value</th>
<th>Change Net / %</th>
</tr>
</thead>
<script type="text/javascript">
//<![CDATA[
nasdaqHomeIndexChart.storeIndexInfo("NASDAQ","5053.75","-20.52","0.40","1,938,573,902","5085.22","5053.75");
nasdaqHomeIndexChart.storeIndexInfo("DJIA","17663.54","-92.26","0.52","","17799.96","17662.87");
nasdaqHomeIndexChart.storeIndexInfo("S&P 500","2079.36","-10.05","0.48","","2094.32","2079.34");
nasdaqHomeIndexChart.storeIndexInfo("NASDAQ-100","4648.83","-21.93","0.47","","4681.23","4648.83");
nasdaqHomeIndexChart.storeIndexInfo("NASDAQ-100 PMI","4675.49","4.73","0.10","","4681.98","4675.49");
nasdaqHomeIndexChart.storeIndexInfo("NASDAQ-100 AHI","4647.33","-1.50","0.03","","4670.76","4647.26");
nasdaqHomeIndexChart.storeIndexInfo("Russell 1000","1153.55","-4.85","0.42","","1161.51","1153.54");
nasdaqHomeIndexChart.storeIndexInfo("Russell 2000","1161.86","-3.76","0.32","","1167.65","1159.66");
nasdaqHomeIndexChart.storeIndexInfo("FTSE All-World ex-US*","271.15","-0.23","0.08","","272.33","271.13");
nasdaqHomeIndexChart.storeIndexInfo("FTSE RAFI 1000*","9045.08","-34.52","0.38","","9109.74","9044.91");
//]]>
nasdaqHomeIndexChart.displayIndexes();
</script>
</table>
</div>
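The snippet also shows why the question's lookup finds nothing: `genTable thin` is the class of the surrounding `<div>`, while the `<table>` itself has class `floatL marginB5px` and contains no `<td>` cells until the script runs. Since the values are already present in the raw source as arguments to `storeIndexInfo(...)`, another option (not the one this answer recommends) is to pull them out with a regular expression instead of running the JavaScript. A minimal sketch, assuming the page still embeds the data in exactly this call format; `extract_index_values` and `STORE_INDEX_RE` are hypothetical names, and the pattern will break as soon as the page's script changes:

```python
import re

# Captures the first two quoted arguments of each storeIndexInfo(...) call,
# i.e. the index name and its value, as they appear in the page source above.
# Assumes arguments are double-quoted strings with no embedded quotes.
STORE_INDEX_RE = re.compile(r'storeIndexInfo\(\s*"([^"]*)"\s*,\s*"([^"]*)"')


def extract_index_values(html):
    """Return a list of (index name, value) tuples found in the raw HTML."""
    return STORE_INDEX_RE.findall(html)


if __name__ == "__main__":
    # Two lines copied from the snippet above. In real use, `html` would be the
    # downloaded page, e.g. urlopen(url).read().decode("utf-8").
    html = '''
    nasdaqHomeIndexChart.storeIndexInfo("NASDAQ","5053.75","-20.52","0.40","1,938,573,902","5085.22","5053.75");
    nasdaqHomeIndexChart.storeIndexInfo("S&P 500","2079.36","-10.05","0.48","","2094.32","2079.34");
    '''
    for name, value in extract_index_values(html):
        print("Index: %s, Value: %s" % (name, value))
```

Running it prints `Index: NASDAQ, Value: 5053.75` followed by `Index: S&P 500, Value: 2079.36`. This avoids a browser entirely, at the cost of depending on the exact shape of an inline script.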
You can use Selenium for scraping. Selenium drives a real browser, so it executes the JavaScript and lets you read the table after it has been rendered.
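A minimal sketch of the Selenium approach, assuming Selenium 4 and a local Firefox install. It loads the page in the browser so the script in the snippet above runs, waits until a `<td>` appears inside `#indexTable`, and then parses the rendered DOM with Python's built-in `html.parser`. The names `fetch_rendered_html`, `parse_index_rows` and `IndexTableParser` are hypothetical, and the assumption that each rendered row holds the index name in its first `<td>` and the value in its second is a guess based on the table headers above. The NASDAQ page may have changed since this answer was written, so the selectors may need updating.

```python
from html.parser import HTMLParser


class IndexTableParser(HTMLParser):
    """Collects the text of each <td> cell, row by row, inside <table id="indexTable">."""

    def __init__(self):
        super().__init__()
        self.in_table = False
        self.in_cell = False
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == "indexTable":
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.rows.append([])
        elif self.in_table and tag == "td" and self.rows:
            self.in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "table":
            self.in_table = False
        elif tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.rows[-1][-1] += data.strip()


def parse_index_rows(html):
    """Return (index, value) pairs from rows that have at least two <td> cells."""
    parser = IndexTableParser()
    parser.feed(html)
    parser.close()
    return [(cells[0], cells[1]) for cells in parser.rows if len(cells) >= 2]


def fetch_rendered_html(url, wait_seconds=10):
    """Load the page in a real browser so its JavaScript runs, then return the DOM."""
    # Imported here so parse_index_rows works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()  # requires Firefox; webdriver.Chrome() is the Chrome equivalent
    try:
        driver.get(url)
        # Wait until the script has filled in at least one data cell.
        WebDriverWait(driver, wait_seconds).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "#indexTable td"))
        )
        return driver.page_source
    finally:
        driver.quit()
```

Usage: `parse_index_rows(fetch_rendered_html("http://www.nasdaq.com/"))` returns the (index, value) pairs, which can be printed the same way as in the question. Keeping the parsing separate from the browser means it can be checked against saved HTML without launching Firefox.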