Question
I have watched some webcasts and need help with the following. I have been using lxml.html, but Yahoo recently changed the structure of its pages.
Target page:
http://finance.yahoo.com/quote/IBM/options?date=1469750400&straddle=true
Using the inspector in Chrome, I see the data at:
//*[@id="main-0-Quote-Proxy"]/section/section/div[2]/section/section/table
followed by some more code.
How do I get this data out into a list? I also want to switch to another stock, for example from "LLY" to "Msft".
How do I switch between dates and get all months?
Recommended answer
Based on @hoju's answer:
import calendar
from datetime import datetime

import lxml.html

exDate = "2014-11-22"
symbol = "LLY"

# The date parameter is the expiry date as a Unix timestamp (midnight UTC).
dt = datetime.strptime(exDate, '%Y-%m-%d')
ym = calendar.timegm(dt.utctimetuple())

url = 'http://finance.yahoo.com/q/op?s=%s&date=%s' % (symbol, ym)
doc = lxml.html.parse(url)
table = doc.xpath('//table[@class="details-table quote-table Fz-m"]/tbody/tr')

# Collect each row as a list of cell strings with thousands separators removed.
rows = []
for tr in table:
    d = [td.text_content().strip().replace(',', '') for td in tr.xpath('./td')]
    rows.append(d)

print(rows)
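To switch between stocks and dates, only the symbol and the date value in the URL need to change. The following is a minimal sketch of that idea, not a tested scraper: it uses the URL pattern from the question, and the helper names (expiry_to_epoch, epoch_to_expiry, option_urls) are hypothetical. It assumes the date query value is the expiry date at midnight UTC as a Unix timestamp, which is consistent with the question's URL, where 1469750400 decodes to 2016-07-29. Upper-casing the symbol and the second expiry date in the example are illustrative assumptions as well.

```python
import calendar
from datetime import datetime, timezone

# URL pattern copied from the question; whether Yahoo still serves it is an assumption.
URL_TEMPLATE = "http://finance.yahoo.com/quote/%s/options?date=%d&straddle=true"


def expiry_to_epoch(date_str):
    """Convert 'YYYY-MM-DD' to seconds since the Unix epoch at midnight UTC."""
    dt = datetime.strptime(date_str, "%Y-%m-%d")
    return calendar.timegm(dt.utctimetuple())


def epoch_to_expiry(epoch):
    """Inverse of expiry_to_epoch: decode a date query value back to 'YYYY-MM-DD'."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d")


def option_urls(symbols, expiry_dates):
    """Build one URL per (symbol, expiry date) pair."""
    return [
        URL_TEMPLATE % (symbol.upper(), expiry_to_epoch(date_str))
        for symbol in symbols
        for date_str in expiry_dates
    ]


if __name__ == "__main__":
    # Decode the date value that appears in the question's URL.
    print(epoch_to_expiry(1469750400))
    # The expiry dates below are placeholders; supply the ones you need.
    for url in option_urls(["LLY", "Msft"], ["2016-07-29", "2016-08-19"]):
        print(url)
```

Each generated URL can then be passed to the parsing code above in place of the hard-coded one. To cover all months, supply the full list of expiry dates you are interested in; this sketch does not discover them from the page.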