Problem description
I want to scrape some data from the following URL using Python: http://www.hankyung.com/stockplus/main.php?module=stock&mode=stock_analysis_infomation&itemcode=078340
The page is a summary of company information.
The data I want is not shown on the first page. Clicking the tab named "재무제표" opens the financial statements, and then clicking the tab named "현금흐름표" opens "Cash Flow".
I want to scrape the "Cash Flow" data.
However, the cash flow data is generated by JavaScript from a different URL. That URL is hidden; it is http://stock.kisline.com/compinfo/financial/main.action?vhead=N&vfoot=N&vstay=&omit=&vwidth=
The cash flow data is generated by submitting some option values and a cookie to this URL.
As you may have noticed, itemcode=078340 in the first link is the stock code, and there are as many as 1680 stocks whose cash flow data I want to gather, so I want to do this in a loop.
Is there a good way to scrape the cash flow data? I tried Scrapy, but it is difficult to integrate with the other scraping code I am already using.
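For illustration, the loop over stock codes could be sketched with the standard library alone. This is only a sketch under assumptions: the helper names (`summary_url`, `make_opener`, `fetch_page`, `fetch_all`) are made up here, and the option values that the kisline URL expects are not given in the question, so the sketch only builds the per-stock summary URL and keeps cookies between requests.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, build_opener

SUMMARY_URL = "http://www.hankyung.com/stockplus/main.php"


def summary_url(itemcode):
    """Build the company summary URL for one stock code."""
    query = urlencode({
        "module": "stock",
        "mode": "stock_analysis_infomation",  # spelled as in the original URL
        "itemcode": itemcode,
    })
    return SUMMARY_URL + "?" + query


def make_opener():
    """Create an opener that stores cookies and sends them back on later requests."""
    return build_opener(HTTPCookieProcessor(CookieJar()))


def fetch_page(opener, url):
    """Download one page as raw bytes using the cookie-aware opener."""
    with opener.open(url, timeout=30) as response:
        return response.read()


def fetch_all(itemcodes, fetch):
    """Call fetch(url) for every stock code; collect pages and failures separately."""
    pages, failed = {}, {}
    for code in itemcodes:
        try:
            pages[code] = fetch(summary_url(code))
        except OSError as exc:  # urllib's URLError is a subclass of OSError
            failed[code] = exc
    return pages, failed
```

With a real list of codes, something like `opener = make_opener()` followed by `fetch_all(codes, lambda url: fetch_page(opener, url))` would walk all 1680 stocks and keep going when a single request fails. Retrieving the cash flow table itself would still require the option values that the page's JavaScript submits, which have to be read from the browser's network inspector.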
There's also dryscrape (a library written by me, so the recommendation is a bit biased, obviously :) which uses a fast WebKit-based in-memory browser to navigate around. It understands JavaScript, too, but is a lot more lightweight than Selenium.
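A rough sketch of how the library mentioned above (published as `dryscrape`) might be used for this page follows. It assumes that the two tabs can be located by their visible text and that clicking them is enough to load the table. Neither assumption is confirmed by the question, and the helper names `summary_url`, `tab_xpath` and `fetch_cash_flow_html` are made up for illustration.

```python
SUMMARY_URL = (
    "http://www.hankyung.com/stockplus/main.php"
    "?module=stock&mode=stock_analysis_infomation&itemcode={code}"
)


def summary_url(code):
    """Build the company summary URL for one stock code."""
    return SUMMARY_URL.format(code=code)


def tab_xpath(label):
    """XPath matching any element whose own text contains the tab label.

    Assumption: the tabs are plain elements carrying the label as text.
    """
    return '//*[contains(text(), "{}")]'.format(label)


def fetch_cash_flow_html(code):
    """Open the summary page, click both tabs, and return the page HTML."""
    import dryscrape  # third-party; imported here so the helpers work without it

    session = dryscrape.Session()
    session.set_attribute("auto_load_images", False)  # skip images to save time
    session.visit(summary_url(code))

    for label in ("재무제표", "현금흐름표"):
        tab = session.at_xpath(tab_xpath(label))  # None if nothing matches
        if tab is None:
            raise LookupError("tab not found: " + label)
        tab.click()

    return session.body()
```

Calling `fetch_cash_flow_html("078340")` inside a loop over the stock codes would give one HTML document per stock, which can then be passed to an HTML parser. If the table is loaded asynchronously or sits in a separate frame, the returned HTML may not contain it yet, so the result should be checked on a single stock before running all 1680.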