Problem description
I tried to scrape some key stock statistics from Finviz. I applied the code from the original question, "Web scraping of key stats in Yahoo! Finance with R". To collect stats for as many stocks as possible, I created a list of stock symbols and descriptions like this:
Symbol Description
A Agilent Technologies
AAA Alcoa Corp
AAC Aac Holdings Inc
BABA Alibaba Group Holding Ltd
CRM Salesforce.Com Inc
...
I selected the first column, stored it in R as a character vector, and called it stocks. Then I applied the code:
library(XML)

for (s in stocks) {
  url <- paste0("http://finviz.com/quote.ashx?t=", s)
  webpage <- readLines(url)
  html <- htmlTreeParse(webpage, useInternalNodes = TRUE, asText = TRUE)
  tableNodes <- getNodeSet(html, "//table")
  # ASSIGN TO STOCK NAMED DFS
  assign(s, readHTMLTable(tableNodes[[9]],
                          header = c("data1", "data2", "data3", "data4", "data5", "data6",
                                     "data7", "data8", "data9", "data10", "data11", "data12")))
  # ADD COLUMN TO IDENTIFY STOCK
  df <- get(s)
  df['stock'] <- s
  assign(s, df)
}
# COMBINE ALL STOCK DATA
stockdatalist <- cbind(mget(stocks))
stockdata <- do.call(rbind, stockdatalist)
# MOVE STOCK ID TO FIRST COLUMN
stockdata <- stockdata[, c(ncol(stockdata), 1:(ncol(stockdata) - 1))]
However, Finviz doesn't have a page for some of the stocks, and I get error messages like this:
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
  cannot open URL 'http://finviz.com/quote.ashx?t=AGM.A': HTTP status was '404 Not Found'
A good number of stocks are in this situation, so I can't delete them from my list manually. Is there a way to skip getting the page for those stocks? Thanks in advance!
Recommended answer
Maybe something along these lines? Try filtering the stocks before running your for loop.
library(tidyverse)
# AGM.A should produce an error
stocks <- c("AXP", "BA", "CAT", "AGM.A")
urls <- paste0("http://finviz.com/quote.ashx?t=", stocks)
# Wrap readLines in possibly() so a failed request returns NA instead of stopping
temp_ind <- map(urls, possibly(readLines, otherwise = NA_real_))
# TRUE where the request failed
ind <- map_lgl(map(temp_ind, 1), is.na)
# Logical indexing also works when nothing failed
# (stocks[-which(ind)] would return an empty vector in that case)
filter.stocks <- stocks[!ind]
# AGM.A is removed; use filter.stocks in the for loop
filter.stocks
[1] "AXP" "BA" "CAT"
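If you would rather skip failing symbols inside the loop itself, instead of filtering beforehand, base R's tryCatch can do it. The following is only a minimal sketch: safe_read and fake_reader are hypothetical names introduced here for illustration, and fake_reader stands in for readLines so the example runs without network access. In real use you would drop the reader argument and let it default to readLines.

```r
# Hypothetical helper: returns NULL instead of stopping when the reader fails.
safe_read <- function(url, reader = readLines) {
  tryCatch(
    suppressWarnings(reader(url)),
    error = function(e) NULL
  )
}

# Stand-in reader for illustration only: fails for AGM.A, like the 404 in the question.
fake_reader <- function(url) {
  if (grepl("AGM.A", url, fixed = TRUE)) stop("cannot open the connection")
  c("<html>", "</html>")
}

stocks <- c("AXP", "BA", "CAT", "AGM.A")
pages <- list()
for (s in stocks) {
  url <- paste0("http://finviz.com/quote.ashx?t=", s)
  webpage <- safe_read(url, reader = fake_reader)
  if (is.null(webpage)) next  # skip symbols whose page cannot be read
  pages[[s]] <- webpage       # parse the page here in real use
}
names(pages)
```

Because each page is read only once, this avoids requesting every URL twice (once to test it and once to scrape it).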
As statxiong pointed out, url.exists from RCurl gives a simpler version:
library(RCurl)
library(tidyverse)
stocks[map_lgl(urls, url.exists)]
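It can also help to keep a record of which symbols were dropped, so you can check them later. The sketch below is one possible way to do that: check_symbols and fake_exists are hypothetical names, and fake_exists is a stand-in used only so the example runs offline. In real use you might pass RCurl::url.exists as exists_fn instead.

```r
# Hypothetical helper: split symbols into those whose URL exists and those to skip.
check_symbols <- function(stocks, exists_fn) {
  urls <- paste0("http://finviz.com/quote.ashx?t=", stocks)
  ok <- vapply(urls, exists_fn, logical(1), USE.NAMES = FALSE)
  list(kept = stocks[ok], skipped = stocks[!ok])
}

# Stand-in for url.exists, for illustration only: pretends AGM.A has no page.
fake_exists <- function(url) !grepl("AGM.A", url, fixed = TRUE)

res <- check_symbols(c("AXP", "BA", "CAT", "AGM.A"), fake_exists)
res$kept
res$skipped
```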