Question
Suppose I want to regress Gross Profit on Total Revenue in R. I need data for this, and the more, the better. There is a library on CRAN that I find very useful, quantmod, which does what I need.
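library(quantmod)
# Download AMD's financial statements; quantmod stores them in an object named AMD.f
getFinancials(Symbol = "AMD", src = "google")
# To list the available line items: rownames(AMD.f$IS$A)
# Pull two rows from the annual income statement (IS = income statement, A = annual)
Total.Revenue <- AMD.f$IS$A["Revenue", ]
Gross.Profit <- AMD.f$IS$A["Gross Profit", ]
# Finally, regress gross profit on total revenue
reg1 <- lm(Gross.Profit ~ Total.Revenue)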
The biggest issue I have is that this library gives me only 4 years of data (4 observations, and who runs a regression with only 4 observations?). Is there any other way (maybe another library) to get data for more than 4 years?
Recommended answer
I agree that this is not an R programming question, but I'm going to make a few comments anyway before this question is (likely) closed.
It boils down to this: getting reliable fundamental data across sectors and markets is difficult enough even if you have money to spend. If you are looking at the US then there are a number of options, but all the major (read 'relatively reliable') providers require thousands of dollars per month - FactSet, Bloomberg, Datastream and so on. For what it's worth, for working with fundamental data I prefer and use FactSet.
Generally speaking, because the Excel tools offered by each provider are more mature, I have found it easier to populate spreadsheets with the data and then read the data into R. Then again, I typically deal with the fundamentals of a few dozen companies at most, because once you move out of the domain of your "known" companies the time it takes to check anomalies increases exponentially.
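The spreadsheet-then-R workflow can be sketched roughly as follows. This is only an illustration: the column names (`year`, `total_revenue`, `gross_profit`) are hypothetical, and the figures are made-up placeholders standing in for a CSV exported from a provider's Excel tool, not real company data.

```r
# Made-up placeholder figures standing in for a spreadsheet export.
# In practice you would call read.csv() on your own exported file instead.
csv_text <- "year,total_revenue,gross_profit
2005,100,41
2006,112,45
2007,125,52
2008,118,46
2009,109,43
2010,131,55
2011,144,60
2012,150,61"

# Read the exported table into a data frame (column names are assumptions)
fundamentals <- read.csv(text = csv_text)

# Regress gross profit on total revenue, as in the question
reg <- lm(gross_profit ~ total_revenue, data = fundamentals)
print(summary(reg))
```

Keeping the series in a data frame with an explicit `year` column makes it easier to spot missing or duplicated years before fitting anything.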
There are numerous potential "gotchas". The most obvious is that definitions vary from sector to sector. "Sales" for an industrial company is very different from "sales" for a bank, for example. Another problem is changes in definitions. Pretty much every year some accounting regulation or other changes and breaks your data series. Last year minority interests were reported in one place, but this year the item has moved to another position in the P&L, and so on.
Another problem is companies themselves changing. How does one deal with mergers, acquisitions and spin-offs, for example? This sort of thing can make measuring organic sales growth next to impossible. Yet another point to bear in mind is that if you're dealing with operating or net profit, you have to consider exceptionals and whether to adjust for them.
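One crude way to catch some of these breaks before running a regression is to flag unusually large year-over-year changes and check them by hand. The sketch below assumes a hypothetical revenue series and an arbitrary 30% threshold; both are illustrative choices, and a flag only tells you where to look, not what happened.

```r
# Hypothetical revenue series with one suspicious jump (made-up numbers)
sales <- data.frame(
  year    = 2010:2014,
  revenue = c(100, 104, 109, 171, 178)
)

# Year-over-year growth; the first year has no prior value, so it is NA
sales$yoy_growth <- c(NA, diff(sales$revenue) / head(sales$revenue, -1))

# Arbitrary threshold: flag any change larger than 30% in either direction
threshold <- 0.30
flagged <- sales[!is.na(sales$yoy_growth) & abs(sales$yoy_growth) > threshold, ]

# Rows to check manually for mergers, spin-offs or definition changes
print(flagged)
```

A jump like this might reflect an acquisition or a reclassification rather than organic growth, which is why the flagged years still need checking against the company's filings.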
Dealing with companies outside the US adds a whole bunch of further problems. Of course, the major data providers try to standardise globally (FactSet Fundamentals, for example). This just adds another layer of abstraction, and it is typically hard to check how the data has been manipulated.
In short, getting the data is onerous and I know of no reliable free sources. Unless you're dealing with the simplest items for a very homogeneous group of companies, this is a can of worms even if you do have the data.