Problem description
I have been using the scikits.statsmodels OLS predict function to forecast fitted data but would now like to shift to using Pandas.
The documentation refers to OLS as well as to a function called y_predict but I can't find any documentation on how to use it correctly.
For example:
exogenous = {
"1998": "4760","1999": "5904","2000": "4504","2001": "9808","2002": "4241","2003": "4086","2004": "4687","2005": "7686","2006": "3740","2007": "3075","2008": "3753","2009": "4679","2010": "5468","2011": "7154","2012": "4292","2013": "4283","2014": "4595","2015": "9194","2016": "4221","2017": "4520"}
endogenous = {
"1998": "691", "1999": "1580", "2000": "80", "2001": "1450", "2002": "555", "2003": "956", "2004": "877", "2005": "614", "2006": "468", "2007": "191"}
import numpy as np
from pandas import *
ols_test = ols(y=Series(endogenous), x=Series(exogenous))
However, while I can produce a fit:
>>> ols_test.y_fitted
1998 675.268299
1999 841.176837
2000 638.141913
2001 1407.354228
2002 600.000352
2003 577.521485
2004 664.681478
2005 1099.611292
2006 527.342854
2007 430.901264
The predictions are no different:
>>> ols_test.y_predict
1998 675.268299
1999 841.176837
2000 638.141913
2001 1407.354228
2002 600.000352
2003 577.521485
2004 664.681478
2005 1099.611292
2006 527.342854
2007 430.901264
In scikits.statsmodels one would do the following:
import scikits.statsmodels.api as sm
...
ols_model = sm.OLS(endogenous, np.column_stack(exogenous))
ols_results = ols_model.fit()
ols_pred = ols_model.predict(np.column_stack(exog_prediction_values))
How do I do this in Pandas to forecast the endogenous data out to the limits of the exogenous?
UPDATE: Thanks to Chang, the new version of Pandas (0.7.3) now has this functionality as standard.
Recommended answer
Is your issue how to get the predicted y values of your regression? Or is it how to use the regression coefficients to get predicted y values for a different set of samples of the exogenous variable? In pandas, y_predict and y_fitted should give you the same values, and both should match the output of the predict method in scikits.statsmodels.
If you're looking for the regression coefficients, do ols_test.beta
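If the goal is the second case (forecasting y for years where only the exogenous value is known), one option is to apply the fitted coefficients by hand. The sketch below is only an illustration, not the pandas ols API: it assumes a single regressor plus an intercept, fits with numpy's least-squares solver because recent pandas releases no longer ship the ols function, and uses the hypothetical names slope, intercept and y_forecast. The data are the values from the question, written as numbers rather than strings.

```python
import numpy as np
import pandas as pd

# Data from the question, entered as numbers instead of strings.
x = pd.Series({
    "1998": 4760, "1999": 5904, "2000": 4504, "2001": 9808, "2002": 4241,
    "2003": 4086, "2004": 4687, "2005": 7686, "2006": 3740, "2007": 3075,
    "2008": 3753, "2009": 4679, "2010": 5468, "2011": 7154, "2012": 4292,
    "2013": 4283, "2014": 4595, "2015": 9194, "2016": 4221, "2017": 4520,
}, dtype=float)
y = pd.Series({
    "1998": 691, "1999": 1580, "2000": 80, "2001": 1450, "2002": 555,
    "2003": 956, "2004": 877, "2005": 614, "2006": 468, "2007": 191,
}, dtype=float)

# Fit only on the years for which both series have an observation.
common = y.index.intersection(x.index)
design = np.column_stack([x[common].to_numpy(), np.ones(len(common))])
beta, *_ = np.linalg.lstsq(design, y[common].to_numpy(), rcond=None)
slope, intercept = beta

# Apply the coefficients to every year of x, including 2008-2017,
# which lie outside the fitting sample.
y_forecast = slope * x + intercept
print(y_forecast)
```

For 1998 to 2007 this should reproduce the y_fitted values shown in the question; the remaining years are the out-of-sample forecast.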