Question
I'm building a WLS (statsmodels.formula.api.wls) model using the statsmodels formula API (which is built on patsy), and I'm using interactions between factors. Some of these interactions are predictive whereas others are not. Is there a way to include only a subset of the interactions in the model without resorting to building a design matrix by hand?
Alternatively, is there a way to constrain the estimated coefficients of a subset of the model variables to be equal to zero?
Recommended answer
I'm not sure I understand exactly what you need, but I suggest you start with the truly excellent patsy docs (patsy handles formulas for statsmodels). There's a nice section on categorical data: http://patsy.readthedocs.org/en/latest/index.html
My guess is that what you want is going to be hard to achieve with a single formula call. I would probably just use patsy to build a design matrix with more terms than I need and then drop the unwanted columns. For example:
In [28]: import statsmodels.api as sm
In [29]: import pandas as pd
In [30]: import numpy as np
In [31]: import patsy
In [32]: url = "http://vincentarelbundock.github.com/Rdatasets/csv/HistData/Guerry.csv"
In [33]: df = pd.read_csv(url)
In [34]: w = np.ones(df.shape[0])
In [35]: f = 'Lottery ~ Wealth : C(Region)'
In [36]: y,X = patsy.dmatrices(f, df, return_type='dataframe')
In [37]: X.head()
Out[37]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns:
Intercept 5 non-null values
Wealth:C(Region)[nan] 5 non-null values
Wealth:C(Region)[C] 5 non-null values
Wealth:C(Region)[E] 5 non-null values
Wealth:C(Region)[N] 5 non-null values
Wealth:C(Region)[S] 5 non-null values
Wealth:C(Region)[W] 5 non-null values
dtypes: float64(7)
In [38]: X = X.iloc[:, [2, 3, 4]]
In [39]: X.head()
Out[39]:
Wealth:C(Region)[C] Wealth:C(Region)[E] Wealth:C(Region)[N]
0 0 73 0
1 0 0 22
2 61 0 0
3 0 76 0
4 0 83 0
In [40]: mod = sm.WLS(y, X, 1./w).fit()
In [41]: mod.params
Out[41]:
Wealth:C(Region)[C] 1.084430
Wealth:C(Region)[E] 0.650396
Wealth:C(Region)[N] 1.021582