Question
I have implemented a new statistical model in R and it works in my sandbox, but I would like to make it more standard. A good comparison is lm(), where I can take a model object and:
- apply the summary() function
- extract the coefficients of the model
- extract residuals from the fitted (training) data
- update the model
- apply the predict() function
- apply plot() to pre-selected descriptive plots
- engage in many other kinds of joy
I've looked through the R manuals, searched online, and thumbed through several books, and, unless I'm overlooking something, I can't find a good tutorial on what should go into a new model package.
Although I'm most interested in thorough references or guides, I'll keep this post focused on a question with two components:
- What key components are usually expected in a model object?
- What typical functions are usually implemented in a modeling package?
Answers could be from the R Core (or package developers) perspective or from the perspective of users, e.g. users expect to be able to use functions like summary, predict, residuals, coefficients, and often expect to pass a formula when fitting a model.
Recommended answer
Put into the object what you think is useful and necessary. I think a more important question is how you include this information, and how one accesses it.
At a minimum, provide a print() method so the entire object doesn't get dumped to the screen when you print the object. If you provide a summary() method, the convention is to have that method return an object of class summary.foo (where foo is your class) and then provide a print.summary.foo() method; you don't want your summary() method doing any printing in and of itself.
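As a minimal sketch of that convention, assuming a toy class called foo whose "model" is just a sample mean (the constructor foo() and the stored components n and sd are hypothetical, chosen only for illustration):

```r
# Hypothetical constructor for a toy model of class "foo": it "fits" a mean.
foo <- function(x) {
  structure(
    list(coefficients = c(mean = mean(x)), n = length(x), sd = sd(x)),
    class = "foo"
  )
}

# print() method: show a compact description rather than dumping the list.
print.foo <- function(x, ...) {
  cat("<foo> model fitted to", x$n, "observations\n")
  cat("Estimated mean:", format(x$coefficients[["mean"]], digits = 4), "\n")
  invisible(x)
}

# summary() method: compute and return an object, but do not print anything.
summary.foo <- function(object, ...) {
  out <- list(
    estimate  = object$coefficients[["mean"]],
    std.error = object$sd / sqrt(object$n),
    n         = object$n
  )
  class(out) <- "summary.foo"
  out
}

# The print() method for the summary object does the displaying.
print.summary.foo <- function(x, ...) {
  cat("Summary of <foo> model (n = ", x$n, ")\n", sep = "")
  cat("  estimate: ", format(x$estimate, digits = 4), "\n")
  cat("  std.error:", format(x$std.error, digits = 4), "\n")
  invisible(x)
}

fit <- foo(c(1, 2, 3, 4, 5))
s <- summary(fit)   # nothing is printed here; the result can be reused
```

Typing s (or fit) at the console then triggers the matching print method, while code that only needs the numbers can work with s silently.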
If you have coefficients, fitted values and residuals and these are simple, then you can store them in your returned object as $coefficients, $fitted.values and $residuals respectively. Then the default methods for coef(), fitted() and resid() will work without you needing to add your own bespoke methods. If these are not simple, then provide your own methods for coef(), fitted() and residuals() for your class. By not simple, I mean, for example, that there are several types of residual and you need to process the stored residuals to get the requested type; in that case you need your own method that takes a type argument or similar to select from the available types of residual. See ?residuals.glm for an example.
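A small sketch of both cases, assuming a hypothetical class bar fitted with lm.fit(). The coefficients and fitted values rely on the default extractors, while the residuals get a bespoke method with a type argument (the "scaled" type is invented for this illustration):

```r
# Hypothetical fitting function: a straight-line fit via lm.fit().
bar <- function(x, y) {
  X <- cbind(`(Intercept)` = 1, x = x)
  fit <- lm.fit(X, y)
  structure(
    list(
      coefficients  = fit$coefficients,
      fitted.values = fit$fitted.values,
      residuals     = fit$residuals,
      df.residual   = fit$df.residual
    ),
    class = "bar"
  )
}

# Only needed because we want more than one residual type; without it the
# default method would simply return object$residuals.
residuals.bar <- function(object, type = c("response", "scaled"), ...) {
  type <- match.arg(type)
  r <- object$residuals
  if (type == "scaled") {
    sigma <- sqrt(sum(r^2) / object$df.residual)
    r <- r / sigma
  }
  r
}

x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
fit <- bar(x, y)

coef(fit)                      # default method reads $coefficients
fitted(fit)                    # default method reads $fitted.values
resid(fit, type = "scaled")    # dispatches to residuals.bar()
```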
If predictions are something that can be usefully provided, then a predict() method could be provided. Look at the predict.lm() method, for example, to see what arguments should be taken. Likewise, an update() method can be provided if it makes sense to update the model by adding/removing terms or altering model parameters.
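The following sketch assumes a hypothetical least-squares fitter baz() with a formula interface. It shows a predict() method with an optional newdata argument, loosely modelled on predict.lm(), and it relies on the default update() method, which works here only because the object stores its call in $call:

```r
# Hypothetical fitting function: least squares with a formula interface.
baz <- function(formula, data) {
  cl <- match.call()
  mf <- model.frame(formula, data)
  tt <- attr(mf, "terms")
  fit <- lm.fit(model.matrix(tt, mf), model.response(mf))
  structure(
    list(coefficients = fit$coefficients,
         fitted.values = fit$fitted.values,
         terms = tt, call = cl),
    class = "baz"
  )
}

# predict() method: without 'newdata' return the fitted values, otherwise
# rebuild the design matrix for the new data from the stored terms.
predict.baz <- function(object, newdata = NULL, ...) {
  if (is.null(newdata)) return(object$fitted.values)
  tt <- delete.response(object$terms)
  X <- model.matrix(tt, model.frame(tt, newdata))
  drop(X %*% object$coefficients)
}

d <- data.frame(x = c(1, 2, 3, 4, 5),
                z = c(0, 1, 0, 1, 1),
                y = c(2.1, 3.9, 6.2, 7.8, 10.1))
fit <- baz(y ~ x, data = d)

predict(fit, newdata = data.frame(x = 6))

# No update() method is defined: the default one edits and re-evaluates $call.
fit2 <- update(fit, . ~ . + z)
coef(fit2)
```

A class-specific update() method would only be needed if refitting requires more than re-evaluating a modified call.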
plot.lm() gives an example of a method that provides several diagnostic plots of the fitted model. You could model your method on that function to select from a set of predefined diagnostic plots.
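A rough sketch of such a method, assuming a hypothetical class qux that stores fitted values and residuals. The which argument borrows its name from plot.lm(), but the two plots offered here are only an illustrative choice:

```r
# Hypothetical constructor storing what the diagnostic plots need.
qux <- function(x, y) {
  fit <- lm.fit(cbind(1, x), y)
  structure(
    list(fitted.values = fit$fitted.values, residuals = fit$residuals),
    class = "qux"
  )
}

# plot() method: 'which' selects from a fixed set of diagnostic plots.
plot.qux <- function(x, which = 1:2, ...) {
  stopifnot(all(which %in% 1:2))
  if (1 %in% which) {
    plot(fitted(x), resid(x),
         xlab = "Fitted values", ylab = "Residuals",
         main = "Residuals vs fitted", ...)
    abline(h = 0, lty = 2)
  }
  if (2 %in% which) {
    qqnorm(resid(x), main = "Normal Q-Q", ...)
    qqline(resid(x))
  }
  invisible(x)
}

fit <- qux(c(1, 2, 3, 4, 5), c(2.1, 3.9, 6.2, 7.8, 10.1))
```

Calling plot(fit) then draws both plots in turn, and plot(fit, which = 2) draws only the Q-Q plot.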
If your model has a likelihood, then providing a logLik() method to compute or extract it from the fitted model object would be standard. deviance() is another similar function, if such a thing is pertinent. For confidence intervals on parameters, confint() is the standard method.
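For instance, here is a sketch assuming a hypothetical Gaussian model gmod() fitted by maximum likelihood. The logLik() method returns an object of class logLik carrying df and nobs attributes, which is what lets the default AIC() and BIC() methods work without further code:

```r
# Hypothetical constructor: Gaussian mean-and-variance model fitted by ML.
gmod <- function(y) {
  mu <- mean(y)
  sigma2 <- sum((y - mu)^2) / length(y)   # ML estimate of the variance
  structure(
    list(coefficients = c(mu = mu), sigma2 = sigma2, y = y),
    class = "gmod"
  )
}

# logLik() method: compute the log-likelihood from the stored components.
logLik.gmod <- function(object, ...) {
  y <- object$y
  val <- sum(dnorm(y, mean = object$coefficients[["mu"]],
                   sd = sqrt(object$sigma2), log = TRUE))
  # 'df' counts the estimated parameters (mean and variance);
  # 'nobs' is used by BIC().
  structure(val, df = 2, nobs = length(y), class = "logLik")
}

# deviance() method: for this model, the residual sum of squares.
deviance.gmod <- function(object, ...) {
  sum((object$y - object$coefficients[["mu"]])^2)
}

fit <- gmod(c(4.2, 5.1, 3.8, 6.0, 5.5))
logLik(fit)
AIC(fit)   # default method, built on logLik()
BIC(fit)   # default method, uses the 'nobs' attribute
```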
If you have a formula interface, then formula() methods can extract it. If you store it in a place that the default method searches, then your life will be made easier. A simple way to do this is to store the matched call (match.call()) in the $call component. model.frame() and model.matrix() are the standard extractor functions for the model frame (the data) and the model matrix (the expanded form, with factors converted to variables using contrasts, plus any transformations or functions of the model frame data). Look at examples from standard R modelling functions for ideas on how to store/extract this information.
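One possible arrangement, sketched with a hypothetical class fm: the object keeps the matched call, the terms and the model frame, so the default formula() method can find the formula, and two short methods provide the model frame and model matrix. The component names call, terms and model mirror what lm objects use, but the class itself is made up for illustration:

```r
# Hypothetical fitting function that keeps the call, terms and model frame.
fm <- function(formula, data) {
  cl <- match.call()
  mf <- model.frame(formula, data)
  structure(
    list(call = cl, terms = attr(mf, "terms"), model = mf),
    class = "fm"
  )
}

# Extractor methods: return the stored model frame, and rebuild the model
# matrix from the stored terms and model frame.
model.frame.fm <- function(formula, ...) formula$model
model.matrix.fm <- function(object, ...) {
  model.matrix(object$terms, object$model)
}

d <- data.frame(y = c(1.2, 2.3, 2.9, 4.1),
                x = c(1, 2, 3, 4),
                g = factor(c("a", "b", "a", "b")))
fit <- fm(y ~ log(x) + g, data = d)

formula(fit)                  # default method finds the formula in the object
names(model.frame(fit))       # the response plus log(x) and g
colnames(model.matrix(fit))   # factor g expanded using contrasts
```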
If you do use a formula interface, try to follow the standard, non-standard evaluation method used in most R model objects that have a formula interface/method. You can find details of that on the R Developer page, in particular the document by Thomas Lumley. This gives plenty of advice on making your function work like one expects an R modelling function to work.
If you follow this paradigm, with its standard (non-standard) rules, then extractors like na.action() should just work.
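As an illustration, the sketch below uses the model-frame idiom found in functions such as lm(): the matched call is trimmed to the arguments model.frame() understands, turned into a call to model.frame(), and evaluated in the caller's frame. The function name nsefit and its class are hypothetical. Storing the na.action attribute of the model frame is what lets the default na.action() method work:

```r
# Hypothetical fitting function using the usual model-frame idiom.
nsefit <- function(formula, data, subset, na.action, ...) {
  cl <- match.call()
  # Keep only the arguments model.frame() understands, turn the call into
  # a call to stats::model.frame, and evaluate it in the caller's frame.
  mf <- match.call(expand.dots = FALSE)
  m <- match(c("formula", "data", "subset", "na.action"), names(mf), 0L)
  mf <- mf[c(1L, m)]
  mf$drop.unused.levels <- TRUE
  mf[[1L]] <- quote(stats::model.frame)
  mf <- eval(mf, parent.frame())

  tt <- attr(mf, "terms")
  fit <- lm.fit(model.matrix(tt, mf), model.response(mf))
  structure(
    list(coefficients = fit$coefficients,
         residuals = fit$residuals,
         fitted.values = fit$fitted.values,
         terms = tt, call = cl,
         na.action = attr(mf, "na.action")),
    class = "nsefit"
  )
}

d <- data.frame(y = c(1.0, 2.1, NA, 4.2, 5.1, 5.8),
                x = c(1, 2, 3, 4, 5, 6))

# 'subset' is written in terms of columns of 'data', as with lm().
fit <- nsefit(y ~ x, data = d, subset = x > 1)
na.action(fit)   # default method returns the stored $na.action
```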