Question
I have an R package which currently uses the S3 class system, with two different classes and several methods for generic S3 functions like plot, logLik and update (for model formula updating). My code has become more complex with all the validity checking and if/else structures, because S3 has no inheritance-aware dispatch on two arguments, so I have started to think of converting my package to S4. But then I started to read about the advantages and disadvantages of S3 versus S4, and I'm not so sure anymore. I found an R-bloggers blog post about efficiency issues in S3 vs S4, and as that was 5 years ago, I tested the same thing now:
library(microbenchmark)
setClass("MyClass", representation(x="numeric"))
microbenchmark(structure(list(x=rep(1, 10^7)), class="MyS3Class"),
new("MyClass", x=rep(1, 10^7)) )
Unit: milliseconds
                                                   expr       min       lq   median       uq      max neval
 structure(list(x = rep(1, 10^7)), class = "MyS3Class") 148.75049 152.3811 155.2263 159.8090 323.5678   100
                       new("MyClass", x = rep(1, 10^7))  75.15198 123.4804 129.6588 131.5031 241.8913   100
So in this simple example, S4 was actually a bit faster. Then I read an SO question about using S3 vs S4, which was strongly in favor of S3. Especially @joshua-ulrich's answer made me doubt S4, as it said that
That feels like a big issue in my case, where I'm updating my object in every iteration when optimizing the log-likelihood of my model. After some googling I found a John Chambers post about this issue, which suggests that this is changing in R 3.0.0.
So although I feel it would be beneficial to use S4 classes for some clarity in my code (for example, more classes inheriting from the main model class) and for the validity checks etc., I am now wondering whether it is worth all the work in terms of performance. So, performance-wise, are there real differences between S3 and S4? Are there other performance issues I should be considering? Or is it even possible to say anything about this in general?
As @DWin and @g-grothendieck suggested, the above benchmark doesn't consider the case where a slot of an existing object is altered. So here's another benchmark that is more relevant to the real application (the functions in the example could be get/set functions for some elements of the model, which are altered when maximizing the log-likelihood):
objS3<-structure(list(x=rep(1, 10^3), z=matrix(0,10,10), y=matrix(0,10,10)),
class="MyS3Class")
fnS3<-function(obj,a){
obj$y<-a
obj
}
setClass("MyClass", representation(x="numeric",z="matrix",y="matrix"))
objS4<-new("MyClass", x=rep(1, 10^3),z=matrix(0,10,10),y=matrix(0,10,10))
fnS4<-function(obj,a){
obj@y<-a
obj
}
a<-matrix(1:100,10,10)
microbenchmark(fnS3(objS3,a),fnS4(objS4,a))
Unit: microseconds
           expr    min     lq median     uq    max neval
 fnS3(objS3, a)  6.531  7.464  7.932  9.331 26.591   100
 fnS4(objS4, a) 21.459 22.393 23.325 23.792 73.708   100
The benchmarks were performed with R 2.15.2 on 64-bit Windows 7. So here S4 is clearly slower.
Answer
- First of all, you can easily have S3 methods for S4 classes:
> extract <- function (x, ...) x@x
> setGeneric ("extr4", def=function (x, ...){})
[1] "extr4"
> setMethod ("extr4", signature= "MyClass", definition=extract)
[1] "extr4"
> `[.MyClass` <- extract
> `[.MyS3Class` <- function (x, ...) x$x
> microbenchmark (objS3[], objS4 [], extr4 (objS4), extract (objS4))
Unit: nanoseconds
           expr   min      lq  median      uq   max neval
        objS3[]  6775  7264.5  7578.5  8312.0 39531   100
        objS4[]  5797  6705.5  7124.0  7404.0 13550   100
   extr4(objS4) 20534 21512.0 22106.0 22664.5 54268   100
 extract(objS4)   908  1188.0  1328.0  1467.0 11804   100
Edit: due to Hadley's comment, I changed the experiment to plot:
> `plot.MyClass` <- extract
> `plot.MyS3Class` <- function (x, ...) x$x
> microbenchmark (plot (objS3), plot (objS4), extr4 (objS4), extract (objS4))
Unit: nanoseconds
           expr   min      lq median      uq     max neval
    plot(objS3) 28915 30172.0  30591 30975.5 1887824   100
    plot(objS4) 25353 26121.0  26471 26960.0  411508   100
   extr4(objS4) 20395 21372.5  22001 22385.5   31359   100
 extract(objS4)   979  1328.0   1398  1677.0    3982   100
For an S4 method for plot I get:
        expr   min      lq  median      uq   max neval
 plot(objS4) 19835 20428.5 21336.5 22175.0 58876   100
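The answer does not show how the S4 plot method was defined. A minimal sketch of what such a definition presumably looks like, assuming the same one-slot MyClass as in the question and an accessor-style method body (this is not a real plotting method, it only returns the slot, just like extract):

```r
library(methods)

# Same toy class as in the question's first benchmark
setClass("MyClass", representation(x = "numeric"))

# Defining an S4 method on plot makes R create an S4 generic from the
# existing plot function (the original plot becomes the default method)
setMethod("plot", "MyClass", function(x, y, ...) x@x)

objS4 <- new("MyClass", x = c(1, 2, 3))
res <- plot(objS4)   # dispatches to the S4 method; nothing is drawn
```

The method's formals (x, y, ...) have to match those of the plot generic.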
So yes, the extraction operator [ has an exceptionally fast dispatch mechanism (which is good, because I think extraction and the corresponding replacement functions are among the most frequently called methods). But no, S4 dispatch isn't slower than S3 dispatch.
Here the S3 method on the S4 object is as fast as the S3 method on the S3 object. However, calling without dispatch is still faster.
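If dispatch overhead matters inside a tight loop (such as an optimizer), one option is to look up the method once and call it directly, which keeps the S4 method but avoids repeated dispatch. A minimal sketch, assuming a toy generic named extr4 like the one in the benchmark, defined here with standardGeneric; the name extr4_fast is hypothetical:

```r
library(methods)

setClass("MyClass", representation(x = "numeric"))
setGeneric("extr4", function(x, ...) standardGeneric("extr4"))
setMethod("extr4", "MyClass", function(x, ...) x@x)

obj <- new("MyClass", x = c(1, 2, 3))

# selectMethod() performs the dispatch lookup once and returns the method,
# a MethodDefinition object that can be called like an ordinary function
extr4_fast <- selectMethod("extr4", "MyClass")

total <- 0
for (i in 1:1000) {
  total <- total + sum(extr4_fast(obj))  # no dispatch inside the loop
}
```

The trade-off is that the cached method is fixed: if obj could also be of a subclass with its own extr4 method, the loop would silently keep using the MyClass method, so this only suits code paths where the class is known.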
- There are some things that work much better as S3 methods, such as as.matrix or as.data.frame. For some reason, defining these as S3 methods means that, e.g., lm (formula, objS4) will work out of the box. This doesn't work with as.data.frame defined as an S4 method.
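A minimal sketch of the as.data.frame case, assuming a hypothetical two-slot S4 class Obs. As far as I can tell, lm() builds its model frame with model.frame(), whose default method converts data that is neither a data frame nor an environment by calling the S3 generic as.data.frame(), so an S3 method written for the S4 class is found:

```r
library(methods)

# Hypothetical S4 class holding two numeric variables
setClass("Obs", representation(x = "numeric", y = "numeric"))

# S3 method for the S4 class; the argument list follows base::as.data.frame
as.data.frame.Obs <- function(x, row.names = NULL, optional = FALSE, ...) {
  data.frame(x = x@x, y = x@y)
}

obs <- new("Obs", x = c(1, 2, 3, 4), y = c(3, 5, 7, 9))

# lm() converts obs via as.data.frame() before fitting
fit <- lm(y ~ x, data = obs)
coef(fit)
```

The data lie exactly on y = 1 + 2x, so the coefficients are 1 and 2 (up to floating-point error). Inside a package, the method would additionally need to be registered, e.g. with S3method(as.data.frame, Obs) in the NAMESPACE file.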
Also, it is much more convenient to call debug on an S3 method.
Some other things will not work with S3, e.g. dispatching on the second argument.
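For comparison, S4 dispatches on both arguments directly, with inheritance taken into account. A minimal sketch with hypothetical classes Base, A and B and a hypothetical generic pairKind:

```r
library(methods)

# Hypothetical virtual parent class and two concrete subclasses
setClass("Base", representation("VIRTUAL"))
setClass("A", contains = "Base", representation(v = "numeric"))
setClass("B", contains = "Base", representation(v = "numeric"))

# The generic dispatches on both e1 and e2
setGeneric("pairKind", function(e1, e2) standardGeneric("pairKind"))

setMethod("pairKind", signature("A", "A"), function(e1, e2) "A, A")
setMethod("pairKind", signature("A", "B"), function(e1, e2) "A, B")
# Inherited fallback for any other combination of Base subclasses
setMethod("pairKind", signature("Base", "Base"), function(e1, e2) "other")

a <- new("A", v = 1)
b <- new("B", v = 2)
```

Here pairKind(a, b) selects the (A, B) method, while pairKind(b, a) falls back to the (Base, Base) method. In S3 the same logic would need an if/else on the class of the second argument inside each method.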
Whether there will be any noticeable drop in performance obviously depends on your class, that is, what kind of structures you have, how large the objects are and how often methods are called. A few µs of method dispatch won't matter for a calculation that takes ms or even s. But µs do matter when a function is called billions of times.
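To put rough numbers on this, take the median times from the question's slot-update benchmark (7.932 µs for fnS3 and 23.325 µs for fnS4) as an illustrative per-call overhead; actual overheads depend on the class and the machine:

```latex
\begin{aligned}
\Delta t &= 23.325\,\mu\text{s} - 7.932\,\mu\text{s} = 15.393\,\mu\text{s} \\
10^{6} \times \Delta t &\approx 15.4\,\text{s} \\
10^{9} \times \Delta t &\approx 1.54 \times 10^{4}\,\text{s} \approx 4.3\,\text{h}
\end{aligned}
```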
One thing that caused a noticeable performance drop for some functions that are called often ([) is S4 validation (a fair number of checks are done in validObject). However, I'm glad to have it, so I use it. Internally, I use workhorse functions that skip this step.
In case you have large data and call-by-reference would help your performance, you may want to have a look at reference classes. I've never really worked with them so far, so I cannot comment on this.
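Purely as an illustrative sketch (not something tested in this answer): with a reference class, a field can be updated in place, so every name bound to the object sees the change and no modified copy has to be returned from each update. The class ModelRC and its method setY are hypothetical, and whether this is actually faster for a given model would still need benchmarking, since reference class method calls carry their own overhead:

```r
library(methods)

# Hypothetical reference class holding a parameter matrix
ModelRC <- setRefClass("ModelRC",
  fields = list(y = "matrix"),
  methods = list(
    setY = function(a) {
      y <<- a          # modifies the field of this object in place
      invisible(.self)
    }
  )
)

m <- ModelRC$new(y = matrix(0, 10, 10))
m2 <- m                        # m2 refers to the same object, not a copy
m$setY(matrix(1:100, 10, 10))  # the change is visible through m2 as well
```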