Problem description
I am using H2O on a large dataset: 8 million rows and 10 columns. I trained a random forest with h2o.randomForest. The model trained fine and prediction also worked correctly. Now I would like to convert my predictions to a data.frame. I did this:
A2=h2o.predict(m1,Tr15_h2o)
pred2=as.data.frame(A2)
But it is too slow and takes forever. Is there a faster way to convert an H2O frame to a data.frame or data.table?
Recommended answer
Here is some code that demonstrates how to use the data.table package on the backend of as.data.frame, along with some benchmarks from my MacBook. On a 10-million-row frame, the elapsed time dropped from about 272 seconds without data.table to about 83 seconds with it:
library(h2o)
h2o.init(nthreads = -1, max_mem_size = "16G")
hf <- h2o.createFrame(rows = 10000000)
options("h2o.use.data.table"=FALSE) #no data.table
system.time(df <- as.data.frame(hf))
# user system elapsed
# 224.387 13.274 272.252
options("datatable.verbose"=TRUE)
options("h2o.use.data.table"=TRUE) # use data.table
system.time(df2 <- as.data.frame(hf))
# user system elapsed
# 50.686 4.020 82.946
You can get more detailed information when using data.table if you turn on this option: options("datatable.verbose"=TRUE).
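Applied to the prediction frame from the question, the same idea could look roughly like the sketch below. It is only an illustration under stated assumptions: a small synthetic frame from h2o.createFrame stands in for the real prediction frame (A2 in the question), the names pred_df, pred_dt and csv_path are made up for this example, and the second option (export to CSV, then read with data.table::fread) is an alternative route that the answer above does not benchmark. It also assumes H2O runs on the same machine as R, since h2o.exportFile writes the file on the H2O server side.

```r
library(h2o)
library(data.table)

h2o.init(nthreads = -1)

# Small synthetic frame standing in for the prediction frame (A2 in the question).
hf <- h2o.createFrame(rows = 1000, cols = 5, seed = 42)

# Option 1: let as.data.frame() use data.table on the backend,
# then turn the result into a data.table in place (no extra copy).
options("h2o.use.data.table" = TRUE)
pred_df <- as.data.frame(hf)
setDT(pred_df)

# Option 2 (assumption: local H2O cluster, so the path is visible to R):
# export the frame to a CSV file and read it back with fread.
csv_path <- tempfile(fileext = ".csv")
h2o.exportFile(hf, path = csv_path, force = TRUE)
pred_dt <- fread(csv_path)
```

With the real objects, hf would simply be replaced by the result of h2o.predict(m1, Tr15_h2o). Which option is faster likely depends on disk speed and the frame's size, so it is worth timing both with system.time on your own data.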