Problem description

I love the data.table package in R, and I think it could help me perform sophisticated cross-tabulation tasks, but I haven't figured out how to use the package to do tasks similar to table.

Here's some replication survey data:

    opinion <- c("gov", "market", "gov", "gov")
    ID <- c("resp1", "resp2", "resp3", "resp4")
    party <- c("GOP", "GOP", "democrat", "GOP")
    df <- data.frame(ID, opinion, party)

With table, counting the number of opinions by party is as simple as table(df$opinion, df$party).

I've managed to do something similar in data.table, but the result is clunky and it adds a separate column.

    library(data.table)
    dt <- data.table(df)
    dt[, .N, by="party"]

There are a number of grouping operations in data.table that could be great for fast and sophisticated crosstabs of survey data, but I haven't found any tutorials on how to do it. Thanks for any help.
Solution

We can use dcast from data.table (see the "Efficient reshaping using data.tables" vignette on the project wiki or on the CRAN project page).

    dcast(dt, opinion ~ party, value.var='ID', length)

Benchmarks

If we use a slightly bigger dataset, we can compare the speed of dcast from reshape2 with dcast from data.table:

    set.seed(24)
    df <- data.frame(ID=1:1e6, opinion=sample(letters, 1e6, replace=TRUE),
                     party=sample(1:9, 1e6, replace=TRUE))

    # dcast on a plain data.frame (reshape2)
    system.time(dcast(df, opinion ~ party, value.var='ID', length))
    #   user  system elapsed
    #  0.278   0.013   0.293

    # dcast on a data.table
    system.time(dcast(setDT(df), opinion ~ party, value.var='ID', length))
    #   user  system elapsed
    #  0.022   0.000   0.023

    # grouped count with .N (long format)
    system.time(setDT(df)[, .N, by = .(opinion, party)])
    #   user  system elapsed
    #  0.018   0.001   0.018

The third option is slightly faster, but its result is in 'long' format. If the OP wants a 'wide' format, the data.table dcast can be used.

NOTE: I am using the devel version, i.e. v1.9.7, but the CRAN version should be fast enough.
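To make the difference between the 'long' and 'wide' results concrete, here is a minimal sketch on the small survey data from the question. It assumes data.table is installed; the names `long`, `wide` and `wide_from_long` are illustrative only and do not come from the original answer.

```r
library(data.table)

# Sample survey data from the question
dt <- data.table(
  ID      = c("resp1", "resp2", "resp3", "resp4"),
  opinion = c("gov", "market", "gov", "gov"),
  party   = c("GOP", "GOP", "democrat", "GOP")
)

# Long format: one row per (opinion, party) combination that actually occurs
long <- dt[, .N, by = .(opinion, party)]

# Wide format: opinions in rows, parties in columns, counts in the cells.
# With length as the aggregate, combinations that never occur get 0.
wide <- dcast(dt, opinion ~ party, value.var = "ID", fun.aggregate = length)

# The same crosstab built from the long counts; fill = 0L covers the
# combination that never occurs (market / democrat)
wide_from_long <- dcast(long, opinion ~ party, value.var = "N", fill = 0L)

print(long)
print(wide)
```

The long result only lists combinations that appear in the data, whereas the wide result has one cell per opinion/party pair, which is closer to what table(df$opinion, df$party) returns.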
08-23 03:17