Using data.table / plyr in R

Problem description

My data A looks like this:

    author_id paper_id prob
    731       24943    1
    731       24943    1
    731       688974   1
    731       964345   .8
    731       1201905  .9
    731       1267992  1
    736       249      .2
    736       6889     1
    736       94345    .7
    736       1201905  .9
    736       126992   .8

The output I want is:

    author_id paper_id
    731       24943,24943,688974,1201905,964345
    736       6889,1201945,126992,94345,249

That is, the paper_id values for each author are arranged in decreasing order of probability.

If I used a combination of SQL and R, I think the solution would be

    statement <- "select * from A GROUP BY author_id ORDER BY prob"

and then paste in R, once the order is set for paper_id. But I need the whole solution in R. How could this be done? Thanks.

Solution

If temp is your data set, then do

    library(data.table)
    setDT(temp)[order(-prob), list(paper_id = paste0(paper_id, collapse = ", ")), by = author_id]
    ##    author_id                                       paper_id
    ## 1:       731 24943, 24943, 688974, 1267992, 1201905, 964345
    ## 2:       736              6889, 1201905, 126992, 94345, 249

Edit: 8/11/2014

Since data.table v >= 1.9.4, you can use the very efficient setorder instead of order. Note that setorder sorts temp by reference, so the data set itself is reordered.

    str(temp)
    setorder(setDT(temp), -prob)[, list(paper_id = paste0(paper_id, collapse = ", ")), by = author_id]
    ##    author_id                                       paper_id
    ## 1:       731 24943, 24943, 688974, 1267992, 1201905, 964345
    ## 2:       736              6889, 1201905, 126992, 94345, 249

As a side note, this whole thing can easily be done with base R too (though this is not recommended for big data sets):

    aggregate(paper_id ~ author_id, temp[order(-temp$prob), ], paste, collapse = ", ")
    #   author_id                                       paper_id
    # 1       731 24943, 24943, 688974, 1267992, 1201905, 964345
    # 2       736              6889, 1201905, 126992, 94345, 249
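
To try the base R approach end to end without installing any package, the following is a minimal reproducible sketch. It rebuilds the sample data from the question as a data frame named temp (the name used in the answer). The intermediate names sorted and res_base are only illustrative and do not come from the original post.

```r
# Sample data copied from the question; `temp` is the name used in the answer.
temp <- data.frame(
  author_id = rep(c(731L, 736L), times = c(6L, 5L)),
  paper_id  = c(24943L, 24943L, 688974L, 964345L, 1201905L, 1267992L,
                249L, 6889L, 94345L, 1201905L, 126992L),
  prob      = c(1, 1, 1, 0.8, 0.9, 1,
                0.2, 1, 0.7, 0.9, 0.8)
)

# Step 1: sort all rows by decreasing prob (rows with equal prob keep
# their original relative order).
sorted <- temp[order(-temp$prob), ]

# Step 2: within each author_id, paste the paper_id values together in
# their sorted order. `sorted` and `res_base` are illustrative names.
res_base <- aggregate(paper_id ~ author_id, sorted, paste, collapse = ", ")

print(res_base)
```

The result should have one row per author_id, with the paper_id strings matching the listings shown in the answer. Sorting first and grouping second works here because aggregate keeps the row order it is given within each group.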
08-14 19:53