This article describes how to sum the columns of a data.frame based on name type.

Problem description

Let's say I have the following data.frame, which relates the name of an R package to the CRAN Task View it belongs to:

    dictionary <- data.frame(task.view = c(rep("High.Performance.Computing", 3), rep("Machine.Learning", 3)),
                             package = c("Rcpp", "HadoopStreaming", "rJava", "e1071", "nnet", "RWeka"))
    #                   task.view         package
    #  High.Performance.Computing            Rcpp
    #  High.Performance.Computing HadoopStreaming
    #  High.Performance.Computing           rJava
    #            Machine.Learning           e1071
    #            Machine.Learning            nnet
    #            Machine.Learning           RWeka

I then count the number of times each package is called from each of four tools written by a student:

    package.referals <- data.frame(Rcpp = c(1, 0, 1, 1), HadoopStreaming = c(1, 0, 0, 0), rJava = c(1, 0, 0, 1),
                                   e1071 = c(1, 1, 1, 1), nnet = c(1, 0, 0, 0), RWeka = c(1, 0, 0, 1),
                                   row.names = paste("student pkg", 1:4))
    #               Rcpp HadoopStreaming rJava e1071 nnet RWeka
    # student pkg 1    1               1     1     1    1     1
    # student pkg 2    0               0     0     1    0     0
    # student pkg 3    1               0     0     1    0     0
    # student pkg 4    1               0     1     1    0     1

How can I restructure the columns of my package.referals data.frame above based on my data.frame of package/task view relations? E.g. I would like the output to be:

    data.frame(High.Performance.Computing = c(3, 0, 1, 2), Machine.Learning = c(3, 1, 1, 2),
               row.names = paste("student pkg", 1:4))
    #               High.Performance.Computing Machine.Learning
    # student pkg 1                          3                3
    # student pkg 2                          0                1
    # student pkg 3                          1                1
    # student pkg 4                          2                2

I tried the following, but I got stuck when trying to restructure the result into the output I would like (summing and transposing):

    require(data.table)
    # column names of the package.referals data.frame
    package.referals.colnames <- names(package.referals)
    # a data.table of my task view and package relations, keyed by package name
    dictionary.dt <- data.table(dictionary, key = "package")
    # a data.table of my package.referals data.frame, transposed, and keyed by package name
    package.referals.dt <- data.table(package = package.referals.colnames, t(package.referals), key = "package")
    # join the data.tables so that the package name and corresponding task view are on the same line
    dt <- package.referals.dt[J(dictionary.dt)]
    setkey(dt, "task.view")
    #            package student pkg 1 student pkg 2 student pkg 3 student pkg 4                  task.view
    # 1: HadoopStreaming             1             0             0             0 High.Performance.Computing
    # 2:            Rcpp             1             0             1             1 High.Performance.Computing
    # 3:           rJava             1             0             0             1 High.Performance.Computing
    # 4:           e1071             1             1             1             1           Machine.Learning
    # 5:            nnet             1             0             0             0           Machine.Learning
    # 6:           RWeka             1             0             0             1           Machine.Learning

Solution

Here is a solution with reshape (melt from the reshape2 package) and base R:

    library(reshape2)  # melt() with the variable.name argument comes from reshape2
    # keep the row names as an explicit id column so they survive the melt
    package.referals$id <- rownames(package.referals)
    # long format: one row per (id, package) pair
    pkgr <- melt(package.referals, id.vars = "id", variable.name = "package")
    # keep only the packages that were actually called
    pkgr <- pkgr[pkgr$value > 0, ]
    # attach the task view of each package
    df <- merge(pkgr, dictionary, all.x = TRUE)
    # count the remaining rows per student tool and task view
    table(df$id, df$task.view)

If you really want to use data.table instead of merge, you can replace the last two lines (the merge and the table call) with:

    pkgr <- data.table(pkgr, key = "package")
    dictionary <- data.table(dictionary, key = "package")
    df <- pkgr[dictionary]
    table(df$id, df$task.view)
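For comparison, the same per-task-view totals can also be sketched in base R alone. This is not the answer's method: it swaps melt/merge/table for `match()` and `rowsum()`, and the names `views` and `by.view` are introduced here purely for illustration. It assumes every column of package.referals has an entry in dictionary.

```r
# Example data, copied from the question.
dictionary <- data.frame(
  task.view = c(rep("High.Performance.Computing", 3), rep("Machine.Learning", 3)),
  package   = c("Rcpp", "HadoopStreaming", "rJava", "e1071", "nnet", "RWeka")
)
package.referals <- data.frame(
  Rcpp = c(1, 0, 1, 1), HadoopStreaming = c(1, 0, 0, 0), rJava = c(1, 0, 0, 1),
  e1071 = c(1, 1, 1, 1), nnet = c(1, 0, 0, 0), RWeka = c(1, 0, 0, 1),
  row.names = paste("student pkg", 1:4)
)

# Look up the task view of each column of package.referals, in column order.
# (`views` is a hypothetical helper name, not part of the original post.)
views <- as.character(
  dictionary$task.view[match(names(package.referals), dictionary$package)]
)

# rowsum() adds up the rows of a matrix that share a group label, so transpose
# first (packages become rows), sum by task view, then transpose back.
by.view <- as.data.frame(t(rowsum(t(as.matrix(package.referals)), group = views)))
print(by.view)
```

One difference worth keeping in mind: the `table()` call in the answer counts the rows left after filtering on `value > 0`, which matches the sum only while the entries are 0/1 indicators. `rowsum()` adds the actual values, so it would also handle call counts greater than one.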