Subsetting cluster-specific values in a list of data.frames in R

Problem description

Inspired by this answer, my goal is to find variable values in m, a list of data.frames (clusters), that are specific to only one element of m (e.g., m[[15]]) and do not occur in any of the others.

For example, I know that genre == 4 is specific to m[[15]] ("Fazio", i.e., names(m)[15]) and does not occur in any other cluster in m (confirm with subset(d, genre == 4)).

Thus, I expect my output to give me the name "Fazio" and genre == 4.

I want to repeat this process for all variables listed in mods, not just genre.
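To make the goal concrete, here is a minimal sketch for a single variable. The data frame toy and its values are made up for illustration and are not taken from the real data set; the idea is only to show what "specific to one cluster" means.

```r
# Toy data (hypothetical): three studies and one moderator, `genre`
toy <- data.frame(
  study.name = c("A", "A", "B", "B", "C"),
  genre      = c(1, 2, 1, 3, 1)
)

# Unique study/value pairs, so repeated rows within a study count once
pairs <- unique(toy[c("study.name", "genre")])

# Number of distinct studies in which each genre value occurs
n_studies <- table(pairs$genre)

# Keep the values found in exactly one study, with the study that owns them
specific <- pairs[pairs$genre %in% names(n_studies)[n_studies == 1], ]
specific
```

In this toy case genre == 2 belongs only to study A and genre == 3 only to study B, while genre == 1 is shared by all three studies and is dropped.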

I tried the following, without success:

d <- read.csv("https://raw.githubusercontent.com/rnorouzian/m/master/v.csv", header = TRUE) # DATA

mods <- c("genre","cont.type","time","cf.timely","ssci","setting","ed.level",  # mods
          "Age","profic","motivation","Ss.aware","random.grp","equiv.grp",
          "rel.inter","rel.intra","sourced","timed","Location",
          "cf.scope","cf.type","error.key","cf.provider","cf.revision","cf.oral",
          "Length","instruction","graded","acc.measure","cf.training","error.type")

m <- split(d, d$study.name) # `m` clusters of data.frames

# SOLUTION TRIED:

tmp = do.call(rbind, lapply(mods, function(x){
  d = unique(d[c("study.name", x)])
  names(d) = c("study.name", "val")
  transform(d, nm = x)
}))

# this logic may need to change:
tmp = tmp[ave(as.numeric(as.factor(tmp$val)), tmp$val, FUN = length) == 1,] 

lapply(split(tmp, tmp$study.name), function(a){
 setNames(a$val, a$nm)
})                               # doesn't return anything

Recommended answer

We can do this by also grouping by nm in ave:

tmp1 <- tmp[with(tmp, ave(val, val, nm, FUN = length)==1),]
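The following sketch, on a made-up long table with the same columns as tmp, illustrates why the extra grouping variable matters. The object toy_tmp and its values are assumptions for illustration only.

```r
# Toy long table (hypothetical), same columns as `tmp`
toy_tmp <- data.frame(
  study.name = c("A", "B", "C", "A", "B", "C"),
  val        = c(1, 1, 4, 4, 4, 2),
  nm         = c("genre", "genre", "genre", "time", "time", "time")
)

# Grouping by value only: the value 4 is counted across both moderators
by_val <- with(toy_tmp, ave(val, val, FUN = length))

# Grouping by value and moderator: counts are kept separate per moderator
by_val_nm <- with(toy_tmp, ave(val, val, nm, FUN = length))

toy_tmp[by_val == 1, ]     # only time == 2 (study C) survives
toy_tmp[by_val_nm == 1, ]  # genre == 4 and time == 2 (both study C) survive
```

When counting by value alone, genre == 4 is lost because 4 also appears as a value of time. Counting within each moderator keeps it, which is the behavior the question asks for.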

Now do the split:

tmp2 <- lapply(split(tmp1, tmp1$study.name, drop = TRUE), `row.names<-`, NULL)

# Study-specific moderator names and codes to exclude from the result
rm.df <- data.frame(
  study.name = c(rep("Bitc_Knch_c", 3), rep("Sun", 3)),
  code       = c(88, 88, 88, 7, 4, 0),
  mod.name   = c("error.type", "cf.scope", "cf.type",
                 "error.type", "cf.type", "error.key"))
rm.these <- split(rm.df, rm.df$study.name)

# For each listed study, drop rows whose moderator and value are both listed
tmp2[names(rm.these)] <- Map(function(x, y) {
  subset(x, !(nm %in% y$mod.name & val %in% y$code))
}, tmp2[names(rm.these)], rm.these)
Filter(nrow, tmp2)
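
The exclusion and filtering steps can be tried on a small example. This is only a sketch: toy_list, toy_rm, and their values are hypothetical stand-ins for tmp2 and rm.these, and base indexing is used in place of subset() so the snippet is easy to check.

```r
# Toy per-study list (hypothetical), shaped like `tmp2`
toy_list <- list(
  A = data.frame(study.name = "A", val = c(88, 3), nm = c("cf.type", "genre")),
  B = data.frame(study.name = "B", val = 7, nm = "error.type"),
  C = data.frame(study.name = "C", val = 4, nm = "genre")
)

# Hypothetical exclusions: moderator names and codes to drop for A and B
toy_rm <- data.frame(
  study.name = c("A", "B"),
  code       = c(88, 7),
  mod.name   = c("cf.type", "error.type")
)
toy_rm <- split(toy_rm, toy_rm$study.name)

# Drop matching rows study by study; Map pairs the two lists element-wise
toy_list[names(toy_rm)] <- Map(function(x, y) {
  x[!(x$nm %in% y$mod.name & x$val %in% y$code), , drop = FALSE]
}, toy_list[names(toy_rm)], toy_rm)

# Filter(nrow, ...) keeps only data.frames that still have at least one row
kept <- Filter(nrow, toy_list)
names(kept)
```

Here study B loses its only row and is removed by Filter, while A keeps its genre row and C is untouched. Note that the condition tests the moderator name and the code separately with %in%, so a row is dropped whenever its moderator is among the listed names and its value is among the listed codes for that study, not only when the two form an exact listed pair.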
