Problem description
Inspired by this answer, my goal is to find the variable values in the m clusters of data.frames that are specific to only one element of m (e.g., m[[15]]) and do not occur in any other element.
For example, I know that genre == 4 is specific to m[[15]] ("Fazio", i.e., names(m)[15]): genre == 4 doesn't occur in any other cluster of m (confirm with subset(d, genre == 4)).
Thus, I expect my output to give me the name "Fazio" and genre == 4.
How can I repeat this process for all variables listed in mods, not just genre?
I tried the following, without success:
d <- read.csv("https://raw.githubusercontent.com/rnorouzian/m/master/v.csv", header = TRUE) # DATA
mods <- c("genre","cont.type","time","cf.timely","ssci","setting","ed.level", # mods
          "Age","profic","motivation","Ss.aware","random.grp","equiv.grp",
          "rel.inter","rel.intra","sourced","timed","Location",
          "cf.scope","cf.type","error.key","cf.provider","cf.revision","cf.oral",
          "Length","instruction","graded","acc.measure","cf.training","error.type")
m <- split(d, d$study.name) # `m` clusters of data.frames
# SOLUTION TRIED:
tmp = do.call(rbind, lapply(mods, function(x){
  d = unique(d[c("study.name", x)])
  names(d) = c("study.name", "val")
  transform(d, nm = x)
}))
# this logic may need to change:
tmp = tmp[ave(as.numeric(as.factor(tmp$val)), tmp$val, FUN = length) == 1,]
lapply(split(tmp, tmp$study.name), function(a){
  setNames(a$val, a$nm)
}) # doesn't return anything
Recommended answer
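We can also do this by grouping on both val and nm in ave, so that a value is counted separately for each variable: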
tmp1 <- tmp[with(tmp, ave(val, val, nm, FUN = length)==1),]
Now do the split by study.name:
tmp2 <- lapply(split(tmp1, tmp1$study.name, drop = TRUE), `row.names<-`, NULL)
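To see what the two steps above do, here is a minimal sketch on made-up data. The names toy, toy_mods, long, only_one and res are hypothetical and stand in for d, mods, tmp, tmp1 and tmp2; the values are invented for illustration only.

```r
# Toy data (hypothetical): three studies, two moderators
toy <- data.frame(
  study.name = c("A", "A", "B", "B", "C"),
  genre      = c(1, 1, 1, 2, 4),
  time       = c(0, 1, 1, 1, 1)
)
toy_mods <- c("genre", "time")

# Long table of the unique (study, value) combinations for each variable
long <- do.call(rbind, lapply(toy_mods, function(x) {
  u <- unique(toy[c("study.name", x)])
  names(u) <- c("study.name", "val")
  transform(u, nm = x)
}))

# Keep rows whose (val, nm) combination occurs in exactly one study
only_one <- long[with(long, ave(val, val, nm, FUN = length) == 1), ]

# One data.frame per study; studies with nothing unique are dropped
res <- lapply(split(only_one, only_one$study.name, drop = TRUE),
              `row.names<-`, NULL)
res
```

In this toy case genre == 4 occurs only in study "C", so res$C holds that single row, much like the "Fazio" and genre == 4 result the question asks for.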
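# (study, code, variable) combinations to remove from the result
rm.df <- data.frame(study.name = c(rep("Bitc_Knch_c", 3),
                                   rep("Sun", 3)), code = c(88,88,88,7,4,0),
                    mod.name = c("error.type","cf.scope","cf.type","error.type",
                                 "cf.type","error.key"))
rm.these <- split(rm.df, rm.df$study.name)
# For each listed study, drop the rows whose nm and val both appear in its removal list
# (nm and val are tested separately, not as exact pairs)
tmp2[names(rm.these)] <- Map(function(x, y) {
  subset(x, !(nm %in% y$mod.name & val %in% y$code))},
  tmp2[names(rm.these)], rm.these)
# Drop studies that are left with no rows
Filter(nrow, tmp2)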
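The subset() condition above checks nm and val independently, so a row can be dropped even when its exact (nm, val) combination is not in the removal list. If exact pairs are what you want, a possible variant is to match on pasted keys. The sketch below uses invented data; x, y, loose and strict are hypothetical names, with x shaped like one element of tmp2 and y like one element of rm.these.

```r
# Hypothetical per-study result, shaped like one element of tmp2
x <- data.frame(study.name = "Sun",
                val = c(7, 7, 4),
                nm  = c("error.type", "cf.type", "cf.type"))

# Hypothetical removal list for the same study
y <- data.frame(mod.name = c("error.type", "cf.type"),
                code     = c(7, 4))

# Separate tests: drops every row whose nm and val each appear somewhere in y
loose <- subset(x, !(nm %in% y$mod.name & val %in% y$code))

# Pairwise test: drops only the exact (nm, val) combinations listed in y
strict <- x[!(paste(x$nm, x$val) %in% paste(y$mod.name, y$code)), ]

nrow(loose)
strict
```

Here the row with cf.type and 7 is not listed in y as a pair, so the pairwise version keeps it while the separate-test version removes it. Which behavior is right depends on what the removal list is meant to express.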