Here is a simple example.
I have a data frame, data1:

name<-c("John","John","Mike","Amy".....)
nationality<-c("Canada","America","Spain","Japan".....)
data1<-data.frame(name,nationality....)

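That is, the people come from different countries.
Each person has a name and a country, and no name/country combination is repeated.
The second data frame is: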
name2<-c("John","John","Mike","John",......)
nationality2<-c("Canada","Canada","Canada".....)
score<-c(87,67,98,78,56......)
data2<-data.frame(name2,nationality2,score)

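Every person is guaranteed to have 5 rows in data2, meaning 5 scores each, but the rows are in random order.
What I want is each person's 5 scores; I don't care about their name or where they are from.
The final data frame I want is: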
   score1   score2  score3  score4   score5
1    89        89       87     78        90
2    ...
3    ...

Each row holds one person's 5 scores, but I don't care who that person is.
My data is too large to use a for loop.
What can I do?

Best answer

It seems to me this is what you're asking for:

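# Reference table: one row per person (name plus nationality)
data1 <- data.frame(name = c("John","Mike","Amy"),
                    nationality = c("America","Canada","Canada"))

# Scores: 5 rows per person; this sample also contains people who are not in data1
data2 <- data.frame(name2 = rep(c("John","Mike","Amy","Jack","John"), each = 5),
                    score = sample(100, 25),
                    nationality2 = rep(c("America","Canada","Canada","Canada","Canada"), each = 5))

# Keep only the scores of people listed in data1
data3 <- merge(data2, data1, by.x = c("name2","nationality2"), by.y = c("name","nationality"))
# Build one key per person, collect the scores per key, and stack them as rows
data3$name_country <- paste(data3$name2, data3$nationality2)
all_scores_list <- tapply(data3$score, data3$name_country, c)
as.data.frame(do.call(rbind, all_scores_list))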

#              V1 V2 V3 V4 V5
# Amy Canada   57 69 90 81 50
# John America  4 92 75 15  2
# Mike Canada  25 86 51 20 12
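
Since the question mentions a large data set, here is a rough sketch of a variant that skips the per-group list and folds the sorted score vector straight into a matrix. It only works under the assumption stated in the question, that every person has exactly 5 rows. The fixed scores 1:25 and the names scores_wide and score1 to score5 are made up for illustration; they are not from the original answer.

```r
# Hypothetical toy data mirroring the answer; scores are fixed instead of random
data1 <- data.frame(name = c("John", "Mike", "Amy"),
                    nationality = c("America", "Canada", "Canada"))
data2 <- data.frame(name2 = rep(c("John", "Mike", "Amy", "Jack", "John"), each = 5),
                    nationality2 = rep(c("America", "Canada", "Canada", "Canada", "Canada"),
                                       each = 5),
                    score = 1:25)

# Keep only the (name, nationality) pairs that appear in data1
data3 <- merge(data2, data1,
               by.x = c("name2", "nationality2"),
               by.y = c("name", "nationality"))

# Sort so that each person's five rows sit next to each other
data3 <- data3[order(data3$name2, data3$nationality2), ]

# Assumption: exactly 5 scores per person, so the vector folds into 5 columns
scores_wide <- as.data.frame(
  matrix(data3$score, ncol = 5, byrow = TRUE,
         dimnames = list(NULL, paste0("score", 1:5))))
scores_wide
```

Each row of scores_wide should hold one person's five scores, with the identity dropped as the question asks. If some person had more or fewer than 5 rows, the fold would silently mix people together, so checking table(data3$name2, data3$nationality2) first is probably worthwhile.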

Regarding "r - How to avoid 'for' in R when I want to loop over a data frame", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/44460666/
