Here is a simple example.
I have a data frame data1:
name <- c("John", "John", "Mike", "Amy", ...)
nationality <- c("Canada", "America", "Spain", "Japan", ...)
data1 <- data.frame(name, nationality, ...)
That is, the people come from different countries.
Each person is identified by their own name and country, with no duplicates.
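The no-duplicates assumption can be checked directly. A minimal sketch, using only the first four values shown in the question (the real vectors are elided, so this data1 is illustrative, and n_dup is a name made up for this example):

```r
# Illustrative data1 built from the first four values shown in the question.
data1 <- data.frame(name = c("John", "John", "Mike", "Amy"),
                    nationality = c("Canada", "America", "Spain", "Japan"))

# anyDuplicated() returns 0 when no (name, nationality) row repeats an earlier one.
n_dup <- anyDuplicated(data1[, c("name", "nationality")])
n_dup == 0
```

Two people named John are fine here because they differ in nationality; only a repeated pair would count as a duplicate.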
The second data frame is:
name2 <- c("John", "John", "Mike", "John", ...)
nationality2 <- c("Canada", "Canada", "Canada", ...)
score <- c(87, 67, 98, 78, 56, ...)
data2 <- data.frame(name2, nationality2, score)
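Every person is guaranteed to have 5 rows in data2, which means each has 5 scores, but the rows are in random order. What I want is each person's 5 scores; I don't care about the person's name or where they come from.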
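Since everything downstream relies on the "5 rows per person" guarantee, it may be worth verifying it first. A rough sketch on made-up data (the two people and their scores below are hypothetical, as is the name rows_per_person):

```r
# Hypothetical data2 with two people and 5 scores each.
data2 <- data.frame(name2 = rep(c("John", "Mike"), each = 5),
                    nationality2 = rep(c("Canada", "Canada"), each = 5),
                    score = c(87, 67, 98, 78, 56, 90, 45, 77, 61, 83))

# Count rows per (name, nationality) pair; every count should be 5.
rows_per_person <- table(paste(data2$name2, data2$nationality2))
all(rows_per_person == 5)
```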
The final data frame I want is:
  score1 score2 score3 score4 score5
1     89     89     87     78     90
2    ...
3    ...
Each row holds one person's 5 scores, but I don't care who that person is.
My data is too large to use a for loop. What can I do?
Best answer
As I understand it, this is what you are asking for:
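# data1: the unique (name, nationality) pairs
data1 <- data.frame(name = c("John", "Mike", "Amy"),
                    nationality = c("America", "Canada", "Canada"))
# data2: 5 rows per person, including people who are not in data1
data2 <- data.frame(name2 = rep(c("John", "Mike", "Amy", "Jack", "John"), each = 5),
                    score = sample(100, 25),
                    nationality2 = rep(c("America", "Canada", "Canada", "Canada", "Canada"), each = 5))
# keep only the rows of data2 whose (name, nationality) pair appears in data1
data3 <- merge(data2, data1, by.x = c("name2", "nationality2"), by.y = c("name", "nationality"))
# build one key per person, collect each person's scores, and bind them as rows
data3$name_country <- paste(data3$name2, data3$nationality2)
all_scores_list <- tapply(data3$score, data3$name_country, c)
as.data.frame(do.call(rbind, all_scores_list))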
#              V1 V2 V3 V4 V5
# Amy Canada   57 69 90 81 50
# John America  4 92 75 15  2
# Mike Canada  25 86 51 20 12
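If merge and tapply turn out to be slow on a very large data2, a possible variation is to filter with %in%, sort by person, and fill a 5-column matrix row by row. This is only a sketch, not the answer's method: it assumes every kept person has exactly 5 rows (checked with stopifnot), that "|" never occurs in a name or nationality, and the names key1, key2, kept, kept_key and result are made up for illustration.

```r
# Toy data in the same shape as the answer's example (seed fixed for repeatability).
set.seed(1)
data1 <- data.frame(name = c("John", "Mike", "Amy"),
                    nationality = c("America", "Canada", "Canada"))
data2 <- data.frame(name2 = rep(c("John", "Mike", "Amy", "Jack", "John"), each = 5),
                    nationality2 = rep(c("America", "Canada", "Canada", "Canada", "Canada"), each = 5),
                    score = sample(100, 25))
# Shuffle the rows to mimic the random order described in the question.
data2 <- data2[sample(nrow(data2)), ]

# One key per person; keep only rows whose person appears in data1.
key1 <- paste(data1$name, data1$nationality, sep = "|")
key2 <- paste(data2$name2, data2$nationality2, sep = "|")
keep <- key2 %in% key1
kept <- data2[keep, ]
kept_key <- key2[keep]

# The matrix fill below is only valid if every person has exactly 5 rows.
stopifnot(all(table(kept_key) == 5))

# Sort so that each person's rows are adjacent, then fill one matrix row per person.
ord <- order(kept_key)
result <- as.data.frame(matrix(kept$score[ord], ncol = 5, byrow = TRUE))
names(result) <- paste0("score", 1:5)
result
```

The rows come out in key order, and the person's identity is dropped, which matches the layout requested in the question. If the 5-rows assumption does not hold, stopifnot stops the script instead of silently misaligning scores.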
Regarding "r - How to avoid 'for' when I want to loop through a data frame in R", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/44460666/