R data.frame: get a value from the variable selected by another variable, vectorized
Problem description

I have data that comes to me with many similar variables, plus an additional variable that indicates which one of those similar variables I really want. Using a loop I can look up the correct value, but the data is large, the loop is slow, and it seems like this should be vectorizable. I just haven't figured out how.

EDIT: The selected variable will be used as a new variable in the same data frame, so order matters.
There are many other variables not shown in the example given below.

Example data set:

    set.seed(0)
    df <- data.frame(yr1 = sample(1000:1100, 8),
                     yr2 = sample(2000:2100, 8),
                     yr3 = sample(3000:3100, 8),
                     yr4 = sample(4000:4100, 8),
                     var = paste0("yr", sample(1:4, 8, replace = TRUE)))

    # df
    #    yr1  yr2  yr3  yr4 var
    # 1 1090 2066 3050 4012 yr3
    # 2 1026 2062 3071 4026 yr2
    # 3 1036 2006 3098 4038 yr1
    # 4 1056 2020 3037 4001 yr4
    # 5 1088 2017 3075 4037 yr3
    # 6 1019 2065 3089 4083 yr4
    # 7 1085 2036 3020 4032 yr1
    # 8 1096 2072 3061 4045 yr3

This loop method does the trick, but is slow and awkward:

    ycode <- character(nrow(df))
    for(i in 1:nrow(df)) {
      ycode[i] <- df[i, df$var[i]]
    }
    df$ycode <- ycode

    # df
    #    yr1  yr2  yr3  yr4 var ycode
    # 1 1090 2066 3050 4012 yr3  3050
    # 2 1026 2062 3071 4026 yr2  2062
    # 3 1036 2006 3098 4038 yr1  1036
    # 4 1056 2020 3037 4001 yr4  4001
    # 5 1088 2017 3075 4037 yr3  3075
    # 6 1019 2065 3089 4083 yr4  4083
    # 7 1085 2036 3020 4032 yr1  1085
    # 8 1096 2072 3061 4045 yr3  3061

It seems like I should be able to vectorize this, like so:

    df$ycode <- df[, df$var]

But I find the result surprising:

    #    yr1  yr2  yr3  yr4 var ycode.yr3 ycode.yr2 ycode.yr1 ycode.yr4 ycode.yr3.1 ycode.yr4.1 ycode.yr1.1 ycode.yr3.2
    # 1 1090 2066 3050 4012 yr3      3050      2066      1090      4012        3050        4012        1090        3050
    # 2 1026 2062 3071 4026 yr2      3071      2062      1026      4026        3071        4026        1026        3071
    # 3 1036 2006 3098 4038 yr1      3098      2006      1036      4038        3098        4038        1036        3098
    # 4 1056 2020 3037 4001 yr4      3037      2020      1056      4001        3037        4001        1056        3037
    # 5 1088 2017 3075 4037 yr3      3075      2017      1088      4037        3075        4037        1088        3075
    # 6 1019 2065 3089 4083 yr4      3089      2065      1019      4083        3089        4083        1019        3089
    # 7 1085 2036 3020 4032 yr1      3020      2036      1085      4032        3020        4032        1085        3020
    # 8 1096 2072 3061 4045 yr3      3061      2072      1096      4045        3061        4045        1096        3061

I also tried numerous variations on *apply, but none of those even came close.
Some attempts:

    > apply(df, 1, function(x) x[x$var])
    Error in x$var : $ operator is invalid for atomic vectors
    > apply(df, 1, function(x) x[x[var]])
    Error in x[var] : invalid subscript type 'closure'

Any ideas? Many thanks.

Solution

We can use row/column (matrix) indexing, which should be faster than the loop. Note that this one-liner assumes 'var' is the last column of the data frame: it drops that column and matches the values of 'var' against the remaining column names.

    df[-ncol(df)][cbind(1:nrow(df), match(df$var, head(names(df), -1)))]
    # [1] 3050 2062 1036 4001 3075 4083 1085 3061

For variety, here is a data.table solution (it is likely slower than the indexing above). Convert the 'data.frame' to a 'data.table' with setDT(df), group by the sequence of rows, and use get() on the value of 'var' after converting it to character class.

    library(data.table)
    setDT(df)[, ycode := get(as.character(var)), 1:nrow(df)]
    df
    #     yr1  yr2  yr3  yr4 var ycode
    # 1: 1090 2066 3050 4012 yr3  3050
    # 2: 1026 2062 3071 4026 yr2  2062
    # 3: 1036 2006 3098 4038 yr1  1036
    # 4: 1056 2020 3037 4001 yr4  4001
    # 5: 1088 2017 3075 4037 yr3  3075
    # 6: 1019 2065 3089 4083 yr4  4083
    # 7: 1085 2036 3020 4032 yr1  1085
    # 8: 1096 2072 3061 4045 yr3  3061
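Since the question mentions that the real data has many other columns, here is a minimal sketch of the same row/column indexing idea that names the candidate columns explicitly instead of relying on 'var' being the last column. The names year_cols, col_idx and idx are illustrative and not from the original answer, and the small data frame below is hand-typed from the first four rows of the example output rather than regenerated with sample().

```r
# Hand-made copy of the first four rows of the example (an assumption for
# illustration; the original data came from set.seed(0) and sample()).
df <- data.frame(
  yr1 = c(1090, 1026, 1036, 1056),
  yr2 = c(2066, 2062, 2006, 2020),
  yr3 = c(3050, 3071, 3098, 3037),
  yr4 = c(4012, 4026, 4038, 4001),
  var = c("yr3", "yr2", "yr1", "yr4"),
  stringsAsFactors = FALSE
)

# Name the candidate columns explicitly, so extra columns and the position
# of 'var' do not matter.
year_cols <- c("yr1", "yr2", "yr3", "yr4")

# Column position of each row's selected variable within year_cols.
# as.character() guards against 'var' having been stored as a factor.
col_idx <- match(as.character(df$var), year_cols)

# Two-column matrix of (row, column) pairs: one pair per row of df.
idx <- cbind(seq_len(nrow(df)), col_idx)

# Indexing a matrix with a two-column matrix picks one cell per pair.
# All candidate columns are numeric, so the result stays numeric.
df$ycode <- as.matrix(df[year_cols])[idx]

df$ycode
```

The result has one value per row, in row order, so it can be assigned straight back as a new column. A value of 'var' that does not name one of the candidate columns produces NA for that row, because match() returns NA and an NA row in an index matrix yields NA.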