How to compute means for different categories in R using left_join and nest?

Problem description

I'm trying to compute the mean values for binned data using left_join and nest.

    bin.size = 100

First data frame:

    df = data.frame(x = c(300, 400),
                    y = c("sca1", "sca2"))

        x    y
    1 300 sca1
    2 400 sca2

Second data frame:

    df2 = data.frame(snp = c(1, 2, 10, 100, 1, 2, 14, 16, 399),
                     r2  = c(0.70, 0.80, 0.70, 0.10, 0.90, 0.98, 0.80, 0.80, 0.01),
                     sca = c("sca1", "sca1", "sca1", "sca1",
                             "sca2", "sca2", "sca2", "sca2", "sca2"))

      snp   r2  sca
    1   1 0.70 sca1
    2   2 0.80 sca1
    3  10 0.70 sca1
    4 100 0.10 sca1
    5   1 0.90 sca2
    6   2 0.98 sca2
    7  14 0.80 sca2
    8  16 0.80 sca2
    9 399 0.01 sca2

Code from @r2evans:

    output_bin_LD = df %>%
      left_join(nest(df2, snp, .key = "snp"), by = c("y" = "sca")) %>%
      mutate(
        cuts = map(x, ~ seq(0, ., by = bin.size)),
        tbls = pmap(
          .l = list(snp, cuts),
          .f = function(xx, breaks) {
            z <- table(cut(xx$snp, breaks))
            data_frame(cut = names(z), count = z)
          }
        )
      ) %>%
      select(y, tbls) %>%
      unnest()

This code produces:

         y       cut count
    1 sca1   (0,100]     4
    2 sca1 (100,200]     0
    3 sca1 (200,300]     0
    4 sca2   (0,100]     4
    5 sca2 (100,200]     0
    6 sca2 (200,300]     0
    7 sca2 (300,400]     1

The end goal would be to have:

         y       cut count  mean
    1 sca1   (0,100]     4 0.575
    2 sca1 (100,200]     0     0
    3 sca1 (200,300]     0     0
    4 sca2   (0,100]     4  0.87
    5 sca2 (100,200]     0     0
    6 sca2 (200,300]     0     0
    7 sca2 (300,400]     1   399

So far I've tried this:

    df %>%
      left_join(nest(df2, snp, r2, .key = "snp"), by = c("y" = "sca")) %>%
      mutate(
        cuts = map(x, ~ seq(0, ., by = 100)),
        tbls = pmap(
          .l = list(snp, cuts),
          .f = function(xx, breaks) {
            z <- table(cut(xx$snp, breaks))
            a <- mean(cut(xx$r2, breaks))
            data_frame(cut = names(z), count = z, mean = a)
          } # .f
        ) # closing pmap
      ) %>% # mutate
      select(y, tbls) %>%
      unnest()

But it gives me NAs and a warning message:

         y       cut count mean
    1 sca1   (0,100]     4   NA
    2 sca1 (100,200]     0   NA
    3 sca1 (200,300]     0   NA
    4 sca2   (0,100]     4   NA
    5 sca2 (100,200]     0   NA
    6 sca2 (200,300]     0   NA
    7 sca2 (300,400]     1   NA

    Warning messages:
    1: In mean.default(cut(xx$r2, breaks)) :
      argument is not numeric or logical: returning NA
    2: In mean.default(cut(xx$r2, breaks)) :
      argument is not numeric or logical: returning NA

How should I fix this? Do I need to double nest the table?

Solution

Not sure about your approach, but here's a slightly more straightforward one using the data.table package, if you're interested. You will need a recent version (v1.9.8 or later; currently 1.10.0) for this to work, since it relies on a new feature (non-equi joins).

    require(data.table) ## v1.9.8+
    and <- b[a, on = .(sca = y, snp > start, snp <= end),   ## 1
             .(count = .N, mean = mean(r2, na.rm = TRUE)),  ## 2
             by = .EACHI]                                   ## 3

1. For each row in a, find the matching row indices in b, using the conditions passed to the on argument.
2. length(matching row indices) == .N gives the count, and mean() gives the mean of r2 over those matching rows.
3. by = .EACHI makes the expression in (2) run once for each row in a.

where a and b are:

    require(data.table) ## v1.9.8+
    a <- setDT(df)[, .(start = seq(0, x - 1, by = bin.size),
                       end   = seq(bin.size, x, by = bin.size)),
                   by = y]
    b <- fread("snp   r2  sca
                  1 0.70 sca1
                  2 0.80 sca1
                 10 0.70 sca1
                100 0.10 sca1
                  1 0.90 sca2
                  2 0.98 sca2
                 14 0.80 sca2
                 16 0.80 sca2
                399 0.01 sca2")
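As a complement to the answer above, here is a minimal base R sketch of the same per-bin summary. It is not the answerer's method and uses neither left_join/nest nor data.table. The NAs in the attempt come from mean(cut(xx$r2, breaks)): cut() returns a factor of bin labels, and mean() of a factor is NA. The sketch instead bins on snp and averages r2 within each bin. The helper name bin_one is hypothetical, and empty bins are left as NA rather than 0, which is an assumption about what is wanted. Under this reading the last bin of sca2 has mean r2 = 0.01, whereas the desired table in the question lists 399 there.

```r
bin.size <- 100

df <- data.frame(x = c(300, 400), y = c("sca1", "sca2"),
                 stringsAsFactors = FALSE)
df2 <- data.frame(
  snp = c(1, 2, 10, 100, 1, 2, 14, 16, 399),
  r2  = c(0.70, 0.80, 0.70, 0.10, 0.90, 0.98, 0.80, 0.80, 0.01),
  sca = rep(c("sca1", "sca2"), times = c(4, 5)),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not from the original thread): bin the SNP positions
# of one scaffold and summarise r2 within each bin.
bin_one <- function(scaffold, max_pos) {
  rows   <- df2[df2$sca == scaffold, ]
  breaks <- seq(0, max_pos, by = bin.size)
  bins   <- cut(rows$snp, breaks)   # factor: the bin each SNP falls into
  data.frame(
    y     = scaffold,
    cut   = levels(bins),
    count = as.vector(table(bins)),
    # Mean of r2 grouped by the snp bin; tapply() yields NA for empty bins.
    mean  = as.vector(tapply(rows$r2, bins, mean)),
    stringsAsFactors = FALSE
  )
}

# Apply the helper to each (scaffold, length) pair and stack the results.
result <- do.call(rbind, unname(Map(bin_one, df$y, df$x)))
rownames(result) <- NULL
print(result)
```

If zeros are preferred for empty bins, as in the desired table, `result$mean[is.na(result$mean)] <- 0` can be applied afterwards.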