Merging data frames in a list

Problem description

This is an offshoot of an earlier post that built a discussion around simplifying my function and eliminating the need for merging data frames that result from an lapply. Although tools such as dplyr and data.table eliminate the need for the merging, I'd still like to know how to merge in this situation. I have simplified the function that produces the list based on this answer to my previous question.

#Reproducible data
Data <- data.frame("custID" = c(1:10, 1:20),
    "v1" = rep(c("A", "B"), c(10,20)),
    "v2" = c(30:21, 20:19, 1:3, 20:6), stringsAsFactors = TRUE)

#Split-Apply function
res <- lapply(split(Data, Data$v1), function(df) {
    cutoff <- quantile(df$v2, c(0.8, 0.9))
    top_pct <- ifelse(df$v2 > cutoff[2], 10, ifelse(df$v2 > cutoff[1], 20, NA))
    na.omit(data.frame(custID = df$custID, top_pct))
    })

That gives me the following result:

$A
  custID top_pct
1      1      10
2      2      20

$B
  custID top_pct
1      1      10
2      2      20
6      6      10
7      7      20

I would like the result to look like this:

  custID A_top_pct B_top_pct
1      1        10        10
2      2        20        20
3      6        NA        10
4      7        NA        20

What's the best way to get there? Should I be doing some sort of reshaping? If I do that, do I have to merge the data frames first?

Here's my solution, which may not be the best. (In the real application, there would be more than two data frames in the list.)

#Change the new variable name
names1 <- names(res)

for (i in seq_along(res)) {
    names(res[[i]])[2] <- paste0(names1[i], "_top_pct")
}

#Merge the results
res_m <- res[[1]]
for (i in seq_along(res)[-1]) {
    res_m <- merge(res_m, res[[i]], by = "custID", all = TRUE)
}

Recommended answer

You could try Reduce with merge:

 Reduce(function(...) merge(..., by='custID', all=TRUE), res)
 #     custID top_pct.x top_pct.y
 #1      1        10        10
 #2      2        20        20
 #3      6        NA        10
 #4      7        NA        20
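The merged columns above carry merge's default .x/.y suffixes rather than the A_top_pct/B_top_pct names asked for in the question. One way to get those names is to rename the value column in each list element before reducing. This is only a sketch: `res` is rebuilt by hand from the output printed in the question so the block runs on its own, and `renamed` is a name chosen here for illustration.

```r
# `res` rebuilt by hand from the output printed in the question
res <- list(
  A = data.frame(custID = c(1L, 2L), top_pct = c(10, 20)),
  B = data.frame(custID = c(1L, 2L, 6L, 7L), top_pct = c(10, 20, 10, 20))
)

# Prefix the value column of each data frame with its list name
# (`renamed` is an illustrative name, not from the original post)
renamed <- Map(function(df, nm) {
  names(df)[names(df) == "top_pct"] <- paste0(nm, "_top_pct")
  df
}, res, names(res))

# Full outer join of all data frames on custID
res_m <- Reduce(function(x, y) merge(x, y, by = "custID", all = TRUE), renamed)
res_m
```

Because every value column has a unique name before the merge, no suffixes are needed, which should also keep the column names unambiguous when the list holds more than two data frames.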

Or as @Colonel Beauvel suggested, a more readable approach would be wrapping it with Curry from library(functional)

 library(functional)
 Reduce(Curry(merge, by='custID', all=TRUE), res)
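On the reshaping part of the question: an alternative to merging is to stack the list into one long data frame and then reshape it to wide format. The sketch below uses only base R and is not part of the original answer; `res` is again rebuilt by hand from the printed output, and `long` and `wide` are names chosen here for illustration.

```r
# `res` rebuilt by hand from the output printed in the question
res <- list(
  A = data.frame(custID = c(1L, 2L), top_pct = c(10, 20)),
  B = data.frame(custID = c(1L, 2L, 6L, 7L), top_pct = c(10, 20, 10, 20))
)

# Stack the list into long format, recording the list name in column v1
long <- do.call(rbind, Map(function(df, nm) cbind(df, v1 = nm), res, names(res)))

# Spread top_pct into one column per value of v1
wide <- reshape(long, idvar = "custID", timevar = "v1",
                direction = "wide", sep = "_")
wide
```

With this approach the columns come out as top_pct_A and top_pct_B rather than A_top_pct and B_top_pct, so a final renaming step would still be needed to match the layout in the question exactly.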
