
Problem description


I'm using R, and I have two data frames, A and B. They both have 6 rows, but A has 25000 columns (genes) and B has 30 columns. I'd like to apply a two-argument function f(x, y) to every pair of columns, where x is a column of A and y is a column of B. So far it looks like this:

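# Preallocate the result: one row per column of A, one column per column of B
out <- matrix(NA, nrow = ncol(A), ncol = ncol(B))
i <- 1
for (x in A) {
    j <- 1
    for (y in B) {
        out[i, j] <- f(x, y)
        j <- j + 1
    }
    i <- i + 1
}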


I have two issues with this: from my Python programming, keeping track of counters like this strikes me as crufty, and from my R programming, I am nervous of for loops. However, I can't quite see how to apply apply (or even whether I should apply apply) to this problem, and was hoping someone might enlighten me. I need to treat f() as atomic (it's actually cor.test()) for now.

Recommended answer


Since you are using data frames, it might be faster to use lapply or sapply to do this (especially given the size of your data frames). For example,

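x <- data.frame(col1 = c(1, 2, 3, 4), col2 = c(5, 6, 7, 8), col3 = c(9, 10, 11, 12))
y <- data.frame(col1 = c(1, 2, 3, 4), col2 = c(5, 6, 7, 8))
# Outer lapply walks the columns of x, inner lapply the columns of y
bl <- lapply(x, function(u) {
    lapply(y, function(v) {
        f(u, v)  # function with a column from x and a column from y as inputs
    })
})
# Rows of out correspond to columns of x, columns of out to columns of y
# (this layout assumes f returns a single value per pair)
out <- matrix(unlist(bl), ncol = ncol(y), byrow = TRUE)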
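
The question mentions that f() is actually cor.test(), which returns a list-like "htest" object rather than a single number, so one value has to be picked out of each result before it can go into a matrix. Here is a minimal sketch of that, assuming the p-value is the quantity of interest (the question doesn't say which one is wanted). The data frames A and B below are small made-up stand-ins for the real ones, and pval is a hypothetical helper name:

# Toy stand-ins for the real data: 6 rows each, as in the question
set.seed(1)
A <- as.data.frame(matrix(rnorm(6 * 5), nrow = 6))  # 5 "gene" columns (made up)
B <- as.data.frame(matrix(rnorm(6 * 3), nrow = 6))  # 3 columns (made up)

# Assumption: the p-value is what we want from each cor.test() call
pval <- function(u, v) cor.test(u, v)$p.value

# The inner sapply returns one value per column of B; the outer sapply
# binds those vectors together as columns, one per column of A,
# so res has ncol(B) rows and ncol(A) columns
res <- sapply(A, function(u) sapply(B, function(v) pval(u, v)))

# Transpose so that out[i, j] pairs column i of A with column j of B,
# as in the original loop
out <- t(res)

Because sapply keeps the column names, out comes back with the column names of A as row names and those of B as column names, and no counters are needed. If only the correlation coefficients themselves are wanted, cor(A, B) should return the whole ncol(A) by ncol(B) matrix in one call.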


07-30 03:12