dcast with a custom fun.aggregate


Problem description

I have the following data:

sample start end gene coverage
X      1     10  A    5
X      11    20  A    10
Y      1     10  A    5
Y      11    20  A    10
X      1     10  B    5
X      11    20  B    10
Y      1     10  B    5
Y      11    20  B    10

I added two more columns, the interval length and the coverage multiplied by that length:

data$length <- (data$end - data$start + 1)

data$ct_lt <- (data$length * data$coverage)
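
For anyone who wants to reproduce this, the following is a minimal sketch that rebuilds the table above in base R and adds the two derived columns. Reading the rows with read.table(text = ...) is my own assumption about how to load them; the question does not say how data was created.

```r
# Rebuild the sample table from the question (loading method is an assumption).
data <- read.table(header = TRUE, text = "
sample start end gene coverage
X      1     10  A    5
X      11    20  A    10
Y      1     10  A    5
Y      11    20  A    10
X      1     10  B    5
X      11    20  B    10
Y      1     10  B    5
Y      11    20  B    10
")

# Interval length (inclusive of both endpoints) and coverage times length.
data$length <- (data$end - data$start + 1)
data$ct_lt  <- (data$length * data$coverage)

print(data)
```

Every interval in this excerpt covers 10 positions, so length is 10 on each row and ct_lt alternates between 50 and 100.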

I reformatted my data using dcast:

casted <- dcast(data, gene ~ sample, value.var = "coverage", fun.aggregate = mean)

So my new data looks like this:

gene    X       Y
A      10.00000 10.00000
B      38.33333 38.33333

This is the data format I want, but I would like to aggregate differently. Instead of a plain mean, I would like to take a weighted average, with coverage weighted by length:

sum(ct_lt) / sum(length)
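
To make the formula concrete, here is a small sketch in base R for a single gene/sample cell (sample X, gene A from the table above). It also checks the result against stats::weighted.mean, which computes the same quantity. The second case, with an interval of 11 to 40, is hypothetical and only there to show a cell where the weights actually change the answer.

```r
# One cell from the question: sample X, gene A.
coverage <- c(5, 10)
start    <- c(1, 11)
end      <- c(10, 20)
len      <- end - start + 1      # called len here to avoid masking base::length()
ct_lt    <- len * coverage

manual  <- sum(ct_lt) / sum(len)             # the formula from the question
builtin <- weighted.mean(coverage, w = len)  # same quantity via base R
stopifnot(isTRUE(all.equal(manual, builtin)))
print(manual)   # 7.5, same as the plain mean because both intervals have length 10

# Hypothetical variation: stretch the second interval to 11..40 (length 30).
len2    <- c(10, 30)
manual2 <- sum(len2 * coverage) / sum(len2)
print(manual2)  # 8.75, pulled toward the coverage of the longer interval
```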

How can I do this?

Recommended answer

Disclosure: no R in front of me, but I think your friend here may be the dplyr and tidyr packages.

There are certainly lots of ways to accomplish this, but I think the following might get you started:

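library(dplyr)
library(tidyr)

data %>%
  # keep only the columns the weighted average needs
  select(gene, sample, ct_lt, length) %>%
  # one group per cell of the final gene-by-sample table
  group_by(gene, sample) %>%
  # length-weighted mean coverage within each group
  summarise(weight_avg = sum(ct_lt) / sum(length)) %>%
  # one column per sample; tidyr >= 1.0.0 also offers pivot_wider() for this
  spread(sample, weight_avg)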
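
If you would rather stay with dcast, one possible workaround is sketched below. As far as I know, fun.aggregate in reshape2 only receives the value.var values for each cell, so it has no direct access to a weight column. Since the weighted average is just a ratio of two sums, you can cast the numerator and the denominator separately and divide. The names num, den and weighted are mine, and the data frame is rebuilt from the question's table so the block runs on its own.

```r
library(reshape2)

# Sample rows from the question, rebuilt so this block is self-contained.
data <- data.frame(
  sample   = rep(c("X", "X", "Y", "Y"), 2),
  start    = rep(c(1, 11), 4),
  end      = rep(c(10, 20), 4),
  gene     = rep(c("A", "B"), each = 4),
  coverage = rep(c(5, 10), 4)
)
data$length <- (data$end - data$start + 1)
data$ct_lt  <- (data$length * data$coverage)

# Cast the numerator and the denominator with the same formula,
# so both tables have identical row and column order.
num <- dcast(data, gene ~ sample, value.var = "ct_lt",  fun.aggregate = sum)
den <- dcast(data, gene ~ sample, value.var = "length", fun.aggregate = sum)

# Divide the sample columns element-wise and keep the gene column as is.
weighted <- cbind(num["gene"], num[-1] / den[-1])
print(weighted)
```

On these eight rows every cell comes out as 7.5, because all intervals have the same length. With unequal lengths the result would differ from the plain mean.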

Hope this helps...
