本文介绍了组合IRanges对象并维护mcol的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我将从一个示例开始,然后描述我要使用的逻辑.

I'll start with an example, and then describe the logic I'm trying to use.

我有两个普通的IRanges对象,它们跨越了相同的总范围,但可能在不同数量的范围内.每个IRanges都有一个mcol,但是mcolIRanges之间是不同的.

I have two normal IRanges objects that span the same total range, but may do so in a different number of ranges. Each IRanges has one mcol, but that mcol is different across IRanges.

a
#IRanges object with 1 range and 1 metadata column:
#          start       end     width | on_betalac
#      <integer> <integer> <integer> |  <logical>
#  [1]         1       167       167 |      FALSE
b
#IRanges object with 3 ranges and 1 metadata column:
#          start       end     width |  on_other
#      <integer> <integer> <integer> | <logical>
#  [1]         1       107       107 |     FALSE
#  [2]       108       112         5 |      TRUE
#  [3]       113       167        55 |     FALSE

您可以看到这两个IRanges范围从1到167,但是a具有一个范围,而b具有三个范围.我想将它们结合起来以得到如下输出:

You can see both of these IRanges span 1 to 167, but a has one range and b has three. I would like to combine them to get output like this:

my_great_function(a, b)
#IRanges object with 3 ranges and 2 metadata columns:
#          start       end     width | on_betalac  on_other
#      <integer> <integer> <integer> |  <logical> <logical>
#  [1]         1       107       107 |     FALSE     FALSE
#  [2]       108       112         5 |     FALSE      TRUE
#  [3]       113       167        55 |     FALSE     FALSE

输出类似于输入的disjoin,但是它保留mcols,甚至扩展它们,以使输出范围与导致输入范围的输入范围具有相同的mcol值.

The output is a like a disjoin of the inputs, but it keeps the mcols, and even spreads them so that the output range has the same value of the mcol as the input range that led to it.

推荐答案

选项1:使用IRanges::findOverlaps

m <- findOverlaps(b, a)
c <- b[queryHits(m)]
mcols(c) <- cbind(mcols(c), mcols(a[subjectHits(m)]))
#IRanges object with 3 ranges and 2 metadata columns:
#          start       end     width |  on_other on_betacalc
#      <integer> <integer> <integer> | <logical>   <logical>
#  [1]         1       107       107 |     FALSE       FALSE
#  [2]       108       112         5 |      TRUE       FALSE
#  [3]       113       167        55 |     FALSE       FALSE

结果对象c是具有两个元数据列的IRanges对象.

The resulting object c is a IRanges object with two metadata columns.

c <- mergeByOverlaps(b, a)
c
#DataFrame with 3 rows and 4 columns
#          b  on_other         a on_betacalc
#  <IRanges> <logical> <IRanges>   <logical>
#1     1-107     FALSE     1-167       FALSE
#2   108-112      TRUE     1-167       FALSE
#3   113-167     FALSE     1-167       FALSE

结果输出对象是DataFrame,其中IRanges列和原始元数据列作为附加列.

The resulting output object is a DataFrame with IRanges columns and original metadata columns as additional columns.

library(data.table)
a.dt <- as.data.table(cbind.data.frame(a, mcols(a)))[, width := NULL]
b.dt <- as.data.table(cbind.data.frame(b, mcols(b)))[, width := NULL]

setkey(b.dt, start, end)
foverlaps(a.dt, b.dt, type = "any")[, `:=`(i.start = NULL, i.end = NULL)][]
   start end on_other on_betacalc
1:     1 107    FALSE       FALSE
2:   108 112     TRUE       FALSE
3:   113 167    FALSE       FALSE

生成的对象是data.table.

library(fuzzyjoin)
a.df <- cbind.data.frame(a, mcols(a))
b.df <- cbind.data.frame(b, mcols(b))
interval_left_join(b.df, a.df, by = c("start", "end"))
#  start.x end.x width.x on_other start.y end.y width.y on_betacalc
#1       1   107     107    FALSE       1   167     167       FALSE
#2     108   112       5     TRUE       1   167     167       FALSE
#3     113   167      55    FALSE       1   167     167       FALSE

生成的对象是data.frame.

library(IRanges)
a <- IRanges(1, 167)
mcols(a)$on_betacalc = F

b <- IRanges(c(1, 108, 113), c(107, 112, 167))
mcols(b)$on_other <- c(F, T, F)

这篇关于组合IRanges对象并维护mcol的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

10-16 02:06