Problem description
I'll start with an example, and then describe the logic I'm trying to use.
I have two normal IRanges objects that span the same total range, but may do so in a different number of ranges. Each IRanges has one mcol, but that mcol is different across IRanges.
a
#IRanges object with 1 range and 1 metadata column:
# start end width | on_betalac
# <integer> <integer> <integer> | <logical>
# [1] 1 167 167 | FALSE
b
#IRanges object with 3 ranges and 1 metadata column:
# start end width | on_other
# <integer> <integer> <integer> | <logical>
# [1] 1 107 107 | FALSE
# [2] 108 112 5 | TRUE
# [3] 113 167 55 | FALSE
You can see that both of these IRanges span 1 to 167, but a has one range and b has three. I would like to combine them to get output like this:
my_great_function(a, b)
#IRanges object with 3 ranges and 2 metadata columns:
# start end width | on_betalac on_other
# <integer> <integer> <integer> | <logical> <logical>
# [1] 1 107 107 | FALSE FALSE
# [2] 108 112 5 | FALSE TRUE
# [3] 113 167 55 | FALSE FALSE
The output is like a disjoin of the inputs, but it keeps the mcols, and even spreads them so that each output range has the same mcol values as the input range that led to it.
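One way to express this logic directly is to disjoin the union of the two inputs and then, for every disjoint piece, look up the range of each input that contains it. This is a minimal sketch, not taken from the answer below; the function name disjoin_with_mcols is hypothetical, and it assumes (as the question states) that both inputs are normal IRanges covering the same total range, so every piece lies within exactly one range of each input.

```r
library(IRanges)

# Hypothetical helper (not part of IRanges): disjoin x and y and copy each
# input's metadata columns onto the resulting disjoint pieces.
# Assumes x and y are normal IRanges covering the same total range.
disjoin_with_mcols <- function(x, y) {
  # Drop metadata before combining, since c() expects matching mcols
  x0 <- x; mcols(x0) <- NULL
  y0 <- y; mcols(y0) <- NULL
  pieces <- disjoin(c(x0, y0))

  # For each piece, the index of the x (and y) range that contains it
  ix <- findOverlaps(pieces, x, type = "within", select = "first")
  iy <- findOverlaps(pieces, y, type = "within", select = "first")
  stopifnot(!anyNA(ix), !anyNA(iy))  # enforce the same-coverage assumption

  # Spread the metadata of the containing ranges onto the pieces
  mcols(pieces) <- cbind(mcols(x)[ix, , drop = FALSE],
                         mcols(y)[iy, , drop = FALSE])
  pieces
}

# Sample data as in the answer
a <- IRanges(1, 167)
mcols(a)$on_betacalc <- FALSE
b <- IRanges(c(1, 108, 113), c(107, 112, 167))
mcols(b)$on_other <- c(FALSE, TRUE, FALSE)

res <- disjoin_with_mcols(a, b)
res
```

Because the pieces come from disjoin rather than from b, this also works when neither input's ranges are nested inside the other's.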
Recommended answer
Option 1: Using IRanges::findOverlaps
m <- findOverlaps(b, a)
c <- b[queryHits(m)]
mcols(c) <- cbind(mcols(c), mcols(a[subjectHits(m)]))
#IRanges object with 3 ranges and 2 metadata columns:
# start end width | on_other on_betacalc
# <integer> <integer> <integer> | <logical> <logical>
# [1] 1 107 107 | FALSE FALSE
# [2] 108 112 5 | TRUE FALSE
# [3] 113 167 55 | FALSE FALSE
The resulting object c is an IRanges object with two metadata columns.
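Option 1 relies on every range of b falling inside a single range of a, which holds for the sample data. When a range of b crosses a boundary in a, findOverlaps reports one hit per overlapping pair, so that range of b is repeated rather than split. The sketch below illustrates this with hypothetical objects a2 and b2 (not from the question), using countOverlaps as a quick check.

```r
library(IRanges)

# Hypothetical inputs: b2[1] (1-107) crosses the boundary between a2[1] and a2[2]
a2 <- IRanges(c(1, 101), c(100, 167))
mcols(a2)$on_betacalc <- c(FALSE, TRUE)
b2 <- IRanges(c(1, 108), c(107, 167))
mcols(b2)$on_other <- c(FALSE, TRUE)

# Number of a2 ranges hit by each b2 range; any value > 1 means duplication
n_hits <- countOverlaps(b2, a2)
n_hits

# Same steps as Option 1
m2 <- findOverlaps(b2, a2)
c2 <- b2[queryHits(m2)]
mcols(c2) <- cbind(mcols(c2), mcols(a2[subjectHits(m2)]))
c2  # 1-107 appears twice, once per overlapping a2 range
```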
Option 2: Using IRanges::mergeByOverlaps
c <- mergeByOverlaps(b, a)
c
#DataFrame with 3 rows and 4 columns
# b on_other a on_betacalc
# <IRanges> <logical> <IRanges> <logical>
#1 1-107 FALSE 1-167 FALSE
#2 108-112 TRUE 1-167 FALSE
#3 113-167 FALSE 1-167 FALSE
The resulting object is a DataFrame with the IRanges as columns and the original metadata columns as additional columns.
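If an IRanges object like the one in the question is wanted, the ranges and metadata can be pulled back out of that DataFrame. A minimal sketch, assuming the query ranges sit in the first column of the result (as in the printed output above) and reusing the answer's sample data; the names mo and res are arbitrary.

```r
library(IRanges)

# Sample data as in the answer
a <- IRanges(1, 167)
mcols(a)$on_betacalc <- FALSE
b <- IRanges(c(1, 108, 113), c(107, 112, 167))
mcols(b)$on_other <- c(FALSE, TRUE, FALSE)

mo <- mergeByOverlaps(b, a)

# First column holds the query (b) ranges; attach both metadata columns to them
res <- mo[[1]]
mcols(res) <- mo[, c("on_other", "on_betacalc")]
res
```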
Option 3: Using data.table::foverlaps
library(data.table)
a.dt <- as.data.table(cbind.data.frame(a, mcols(a)))[, width := NULL]
b.dt <- as.data.table(cbind.data.frame(b, mcols(b)))[, width := NULL]
setkey(b.dt, start, end)
foverlaps(a.dt, b.dt, type = "any")[, `:=`(i.start = NULL, i.end = NULL)][]
# start end on_other on_betacalc
#1: 1 107 FALSE FALSE
#2: 108 112 TRUE FALSE
#3: 113 167 FALSE FALSE
The resulting object is a data.table.
Option 4: Using fuzzyjoin::interval_left_join
library(fuzzyjoin)
a.df <- cbind.data.frame(a, mcols(a))
b.df <- cbind.data.frame(b, mcols(b))
interval_left_join(b.df, a.df, by = c("start", "end"))
# start.x end.x width.x on_other start.y end.y width.y on_betacalc
#1 1 107 107 FALSE 1 167 167 FALSE
#2 108 112 5 TRUE 1 167 167 FALSE
#3 113 167 55 FALSE 1 167 167 FALSE
The resulting object is a data.frame.
Sample data
library(IRanges)
a <- IRanges(1, 167)
mcols(a)$on_betacalc <- FALSE
b <- IRanges(c(1, 108, 113), c(107, 112, 167))
mcols(b)$on_other <- c(FALSE, TRUE, FALSE)