Question

I have three tables with differing genomic intervals. Here is an example:

> a
   chr interval.start interval.end names
1 chr1              5           10     a
2 chr1              6           10     b
3 chr2              7           10     c
4 chr3              8           10     d

> b
   chr interval.start interval.end names
1 chr1              6           15     e
2 chr1              7           15     f
3 chr1              8           15     g

> c
    chr interval.start interval.end names
1  chr1              7           12     h
2  chr1              8           12     i
3  chr5              9           12     j
4 chr10             10           12     k
5 chr20             11           12     l

I am trying to find the intervals common to all three tables after converting them to GRanges. Essentially I want to do something like intersect(c, intersect(a, b)). However, because these are genomic coordinates, I have to do this with GRanges objects from the GenomicRanges package, which I am not familiar with.

I can do findOverlaps(gr, gr1) or findOverlaps(gr1, gr2), but is there an easy way to overlap multiple GRanges at once like findOverlaps(gr, gr1, gr2)?

Any help would be appreciated. If this question was asked elsewhere, I apologize in advance.

Thanks

Recommended answer

You can subset one object by its overlaps with a second using subsetByOverlaps(), then subset that result by its overlaps with the third set:

sub1 <- subsetByOverlaps(gr, gr1)
sub2 <- subsetByOverlaps(sub1, gr2)

Or directly:

Reduce(subsetByOverlaps, list(gr, gr1, gr2))

This returns the subset of the first GRanges object (gr) whose ranges overlap at least one range in each of the other two objects.
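As a rough worked example, the sketch below rebuilds the three tables from the question and runs both forms. The data frame names (tbl_a, tbl_b, tbl_c) and the helper to_gr() are made up for illustration and are not part of GenomicRanges. The tables are renamed so they do not shadow base R's c().

```r
library(GenomicRanges)

# The three tables from the question (names chosen for this sketch).
tbl_a <- data.frame(chr = c("chr1", "chr1", "chr2", "chr3"),
                    interval.start = c(5, 6, 7, 8),
                    interval.end = 10,
                    names = c("a", "b", "c", "d"))
tbl_b <- data.frame(chr = "chr1",
                    interval.start = c(6, 7, 8),
                    interval.end = 15,
                    names = c("e", "f", "g"))
tbl_c <- data.frame(chr = c("chr1", "chr1", "chr5", "chr10", "chr20"),
                    interval.start = c(7, 8, 9, 10, 11),
                    interval.end = 12,
                    names = c("h", "i", "j", "k", "l"))

# Hypothetical helper: turn one table into a named GRanges object.
to_gr <- function(df) {
  out <- GRanges(seqnames = as.character(df$chr),
                 ranges = IRanges(start = df$interval.start,
                                  end = df$interval.end))
  names(out) <- as.character(df$names)
  out
}

gr  <- to_gr(tbl_a)
gr1 <- to_gr(tbl_b)
gr2 <- to_gr(tbl_c)

# Step by step: keep ranges of gr that overlap gr1, then filter those by gr2.
# GenomicRanges may warn that the objects do not share all sequence levels,
# because the tables list different chromosomes.
sub1 <- subsetByOverlaps(gr, gr1)
sub2 <- subsetByOverlaps(sub1, gr2)

# Same thing in one call.
all_three <- Reduce(subsetByOverlaps, list(gr, gr1, gr2))

names(sub2)       # "a" "b"
names(all_three)  # "a" "b"
```

Only a (chr1:5-10) and b (chr1:6-10) survive, because chr1 is the only chromosome present in all three tables.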

Depending on the type of overlap you want and which object has the largest ranges, you should consider which object to use as the query and which as the subject.
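To illustrate why the choice matters, here is a small sketch using the question's intervals. It shows that the first object in the list decides whose ranges come back, that the type argument of subsetByOverlaps() changes what counts as an overlap, and that intersect() returns the shared coordinates themselves, which is closer to the intersect(c, intersect(a, b)) idea in the question. The variable names on the left-hand side are made up for this example.

```r
library(GenomicRanges)

# Same intervals as in the question, built from "chr:start-end" strings.
gr  <- GRanges(c("chr1:5-10", "chr1:6-10", "chr2:7-10", "chr3:8-10"))
gr1 <- GRanges(c("chr1:6-15", "chr1:7-15", "chr1:8-15"))
gr2 <- GRanges(c("chr1:7-12", "chr1:8-12", "chr5:9-12",
                 "chr10:10-12", "chr20:11-12"))
names(gr)  <- c("a", "b", "c", "d")
names(gr1) <- c("e", "f", "g")
names(gr2) <- c("h", "i", "j", "k", "l")

# The first list element is the query: its ranges are the ones returned.
from_gr  <- Reduce(subsetByOverlaps, list(gr, gr1, gr2))
from_gr1 <- Reduce(subsetByOverlaps, list(gr1, gr, gr2))
names(from_gr)   # "a" "b"
names(from_gr1)  # "e" "f" "g"

# A stricter overlap type: keep only ranges of gr that lie entirely
# inside some range of gr1.
within_gr1 <- subsetByOverlaps(gr, gr1, type = "within")
names(within_gr1)  # "b"  (chr1:6-10 fits inside chr1:6-15; chr1:5-10 does not)

# If the goal is the shared coordinates rather than the original ranges,
# intersect() on GRanges returns the region covered by all three objects.
shared <- Reduce(intersect, list(gr, gr1, gr2))
shared  # a single range, chr1:7-10
```

Note that intersect() merges each object's ranges before comparing, so the per-range names are not carried over to the result.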
