This article covers how to subset a defined group of sequences from a DNAStringSet.

Problem description

I want to retrieve a couple of sequences from my DNAStringSet. So far I only manage to get a single sequence.

For example: a DNAStringSet and the list/pattern of sequences I want to isolate.

Test set:

library(Biostrings)  # provides DNAStringSet()
aDNAStringSet <- DNAStringSet(c("GCATCCATTAC", "AATCGCCATCC", "GCATACCTTAC", "GCATACCTTAC", "GCATACCTTAC"))

Names:

names(aDNAStringSet) <- c("seq1", "seq2", "seq3", "seq4", "seq5") 

The list of sequences to isolate:

patterns <- c("seq2", "seq4", "seq5")   

What I tested so far:

selection <- aDNAStringSet[grep("seq2", names(aDNAStringSet))]

selection <- aDNAStringSet[grep(patterns, names(aDNAStringSet))]

grep works, but only for a single sequence.
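Why only one sequence comes back: grep() takes a single regular expression. When pattern is a vector, R uses only its first element and issues a warning.

The sketch below is a minimal base-R illustration, not the original poster's code. It uses a plain named character vector, seqs, as a stand-in for aDNAStringSet, since only the names matter here. It collapses the patterns into one anchored alternation so that grep() sees a single pattern.

```r
# Stand-in for aDNAStringSet: a named character vector with the same names
seqs <- c(seq1 = "GCATCCATTAC", seq2 = "AATCGCCATCC", seq3 = "GCATACCTTAC",
          seq4 = "GCATACCTTAC", seq5 = "GCATACCTTAC")
patterns <- c("seq2", "seq4", "seq5")

# Build one regular expression, "^(seq2|seq4|seq5)$", so grep() gets a single pattern;
# the ^...$ anchors stop "seq2" from also matching a longer name such as "seq21"
rx <- paste0("^(", paste(patterns, collapse = "|"), ")$")
idx <- grep(rx, names(seqs))
idx                      # [1] 2 4 5
selection <- seqs[idx]   # keeps seq2, seq4, seq5
```

The same index works on the real object: aDNAStringSet[grep(rx, names(aDNAStringSet))] should select the same three sequences. If the names contained regex metacharacters such as . or +, they would need escaping first.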

---------------------- sapply and match don't work: -------

Using sapply:

selection <- aDNAStringSet[unlist(sapply(patterns, grep, aDNAStringSet$names)), ]

Or using match:

selection <- match(c("seq2", "seq4", "seq5"), aDNAStringSet$names)
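Two things go wrong in these two attempts.

First, the names are read with aDNAStringSet$names instead of names(aDNAStringSet). On S4Vectors-based classes such as DNAStringSet, $ is understood to look up metadata columns rather than names. So with no metadata columns, that expression is assumed here to evaluate to NULL. In that case grep() finds nothing and match() returns only NA.

Second, match() returns positions, not sequences, so its result still has to be used as an index.

A minimal base-R sketch, with a named character vector seqs standing in for the DNAStringSet and NULL standing in for the assumed value of aDNAStringSet$names:

```r
seqs <- c(seq1 = "GCATCCATTAC", seq2 = "AATCGCCATCC", seq3 = "GCATACCTTAC",
          seq4 = "GCATACCTTAC", seq5 = "GCATACCTTAC")
patterns <- c("seq2", "seq4", "seq5")

# Assumption: this is what aDNAStringSet$names evaluates to (no such metadata column)
wrong_names <- NULL
match(patterns, wrong_names)   # NA NA NA: nothing to match against
grep("seq2", wrong_names)      # integer(0): no matches

# Read the names with names(), then use the positions as an index
pos <- match(patterns, names(seqs))   # 2 4 5
selection <- seqs[pos]
```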

I want a string set containing only "seq2", "seq4" and "seq5". Any ideas? Thx, K

Recommended answer

You can use %in%:

aDNAStringSet[names(aDNAStringSet) %in% patterns]
#  A DNAStringSet instance of length 3
#    width seq                                               names
#[1]    11 AATCGCCATCC                                       seq2
#[2]    11 GCATACCTTAC                                       seq4
#[3]    11 GCATACCTTAC                                       seq5    
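names(aDNAStringSet) %in% patterns builds a logical mask as long as the set itself. The result therefore keeps the set's own order, whatever order patterns is in, and a pattern that matches no name is simply ignored.

Indexing by a character vector of names, aDNAStringSet[patterns], should also work for a named XStringSet and return the sequences in the order of patterns. That behaviour is worth confirming on your Biostrings version.

The base-R sketch below uses a named character vector seqs as a stand-in to show the ordering difference:

```r
seqs <- c(seq1 = "GCATCCATTAC", seq2 = "AATCGCCATCC", seq3 = "GCATACCTTAC",
          seq4 = "GCATACCTTAC", seq5 = "GCATACCTTAC")
patterns <- c("seq5", "seq2", "seq4")   # deliberately out of order

# Logical mask: TRUE where a name is in patterns; result follows the set's order
mask <- names(seqs) %in% patterns
by_mask <- seqs[mask]
names(by_mask)   # "seq2" "seq4" "seq5"

# Character index: result follows the order of patterns
by_name <- seqs[patterns]
names(by_name)   # "seq5" "seq2" "seq4"
```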

or use match:

aDNAStringSet[sapply(patterns, function(x) match(x, names(aDNAStringSet)))]
#  A DNAStringSet instance of length 3
#    width seq                                               names
#[1]    11 AATCGCCATCC                                       seq2
#[2]    11 GCATACCTTAC                                       seq4
#[3]    11 GCATACCTTAC                                       seq5
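match() is already vectorised over its first argument, so the sapply() wrapper in this answer is not strictly needed. match(patterns, names(aDNAStringSet)) gives the same positions, just without the pattern names that sapply() attaches.

The base-R check below confirms that equivalence on a named character vector standing in for the set. It also shows that a name absent from the set yields NA, which you would drop before indexing.

```r
seqs <- c(seq1 = "GCATCCATTAC", seq2 = "AATCGCCATCC", seq3 = "GCATACCTTAC",
          seq4 = "GCATACCTTAC", seq5 = "GCATACCTTAC")
patterns <- c("seq2", "seq4", "seq5")

# sapply() calls match() once per pattern and names the result by pattern
via_sapply <- sapply(patterns, function(x) match(x, names(seqs)))
# match() handles the whole vector in a single call
via_match <- match(patterns, names(seqs))
identical(unname(via_sapply), via_match)   # TRUE

# A name that is not in the set gives NA; drop it before indexing
pos <- match(c("seq2", "seq9"), names(seqs))   # 2 NA
pos <- pos[!is.na(pos)]
```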

Or, if you prefer grep (for regular-expression matching):

aDNAStringSet[sapply(patterns, function(x) grep(x, names(aDNAStringSet)))]
#  A DNAStringSet instance of length 3
#    width seq                                               names
#[1]    11 AATCGCCATCC                                       seq2
#[2]    11 GCATACCTTAC                                       seq4
#[3]    11 GCATACCTTAC                                       seq5
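Because grep() matches substrings, an unanchored pattern can select more than intended. With ten sequences, "seq1" also matches "seq10", and sapply() then returns a list instead of an integer vector.

The base-R sketch below uses hypothetical names seq1 to seq10 (not from the original data). It anchors each pattern with ^...$ so that it must match the whole name.

```r
# Hypothetical set of ten names, to show the substring pitfall
nms <- paste0("seq", 1:10)
patterns <- c("seq1", "seq4")

# Unanchored: "seq1" matches both "seq1" and "seq10", so sapply() returns a list
loose <- sapply(patterns, function(x) grep(x, nms))
is.list(loose)                     # TRUE
unlist(loose, use.names = FALSE)   # 1 10 4

# Anchored: each pattern must match the whole name
strict <- sapply(patterns, function(x) grep(paste0("^", x, "$"), nms))
unname(strict)                     # 1 4
```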


10-31 14:57