Problem description
I want to retrieve a couple of sequences from my DNAStringSet. So far I have only managed to get a single sequence.
For example: a DNAStringSet and the list/pattern of sequences I want to isolate.
Test set:
aDNAStringSet <- DNAStringSet(c("GCATCCATTAC", "AATCGCCATCC", "GCATACCTTAC", "GCATACCTTAC", "GCATACCTTAC"))
Names:
names(aDNAStringSet) <- c("seq1", "seq2", "seq3", "seq4", "seq5")
List of sequences to isolate:
patterns <- c("seq2", "seq4", "seq5")
What I have tested so far:
selection <- aDNAStringSet[grep("seq2", names(aDNAStringSet))]
or
selection <- aDNAStringSet[grep(patterns, names(aDNAStringSet))]
grep works, but only for a single sequence.
sapply and match do not work:
Using sapply:
selection <- aDNAStringSet[unlist(sapply(patterns, grep, aDNAStringSet$names)), ]
Or using match:
selection <-match(c("seq2", "seq4", "seq5"), aDNAStringSet$names)
I want a stringset containing only "seq2", "seq4", and "seq5". Any ideas? Thanks, K
Recommended answer
You can do
aDNAStringSet[names(aDNAStringSet) %in% patterns]
# A DNAStringSet instance of length 3
# width seq names
#[1] 11 AATCGCCATCC seq2
#[2] 11 GCATACCTTAC seq4
#[3] 11 GCATACCTTAC seq5
Or using match
aDNAStringSet[sapply(patterns, function(x) match(x, names(aDNAStringSet)))]
# A DNAStringSet instance of length 3
# width seq names
#[1] 11 AATCGCCATCC seq2
#[2] 11 GCATACCTTAC seq4
#[3] 11 GCATACCTTAC seq5
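The %in% and match approaches above give the same result for this test set, but they can differ when the requested names are in a different order or when a name is missing. The following is a minimal base R sketch of that difference. It uses a plain character vector `nms` as a stand-in for names(aDNAStringSet), and `wanted` is a hypothetical query that is not part of the original question. Note also that match is vectorized, so wrapping it in sapply is optional.

```r
# Stand-in for names(aDNAStringSet); base R only, no Biostrings needed
nms <- c("seq1", "seq2", "seq3", "seq4", "seq5")

# Hypothetical query: different order, plus one name that is absent
wanted <- c("seq5", "seq2", "seq9")

# %in% yields one TRUE/FALSE per name in the set,
# so the hits come back in the order of the set
keep <- nms %in% wanted
in_idx <- which(keep)

# match() yields one position per query, in query order,
# with NA for a query that is not found
m <- match(wanted, nms)
match_idx <- m[!is.na(m)]  # drop NA before using it as a subscript

print(nms[in_idx])     # hits in set order
print(nms[match_idx])  # hits in query order
```

Under these assumptions, either index vector could be used as the subscript in aDNAStringSet[...]. Which one fits depends on whether the result should follow the order of the set or the order of the requested names.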
Or, if you prefer grep (for regexp matching)
aDNAStringSet[sapply(patterns, function(x) grep(x, names(aDNAStringSet)))]
# A DNAStringSet instance of length 3
# width seq names
#[1] 11 AATCGCCATCC seq2
#[2] 11 GCATACCTTAC seq4
#[3] 11 GCATACCTTAC seq5
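One caveat with the grep approach: grep uses only the first element when given a vector of patterns, which is why grep(patterns, ...) in the question returned a single sequence, and an unanchored pattern such as "seq1" would also match a name like "seq10". A possible alternative, sketched below in base R, is to collapse the names into one anchored alternation. The vector `nms` (with an extra "seq10") and this `patterns` vector are hypothetical illustrations, not the data from the question, and the sketch assumes the names contain no regex metacharacters.

```r
# Hypothetical names, including "seq10" to show the partial-match problem
nms <- c("seq1", "seq2", "seq10", "seq4", "seq5")
patterns <- c("seq1", "seq4", "seq5")

# Unanchored pattern: "seq1" matches both "seq1" and "seq10"
loose <- grep("seq1", nms)

# Build a single anchored regex: ^(seq1|seq4|seq5)$
rx <- paste0("^(", paste(patterns, collapse = "|"), ")$")
strict <- grep(rx, nms)

print(nms[loose])
print(nms[strict])
```

The resulting index vector `strict` could then be used as the subscript for the DNAStringSet, in the same way as the sapply/grep result above.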