The SeqAn tutorial for Pattern Matching mentions that a StringSet can be used as both haystack and needle. When I try to use a StringSet as the haystack in the following way,
StringSet<Dna5String> seqs;
/* do stuff to load sequences into seqs */
Finder<StringSet<Dna5String> > finder(seqs);
Pattern<Dna5String, Simple> pattern(Dna5String("GAATTC"));
if (find(finder, pattern))
{
    std::cout << '[' << beginPosition(finder) << ',' << endPosition(finder)
              << ")\t" << infix(finder) << std::endl;
} else
{
    std::cout << "No match!";
}
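I get an error.
Does anyone have an idea how to do this correctly? Using a single Finder on a Dna5String works fine. The tutorial does show how to do an offline search (i.e. using an index), but that is not what I want. I would rather not iterate over the StringSet by hand if SeqAn's Finder-Pattern facilities can handle it.

Best answer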
You can try this:
#include <iostream>
#include <seqan/sequence.h> // CharString, ...
#include <seqan/find.h>
#include <seqan/stream.h>
using namespace seqan;
typedef Iterator<StringSet<Dna5String> >::Type TStringSetIterator;
int main(int, char const **)
{
    StringSet<Dna5String> seqs;
    Dna5String seq1 =
"TAGGTTTTCCGAAAAGGTAGCAACTTTACGTGATCAAACCTCTGACGGGGTTTTCCCCGTCGAAATTGGGTG"
"TTTCTTGTCTTGTTCTCACTTGGGGCATCTCCGTCAAGCCAAGAAAGTGCTCCCTGGATTCTGTTGCTAACG"
"AGTCTCCTCTGCATTCCTGCTTGACTGATTGGGCGGACGGGGTGTCCACCTGACGCTGAGTATCGCCGTCAC"
"GGTGCCACATGTCTTATCTATTCAGGGATCAGAATTCATTCAGGAAATCAGGAGATGCTACACTTGGGTTAT"
"CGAAGCTCCTTCCAAGGCGTAGCAAGGGCGACTGAGCGCGTAAGCTCTAGATCTCCTCGTGTTGCAACTACA"
"CGCGCGGGTCACTCGAAACACATAGTATGAACTTAACGACTGCTCGTACTGAACAATGCTGAGGCAGAAGAT"
"CGCAGACCAGGCATCCCACTGCTTGAAAAAACTATNNNNCTACCCGCCTTTTTATTATCTCATCAGATCAAG";
    Dna5String seq2 =
"ACCGACGATTAGCTTTGTCCGAGTTACAACGGTTCAATAATACAAAGGATGGCATAAACCCATTTGTGTGAA"
"AGTGCCCATCACATTATGATTCTGTCTACTATGGTTAATTCCCAATATACTCTCGAAAAGAGGGTATGCTCC"
"CACGGCCATTTACGTCACTAAAAGATAAGATTGCTCAAANNNNNNNNNACTGCCAACTTGCTGGTAGCTTCA"
"GGGGTTGTCCACAGCGGGGGGTCGTATGCCTTTGTGGTATACCTTACTAGCCGCGCCATGGTGCCTAAGAAT"
"GAAGTAAAACAATTGATGTGAGACTCGACAGCCAGGCTTCGCGCTAAGGACGCAAAGAAATTCCCTACATCA"
"GACGGCCGCGNNNAACGATGCTATCGGTTAGGACATTGTGCCCTAGTATGTACATGCCTAATACAATTGGAT"
"CAAACGTTATTCCCACACACGGGTAGAAGAACNNNNATTACCCGTAGGCACTCCCCGATTCAAGTAGCCGCG";
    clear(seqs);
    appendValue(seqs, seq1);
    appendValue(seqs, seq2);
    Pattern<Dna5String, Simple> pattern(Dna5String("GAATTC"));
    // For each sequence in seqs
    for (TStringSetIterator it = begin(seqs); it != end(seqs); ++it)
    {
        std::cout << *it << std::endl;
        // Create a separate Finder for each sequence in seqs
        Finder<Dna5String> finder(*it);
        if (find(finder, pattern))
        {
            std::cout << '[' << beginPosition(finder) << ',' << endPosition(finder)
                      << ")\t" << infix(finder) << std::endl;
        }
        else
        {
            std::cout << "No match!" << std::endl;
        }
    }
    return 0;
}
You get:
TAGGTTTTCCGAAAAGGTAGCAACTTTACGTGATCAAACCTCTGACGGGGTTTTCCCCGTCGAAATTGGGTGTTTCTTGTCTTGTTCTCACTTGGGGCATCTCCGTCAAGCCAAGAAAGTGCTCCCTGGATTCTGTTGCTAACGAGTCTCCTCTGCATTCCTGCTTGACTGATTGGGCGGACGGGGTGTCCACCTGACGCTGAGTATCGCCGTCACGGTGCCACATGTCTTATCTATTCAGGGATCAGAATTCATTCAGGAAATCAGGAGATGCTACACTTGGGTTATCGAAGCTCCTTCCAAGGCGTAGCAAGGGCGACTGAGCGCGTAAGCTCTAGATCTCCTCGTGTTGCAACTACACGCGCGGGTCACTCGAAACACATAGTATGAACTTAACGACTGCTCGTACTGAACAATGCTGAGGCAGAAGATCGCAGACCAGGCATCCCACTGCTTGAAAAAACTATNNNNCTACCCGCCTTTTTATTATCTCATCAGATCAAG
[247,253)	GAATTC
ACCGACGATTAGCTTTGTCCGAGTTACAACGGTTCAATAATACAAAGGATGGCATAAACCCATTTGTGTGAAAGTGCCCATCACATTATGATTCTGTCTACTATGGTTAATTCCCAATATACTCTCGAAAAGAGGGTATGCTCCCACGGCCATTTACGTCACTAAAAGATAAGATTGCTCAAANNNNNNNNNACTGCCAACTTGCTGGTAGCTTCAGGGGTTGTCCACAGCGGGGGGTCGTATGCCTTTGTGGTATACCTTACTAGCCGCGCCATGGTGCCTAAGAATGAAGTAAAACAATTGATGTGAGACTCGACAGCCAGGCTTCGCGCTAAGGACGCAAAGAAATTCCCTACATCAGACGGCCGCGNNNAACGATGCTATCGGTTAGGACATTGTGCCCTAGTATGTACATGCCTAATACAATTGGATCAAACGTTATTCCCACACACGGGTAGAAGAACNNNNATTACCCGTAGGCACTCCCCGATTCAAGTAGCCGCG
No match!
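Note that the loop above uses `if`, so it reports at most the first hit in each sequence. If every occurrence is wanted, the same per-sequence iteration can collect all of them together with the index of the sequence they came from. The following is only a sketch of that idea using the C++ standard library rather than SeqAn types: `Hit` and `findAll` are hypothetical names introduced for illustration, and `std::string::find` stands in for the Finder/Pattern pair.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One occurrence: the sequence it was found in, plus the half-open
// interval [begin, end) inside that sequence.
// (Hypothetical helper type, not part of SeqAn.)
struct Hit
{
    std::size_t seqIndex;
    std::size_t begin;
    std::size_t end;
};

// Scans every sequence in `seqs` for `needle` and collects all
// occurrences, including overlapping ones.
// (Hypothetical helper function, not part of SeqAn.)
std::vector<Hit> findAll(std::vector<std::string> const & seqs,
                         std::string const & needle)
{
    std::vector<Hit> hits;
    if (needle.empty())
        return hits;  // an empty needle would match everywhere; report nothing

    for (std::size_t i = 0; i < seqs.size(); ++i)
    {
        std::size_t pos = seqs[i].find(needle);
        while (pos != std::string::npos)
        {
            hits.push_back(Hit{i, pos, pos + needle.size()});
            // Advance by one character so overlapping hits are kept.
            pos = seqs[i].find(needle, pos + 1);
        }
    }
    return hits;
}
```

Calling `findAll(seqs, "GAATTC")` on a `std::vector<std::string>` returns one `Hit` per occurrence. With SeqAn types, the corresponding change to the answer's loop would presumably be to replace `if (find(finder, pattern))` with `while (find(finder, pattern))`, as the index-based snippet in the EDIT does.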
EDIT: I hope this helps you.
....
#include <seqan/index.h>
....
Pattern<Dna5String> pattern(Dna5String("GAATTC"));
Index< StringSet<Dna5String > > myIndex(seqs);
Finder< Index<StringSet<Dna5String > > > finder(myIndex);
while (find(finder, pattern))
{
    std::cout << '[' << beginPosition(finder) << ',' << endPosition(finder)
              << ")\t" << infix(finder) << std::endl;
}
....
You get:
[< 0 , 247 >,< 0 , 253 >)	GAATTC
Regarding "c++ - Online pattern search over a StringSet", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/32957614/