Problem description
I am trying to match rows in a file containing a string, say ACTGGGTAAACTA. If I do
grep "ACTGGGTAAACTA" file
it gives me the rows that contain an exact match. Is there a way to allow a certain number of mismatches (substitutions, insertions, or deletions)? For example, I am looking for sequences with:
Up to 3 substitutions, such as "AGTGGGTAACCAA".
Insertions or deletions (a partial match such as "ACTGGGAAAATAAACTA" or "ACTAAACTA").
Recommended answer
There used to be a tool called agrep for fuzzy regex matching, but it was abandoned.
http://en.wikipedia.org/wiki/Agrep has a bit of history and links to related tools.
https://github.com/Wikinaut/agrep looks like a revived open source release, but I have not tested it.
Failing that, see if you can find tre-agrep for your distro.
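If none of these tools is available, the idea behind "at most k mismatches" can be sketched in a few lines of Python. This is only an illustration, not how agrep or tre-agrep is implemented: it uses the classic dynamic-programming approach to approximate substring matching, where each substitution, insertion, or deletion costs 1. The function names `min_errors` and `fuzzy_grep` are made up for this example.

```python
from typing import Iterable, Iterator


def min_errors(pattern: str, text: str) -> int:
    """Smallest edit distance between pattern and any substring of text."""
    # prev[i] = cheapest way to align pattern[:i] with a substring of text
    # that ends just before the current text character.
    prev = list(range(len(pattern) + 1))
    best = prev[-1]
    for ch in text:
        # A match may start anywhere in text, so the empty prefix costs 0.
        curr = [0]
        for i, p in enumerate(pattern, start=1):
            curr.append(min(
                prev[i - 1] + (p != ch),  # match or substitution
                prev[i] + 1,              # extra character in text
                curr[i - 1] + 1,          # pattern character missing in text
            ))
        best = min(best, curr[-1])
        prev = curr
    return best


def fuzzy_grep(pattern: str, lines: Iterable[str], max_errors: int) -> Iterator[str]:
    """Yield the lines that contain pattern with at most max_errors edits."""
    for line in lines:
        if min_errors(pattern, line.rstrip("\n")) <= max_errors:
            yield line


if __name__ == "__main__":
    # Hypothetical sample rows; with a real file, pass open("file") instead.
    rows = ["xxACTGGGTAAACTAxx", "AGTGGGTAACCAA", "TTTTTTTTTTTTT"]
    for row in fuzzy_grep("ACTGGGTAAACTA", rows, max_errors=3):
        print(row)
```

Because all three edit types cost the same here, a limit of 3 accepts any mix of up to three substitutions, insertions, and deletions. Counting substitutions only, or weighting the edit types differently, would need a modified cost function.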