Problem description
I have a text file foobar.txt which is around 10KB, not that long. Yet the following match search command takes about 10 seconds on a high-performance Linux machine.
bash>shopt -s extglob
bash>[[ `cat foobar.txt` == ?(*[[:print:]])foobar ]]
There is no match: all the characters in foobar.txt are printable but there is no string "foobar".
The search should try to match two alternatives, neither of which matches:
1. "foobar": that is instantaneous.
2. *[[:print:]]foobar: that should scan the file character by character in one pass, each time checking whether the next characters are [[:print:]]foobar. This should also be fast; there is no way it should take a millisecond per character.
In fact, if I drop the ?( ) wrapper, that is, run
bash>[[ `cat foobar.txt` == *[[:print:]]foobar ]]
This is instantaneous. But it is simply the second alternative above, without the first.
So why does the version with ?( ) take so long?
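A minimal reproduction sketch follows. The original foobar.txt is not shown, so it assumes a synthetic input: a run of the letter a, which is entirely printable and never contains "foobar". The helper names make_text, slow_match and fast_match are hypothetical. The default length of 2000 characters is an arbitrary choice that keeps the run short; pass 10000 as the first argument to approximate the file in the question. Timings will vary with the machine and the bash version.

```bash
#!/usr/bin/env bash
# Reproduction sketch for the slowdown described above.
# Assumption: synthetic input of N 'a' characters (all printable, no "foobar");
# the real foobar.txt is not available.

# extglob must be on before any line using ?( ) is parsed.
shopt -s extglob

# Hypothetical helper: print N 'a' characters with no trailing newline.
make_text() {
  head -c "$1" /dev/zero | tr '\0' 'a'
}

# Both alternatives via ?( ): the form reported as slow in the question.
slow_match() {
  [[ $1 == ?(*[[:print:]])foobar ]]
}

# Only the second alternative: the form reported as instantaneous.
fast_match() {
  [[ $1 == *[[:print:]]foobar ]]
}

n=${1:-2000}   # arbitrary default; try 10000 to approximate the question
text=$(make_text "$n")
echo "length: ${#text}"

# The time keyword reports to stderr; neither pattern should match.
time slow_match "$text" || echo "slow_match: no match"
time fast_match "$text" || echo "fast_match: no match"
```

Comparing the two time reports as n grows shows how differently the two patterns scale on your own bash.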
Recommended answer
As others have noted, you're probably better off using grep.
That said, if you want to stick with a [[ ... ]] conditional, combining @konsolebox's and @rici's advice gives you:
[[ $(<foobar.txt) =~ (^|[[:print:]])foobar$ ]]
Regex updated to match the OP's requirements - thanks, @rici.
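A sketch of the regex form in use, under two assumptions: the helper name ends_with_foobar is hypothetical, and the demo input is a synthetic run of 10000 'a' characters rather than the real file. The regex is kept in a variable because, in bash, a quoted right-hand side of =~ is matched as a literal string instead of as a regex; expanding an unquoted variable sidesteps that.

```bash
#!/usr/bin/env bash
# Sketch: the answer's regex, wrapped in a hypothetical helper.
# Kept in a variable so it is expanded unquoted on the right of =~
# (a quoted right-hand side would be matched literally).
re='(^|[[:print:]])foobar$'

# True if $1 is exactly "foobar" or ends in a printable character followed by "foobar".
ends_with_foobar() {
  [[ $1 =~ $re ]]
}

# Synthetic stand-in for foobar.txt: 10000 printable characters, no "foobar".
text=$(head -c 10000 /dev/zero | tr '\0' 'a')

time ends_with_foobar "$text" || echo "no match"
ends_with_foobar "some text foobar" && echo "match"

# With the real file, following the answer:
#   ends_with_foobar "$(<foobar.txt)" && echo "match"
```

Note that $(<foobar.txt), like the backticks in the question, strips trailing newlines, so a file whose last line ends in foobar still satisfies the $ anchor.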
Generally speaking, it is preferable to use regular expressions for string matching (via the =~ operator, in this case) rather than globbing patterns (via the == operator), whose primary purpose is matching file and folder names.
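One practical difference between the two operators is anchoring: a glob after == has to match the entire string, whereas a regex after =~ matches anywhere unless it is anchored with ^ or $. A small sketch, using a made-up sample string:

```bash
#!/usr/bin/env bash
# Contrast glob matching (==) with regex matching (=~) inside [[ ]].
s='some text ending in foobar'   # made-up sample string

# Glob: the pattern must cover the whole string.
[[ $s == *foobar ]] && echo "glob *foobar: match"
[[ $s == foobar ]] || echo "glob foobar: no match (whole string must match)"

# Regex: unanchored, so it can match a substring; BASH_REMATCH holds the match.
re='foobar'
[[ $s =~ $re ]] && echo "regex foobar: match (${BASH_REMATCH[0]})"

# Anchoring with ^ requires the match to start at the beginning of the string.
re='^foobar'
[[ $s =~ $re ]] || echo "regex ^foobar: no match"
```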