问题描述
我知道有快速的字符串搜索算法,例如 Boyer–Moore 和 Knuth–Morris–Pratt ,具有O(n + m)的复杂度,而平凡的解决方案将是O(n * m).
I know that there are fast string search algorithms, like Boyer–Moore and Knuth–Morris–Pratt, that have O(n+m) complexity, while the trivial solution would be O(n*m).
那么,最流行的工具链(gcc和Visual Studio)的strstr()实现是否使用这些快速的O(n)算法,还是使用了简单的解决方案?
So does the implementation of strstr() for the most popular toolchains - gcc and Visual Studio - use these fast O(n) algorithms, or does it use the trivial solution?
推荐答案
在最坏的情况下,GCC的运行时库使用Two-Way Algorithm
执行2n-m文本字符比较.在搜索阶段它的复杂度为O(n),但它需要一个额外的预处理阶段,即O(m)复杂度.您可以在 http://www-igm.univ-mlv中找到详细信息.fr/〜lecroq/string/node26.html 关于该算法.
GCC's runtime library uses Two-Way Algorithm
which performs 2n-m text character comparisons in the worst case. It is O(n) complexity in the search phase but it needs a extra preprocessing phase which is O(m) complexity. You could find details on http://www-igm.univ-mlv.fr/~lecroq/string/node26.html about the algorithm.
AFAIK MSVC运行时以O(n * m)复杂度最幼稚的方式执行strstr
.但是蛮力并不需要额外的内存空间,因此它永远不会引发错误的alloc异常. KMP需要O(m)额外的空间,而双向需要一个恒定的额外空间.
AFAIK MSVC runtime is doing strstr
in the most naive way, in O(n*m) complexity. But brute force doesn't need extra memory space, so it never raise a bad alloc exception. KMP need O(m) extra space, and Two-Way need a constant extra space.
GCC在做什么,就像使用FFT计算乘法一样.在纸上看起来非常快,但是在实践中却真的很慢. MSVC将在可用的strstr
中使用SIMD指令,因此在大多数情况下它甚至更快.如果要编写自己的库,我将选择SIMD的暴力破解方法.
What GCC is doing sounds just like using FFT to calculate multiplys. Looks extremely fast in paper, but really slow in practice. MSVC will use SIMD instructions in strstr
when they are availiable so it's even faster in most case. I will choose the brute force approach with SIMD if I'm going to write my own library.
这篇关于在gcc和VS中strstr()的实现是否具有线性复杂度?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!