Question
I'm writing a PostgreSQL function to count the number of times a particular text substring occurs in another piece of text. For example, calling count('foobarbaz', 'ba') should return 2.
I know that to test whether the substring occurs at all, I can use a condition like this:
WHERE 'foobarbaz' like '%ba%'
However, I need it to return the number of times 'ba' occurs, which here is 2. How can I do that?
Thanks in advance for your help.
Recommended answer
I'd strongly suggest checking out the answer I posted to "How do you count the occurrences of an anchored string using PostgreSQL?". There, the accepted answer turned out to be massively slower than an adapted version of regexp_replace(): the overhead of creating the rows and then running the aggregate is simply too high.
The fastest way to do this is as follows...
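SELECT
  -- characters removed when every occurrence of replacestr is replaced by ''
  (length(str) - length(replace(str, replacestr, '')))::int
  -- divided by the needle length gives the number of occurrences
  -- (an empty replacestr raises a division-by-zero error)
  / length(replacestr)
FROM ( VALUES
  ('foobarbaz', 'ba')
) AS t(str, replacestr);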
Here we:
- take the length of the string, L1,
- subtract from L1 the length of the string with all of the replacements removed, L2, to get L3, the difference in string length,
- divide L3 by the length of the replacement to get the number of occurrences.
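As a worked check of these steps, apply them to the example from the question. With str = 'foobarbaz', L1 = 9; removing both occurrences of 'ba' leaves 'foorz', so L2 = 5 and L3 = 4. Because each removed occurrence shortens the string by exactly length(replacestr) characters, L3 is always an exact multiple of it, so the integer division in the query loses nothing:

```latex
\text{occurrences} = \frac{L_3}{\operatorname{length}(\texttt{ba})} = \frac{L_1 - L_2}{2} = \frac{9 - 5}{2} = 2
```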
For comparison, that is about five times faster than the method using regexp_matches(), which looks like this:
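SELECT count(*)  -- one row per match, so count(*) is the occurrence count
FROM ( VALUES
  ('foobarbaz', 'ba')
) AS t(str, replacestr)
-- the 'g' flag returns every match; note that replacestr is treated as a regex
CROSS JOIN LATERAL regexp_matches(str, replacestr, 'g');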
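Running either query needs a PostgreSQL server, so as a quick way to see that the two approaches agree, here is a small Python model of both. It is a sketch, not the answer's code: the function names count_by_replace and count_by_regex are hypothetical, and the model assumes PostgreSQL's replace() and regexp_matches(..., 'g') both consume non-overlapping matches from left to right, as Python's str.replace and re.findall do. One deliberate difference: the model wraps the needle in re.escape so it is matched literally, whereas the SQL query above passes replacestr to regexp_matches() as a regular expression, so a needle such as '.' would match any character there.

```python
import re


def count_by_replace(s: str, needle: str) -> int:
    """Model of the length/replace() query: (L1 - L2) / length(needle)."""
    if not needle:
        # The SQL version would raise a division-by-zero error here.
        raise ValueError("needle must be non-empty")
    l1 = len(s)                      # L1: length of the original string
    l2 = len(s.replace(needle, ""))  # L2: length with every match removed
    return (l1 - l2) // len(needle)  # L3 divided by the needle length


def count_by_regex(s: str, needle: str) -> int:
    """Model of the regexp_matches(..., 'g') query: one match per row, then count(*)."""
    # re.escape makes the needle literal (hypothetical choice; the SQL does not escape).
    return len(re.findall(re.escape(needle), s))


# Compare both models on a few inputs, including overlapping candidates.
for s, needle in [("foobarbaz", "ba"), ("aaaa", "aa"), ("aaa", "aa"), ("hello", "xyz")]:
    print(s, needle, count_by_replace(s, needle), count_by_regex(s, needle))
```

In the model, both functions return 1 for ('aaa', 'aa'), since neither counts overlapping occurrences.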