Counting the number of occurrences of a substring within a text in PostgreSQL


Question

I'm writing a PostgreSQL function to count the number of times a particular text substring occurs in another piece of text. For example, calling count('foobarbaz', 'ba') should return 2.

I understand that to test whether the substring occurs at all, I can use a condition similar to the one below:

    WHERE 'foobarbaz' like '%ba%'

However, I need it to return 2, the number of times 'ba' occurs. How can I proceed?

Thanks in advance for your help.

Recommended answer

I would strongly suggest checking out the answer I posted to "How do you count the occurrences of an anchored string using PostgreSQL?". The chosen answer there was shown to be massively slower than an adapted version of regexp_replace(). The overhead of creating the rows and then running the aggregate is simply too high.

The fastest way to do this is as follows:

SELECT
  (length(str) - length(replace(str, replacestr, '')) )::int
  / length(replacestr)
FROM ( VALUES
  ('foobarbaz', 'ba')
) AS t(str, replacestr);

Here we:

  1. Take the length of the string, L1.
  2. Subtract from L1 the length of the string with all occurrences of the substring removed, L2, to get L3, the difference in string length.
  3. Divide L3 by the length of the substring to get the number of occurrences.
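
To make the three steps concrete, here is a minimal sketch that exposes each intermediate value for the example input and then wraps the same arithmetic in a function. The column aliases l1, l2, l3 and the function name count_substr are hypothetical names chosen for illustration; they are not part of the original answer. The NULLIF guard is also an added assumption: it makes an empty substring return NULL instead of raising a division-by-zero error.

```sql
-- Intermediate values for the example: 'foobarbaz' with 'ba' removed is 'foorz'.
SELECT
  length(str)                                  AS l1,  -- 9
  length(replace(str, replacestr, ''))         AS l2,  -- 5
  length(str)
    - length(replace(str, replacestr, ''))     AS l3,  -- 4
  (length(str) - length(replace(str, replacestr, '')))
    / length(replacestr)                       AS occurrences  -- 2
FROM ( VALUES
  ('foobarbaz', 'ba')
) AS t(str, replacestr);

-- Hypothetical wrapper function (the name count_substr is an assumption).
CREATE OR REPLACE FUNCTION count_substr(str text, sub text)
RETURNS integer
LANGUAGE sql
IMMUTABLE
AS $$
  -- NULLIF turns an empty substring into NULL, avoiding division by zero.
  SELECT (length(str) - length(replace(str, sub, '')))
         / NULLIF(length(sub), 0);
$$;

SELECT count_substr('foobarbaz', 'ba');  -- 2
```

Note that replace() removes non-overlapping occurrences from left to right, so this approach counts non-overlapping matches only: 'aa' in 'aaa' is counted once, not twice.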

For comparison, that's about five times faster than the method using regexp_matches(), which looks like this:

SELECT count(*)
FROM ( VALUES
  ('foobarbaz', 'ba')
) AS t(str, replacestr)
CROSS JOIN LATERAL regexp_matches(str, replacestr, 'g');
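
Besides speed, the two approaches can also differ in their results, because regexp_matches() interprets its second argument as a regular expression while replace() treats it as a literal string. The following sketch, with a made-up input and hypothetical aliases pat, literal_count, and regex_count, shows the difference when the search string contains a regex metacharacter such as a dot.

```sql
-- 'a.c' as a literal occurs once; as a regex, '.' matches any character,
-- so it matches both 'a.c' and 'abc'.
SELECT
  (length(str) - length(replace(str, pat, '')))
    / length(pat)                                        AS literal_count,  -- 1
  (SELECT count(*) FROM regexp_matches(str, pat, 'g'))   AS regex_count     -- 2
FROM ( VALUES
  ('a.c abc', 'a.c')
) AS t(str, pat);
```

If the search string may contain characters such as . * + ? ( ) [ ] or \, the replace()-based count is the one that matches it literally.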


08-15 11:08