Python: finding the longest sequence with re

Question

I have a string that is randomly generated:

polymer_str = "diol diNCO diamine diNCO diamine diNCO diamine diNCO diol diNCO diamine"

I'd like to find the longest sequence of "diNCO diol" and the longest sequence of "diNCO diamine". In the example above, the longest "diNCO diol" sequence has length 1 and the longest "diNCO diamine" sequence has length 3.

How would I go about doing this using Python's re module?

Thanks in advance.


To clarify, I mean the largest number of consecutive repeats of a given string. So the longest run of "diNCO diamine" is 3:
diol diNCO diamine diNCO diamine diNCO diamine diNCO diol diNCO diamine

Recommended answer

Expanding on Ealdwulf's answer:

See the Python documentation for re.findall.

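import re

def getLongestSequenceSize(search_str, polymer_str):
    # Match runs of search_str repeated back to back, each optionally followed by one whitespace character.
    matches = re.findall(r'(?:\b%s\b\s?)+' % re.escape(search_str), polymer_str)
    # key=len selects the longest run; default='' avoids a ValueError when nothing matches.
    longest_match = max(matches, key=len, default='')
    return longest_match.count(search_str)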

This could be written as one line, but it becomes less readable in that form.
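For illustration, a single-expression version might look like the sketch below. The function name longest_run_one_liner is hypothetical, and this variant takes the maximum over the repeat counts directly instead of first selecting the longest matched string; re.escape is added on the assumption that search_str should be matched literally.

```python
import re

def longest_run_one_liner(search_str, polymer_str):
    # Hypothetical name; same pattern as the function above, collapsed into one expression.
    return max((m.count(search_str) for m in re.findall(r'(?:\b%s\b\s?)+' % re.escape(search_str), polymer_str)), default=0)

polymer_str = "diol diNCO diamine diNCO diamine diNCO diamine diNCO diol diNCO diamine"
print(longest_run_one_liner("diNCO diamine", polymer_str))  # 3
print(longest_run_one_liner("diNCO diol", polymer_str))     # 1
```

The default=0 argument makes the function return 0 when search_str does not occur at all.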

Alternative:

If polymer_str is huge, it will be more memory efficient to use re.finditer, which does not build the full list of matches. Here's how you might go about it:

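import re

def getLongestSequenceSize(search_str, polymer_str):
    longest_match = ''
    # Escape search_str so any regex metacharacters in it are matched literally.
    pattern = r'(?:\b%s\b\s?)+' % re.escape(search_str)
    # finditer yields matches lazily, so only the longest run seen so far is kept.
    for match in re.finditer(pattern, polymer_str):
        if len(match.group(0)) > len(longest_match):
            longest_match = match.group(0)
    return longest_match.count(search_str)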

The biggest difference between findall and finditer is that the first returns a list of all matched strings at once, while the second returns an iterator that yields Match objects one at a time. The finditer approach may also be somewhat slower.
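The difference in return types can be seen in a small sketch using the string from the question. The variable names pattern, all_runs, and spans are illustrative only, and the search string is written directly into the pattern for brevity.

```python
import re

polymer_str = "diol diNCO diamine diNCO diamine diNCO diamine diNCO diol diNCO diamine"
pattern = re.compile(r'(?:\bdiNCO diamine\b\s?)+')

# findall builds the complete list of matched strings up front.
all_runs = pattern.findall(polymer_str)
print(all_runs)

# finditer yields one Match object at a time, which also exposes match positions.
spans = [m.span() for m in pattern.finditer(polymer_str)]
print(spans)
```

Because each Match object carries its start and end offsets, finditer is also the more convenient choice when the location of the longest run matters, not just its length.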
