Counting groups of letters in a string

Problem description

I've been trying to adapt my Python function to count groups of letters instead of single letters, and I'm having a bit of trouble. Here's the code I have for counting individual letters:

my_seq = "CTAAAGTCAACCTTCGGTTGACCTTGAAAGGGCCTTGGGAACCTTCGGTTGACCTTGAGGGTTCCCTAAGGGTT"

def count_letters(seq):
    # Map each letter to the number of times it appears in seq.
    counts = {}
    for c in seq:
        if c in counts:
            counts[c] += 1
        else:
            counts[c] = 1
    return counts

counts = count_letters(my_seq)
print(counts)

The function currently spits out counts for each individual letter. Right now it prints this:

{'C': 23, 'T': 30, 'G': 30, 'A': 20}

Ideally, I'd like it to print something like this:

{'CTA': 2, 'TAG': 3, 'CGC': 1, 'GAG': 2 ... }

I'm very new to Python and this is proving to be difficult.

Solution

This can be done pretty quickly using collections.Counter.

from collections import Counter

s = "CTAACAAC"

def chunk_string(s, n):
    return [s[i:i+n] for i in range(len(s)-n+1)]

counter = Counter(chunk_string(s, 3))
# Counter({'AAC': 2, 'CTA': 1, 'TAA': 1, 'ACA': 1, 'CAA': 1})
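
To apply the same idea to the sequence from the question, something along these lines should work. This is only a minimal sketch: `count_groups` is a hypothetical helper name that does not appear in the answer, and it counts overlapping groups, which is what `chunk_string` produces.

```python
from collections import Counter

my_seq = "CTAAAGTCAACCTTCGGTTGACCTTGAAAGGGCCTTGGGAACCTTCGGTTGACCTTGAGGGTTCCCTAAGGGTT"

def count_groups(seq, n):
    """Count every overlapping group of n letters in seq (hypothetical helper)."""
    return Counter(seq[i:i+n] for i in range(len(seq) - n + 1))

counts = count_groups(my_seq, 3)
print(dict(counts))           # plain dict, in the shape the question asks for
print(counts.most_common(5))  # the five most frequent groups
```

`Counter` is a `dict` subclass, so `counts['CTA']` works as it would on the dictionary in the question, and a group that never occurs returns 0 instead of raising `KeyError`.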

Edit: To elaborate on chunk_string:

It takes a string s and a chunk size n as arguments. Each s[i:i+n] is a slice of the string that is n characters long. The loop iterates over the valid indices where the string can be sliced (0 to len(s)-n, inclusive), so consecutive slices overlap by n-1 characters. The list comprehension collects all of these slices into a list. An equivalent method is:

def chunk_string(s, n):
    chunks = []
    last_index = len(s) - n
    for i in range(0, last_index + 1):
        chunks.append(s[i:i+n])
    return chunks
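
Both versions of `chunk_string` return overlapping groups, because each slice starts one character after the previous one. If the groups should not overlap (for example, reading the sequence three letters at a time), a variant that steps by n might look like the sketch below. The name `chunk_string_nonoverlapping` is hypothetical, and dropping a trailing partial group is an assumption about the desired behaviour.

```python
from collections import Counter

def chunk_string_nonoverlapping(s, n):
    # Advance n characters at a time; stop before a trailing partial group.
    return [s[i:i+n] for i in range(0, len(s) - n + 1, n)]

s = "CTAACAAC"
print(chunk_string_nonoverlapping(s, 3))  # ['CTA', 'ACA']
print(Counter(chunk_string_nonoverlapping(s, 3)))
```

Here the last two characters ("AC") are left out because they do not form a full group of three.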
