I read a CSV file with Pandas and kept only these two columns:
    Describe_File        numbers
0   This is the start    25
1   Ending is coming     42
2   Middle of the story  525
3   This is the start    65
4   This is the start    25
5   Middle of the story  35
6   This is the start    28
7   This is the start    24
8   Ending is coming     24
9   Ending is coming     35
10  Ending is coming     25
11  Ending is coming     24
12  This is the start    215
I then filter on a single string value, **This is the start**, which looks like this:
df = df[df.Describe_File == "This is the start"]
    Describe_File        numbers
0   This is the start    25
3   This is the start    65
4   This is the start    25
6   This is the start    28
7   This is the start    24
12  This is the start    215
Now I compute the variance:
np.var(df)
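Goal
Go through every unique string in Describe_File, filter on it, and find the variance and standard deviation of numbers for that string.
The output file should look like this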
Best answer
As you know, the standard deviation is the square root of the variance, so the following should be the quickest approach.
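import pandas as pd
import numpy as np
# Select the numeric column first so the string column is never passed to the variance function.
# ddof=0 gives the population variance, matching np.var.
df_out = df.groupby('Describe_File')['numbers'].var(ddof=0).to_frame('variance')
df_out['standard_deviation'] = np.sqrt(df_out['variance'])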
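For reference, here is a minimal self-contained sketch of the same idea. It assumes the sample data from the question and population statistics (ddof=0, the NumPy default). The explicit loop is included only to show that the grouped result matches the filter-per-string approach described in the goal, and the output file name in the last comment is hypothetical.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Describe_File": [
        "This is the start", "Ending is coming", "Middle of the story",
        "This is the start", "This is the start", "Middle of the story",
        "This is the start", "This is the start", "Ending is coming",
        "Ending is coming", "Ending is coming", "Ending is coming",
        "This is the start",
    ],
    "numbers": [25, 42, 525, 65, 25, 35, 28, 24, 24, 35, 25, 24, 215],
})

# One pass with groupby: population variance and standard deviation (ddof=0)
grouped = df.groupby("Describe_File")["numbers"]
df_out = pd.DataFrame({
    "variance": grouped.var(ddof=0),
    "standard_deviation": grouped.std(ddof=0),
})

# Equivalent explicit loop: filter on each unique string, then compute the statistics
rows = {}
for name in df["Describe_File"].unique():
    values = df.loc[df["Describe_File"] == name, "numbers"]
    rows[name] = {
        "variance": np.var(values),
        "standard_deviation": np.std(values),
    }
df_loop = pd.DataFrame.from_dict(rows, orient="index")

print(df_out)

# Hypothetical output file name; uncomment to write the result to disk
# df_out.to_csv("variance_by_description.csv")
```

If the sample (n-1) variance is wanted instead, passing ddof=1 to both var and std should be enough; that is also the pandas default when ddof is omitted.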
Regarding python - filter by string, then find the variance of another column, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60466076/