I read a CSV file with Pandas and then kept only these two columns:

  • Describe_File
  • numbers

        Describe_File   numbers
    0   This is the start   25
    1   Ending is coming    42
    2   Middle of the story 525
    3   This is the start   65
    4   This is the start   25
    5   Middle of the story 35
    6   This is the start   28
    7   This is the start   24
    8   Ending is coming    24
    9   Ending is coming    35
    10  Ending is coming    25
    11  Ending is coming    24
    12  This is the start   215
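A minimal sketch of reading only these two columns, assuming the CSV has a header row with columns named `Describe_File` and `numbers`. The real file name is not given, so an inline CSV string holding a few rows from the table above stands in for it. The extra `other` column is hypothetical and only shows that `usecols` drops anything not listed.

```python
import io

import pandas as pd

# Stand-in for the real CSV file; "other" is a hypothetical extra column.
csv_text = """Describe_File,numbers,other
This is the start,25,x
Ending is coming,42,y
Middle of the story,525,z
"""

# usecols parses only the listed columns and ignores the rest of the file.
df = pd.read_csv(io.StringIO(csv_text), usecols=["Describe_File", "numbers"])
print(df)
```

With a real file you would pass its path (for example a hypothetical `"data.csv"`) instead of the `io.StringIO` object.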
    
    

So now I filter by one string value, "This is the start", and the result looks like this:
    df = df[df.Describe_File == "This is the start"]
        Describe_File   numbers
    0   This is the start   25
    3   This is the start   65
    4   This is the start   25
    6   This is the start   28
    7   This is the start   24
    12  This is the start   21
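The filter works because `df.Describe_File == "This is the start"` produces a boolean Series, and indexing the frame with it keeps only the rows where it is `True`. The sketch below rebuilds the frame from the first table above and selects only the `numbers` column of the matching rows before calling `np.var`. Selecting the column first also keeps the text column out of the calculation: pandas 2.0 and later raise a `TypeError` for non-numeric columns in `DataFrame.var` instead of silently skipping them.

```python
import numpy as np
import pandas as pd

# The two-column frame from the first table above.
df = pd.DataFrame({
    "Describe_File": [
        "This is the start", "Ending is coming", "Middle of the story",
        "This is the start", "This is the start", "Middle of the story",
        "This is the start", "This is the start", "Ending is coming",
        "Ending is coming", "Ending is coming", "Ending is coming",
        "This is the start",
    ],
    "numbers": [25, 42, 525, 65, 25, 35, 28, 24, 24, 35, 25, 24, 215],
})

# Boolean mask: True where the label matches the string of interest.
mask = df["Describe_File"] == "This is the start"

# Keep only the numeric column of the matching rows.
start_numbers = df.loc[mask, "numbers"]

# np.var computes the population variance (ddof=0).
print(np.var(start_numbers))
```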
    

Now I find the variance with np.var(df).
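For reference, `np.var` computes the population variance (its default is `ddof=0`), while pandas' `Series.var` and `Series.std` default to the sample versions (`ddof=1`), so the two can give different numbers for the same data. For the $n$ values $x_1, \dots, x_n$ of one group:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 \quad (\texttt{ddof}=0), \qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 \quad (\texttt{ddof}=1)
```

In both cases the standard deviation is the square root of the corresponding variance ($\sigma$ or $s$).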
Goal

Go through every unique string in Describe_File, filter on it, and then find the variance and standard deviation of numbers for that string.
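The goal as stated (loop over each unique string, filter on it, then compute the statistics) can be sketched directly. This assumes `df` is the full two-column frame before any filtering and uses population statistics (`ddof=0`) to match `np.var`. The small frame is illustrative data, not the question's file.

```python
import numpy as np
import pandas as pd

# Illustrative data (hypothetical), same shape as the question's frame.
df = pd.DataFrame({
    "Describe_File": ["This is the start", "Ending is coming", "This is the start",
                      "Ending is coming", "Middle of the story", "Middle of the story"],
    "numbers": [25, 42, 65, 24, 525, 35],
})

rows = []
for label in df["Describe_File"].unique():
    # Filter down to one label, exactly like the single-string case.
    values = df.loc[df["Describe_File"] == label, "numbers"]
    variance = np.var(values)  # population variance (ddof=0)
    rows.append({
        "Describe_File": label,
        "variance": variance,
        "standard_deviation": np.sqrt(variance),
    })

result = pd.DataFrame(rows)
print(result)
```

For this data the variances are 400, 81 and 60025 (standard deviations 20, 9 and 245), listed in the order the labels first appear.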

The output file should contain, for each unique Describe_File value, its variance and standard deviation.

Best Answer

As you know, the standard deviation is the square root of the variance, so the following is the quickest way:

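    import pandas as pd
    import numpy as np
    
    # df is the full two-column frame (before filtering to one string).
    # np.var uses the population variance (ddof=0); pandas' .var() defaults to
    # ddof=1, so pass ddof=0 to get the same values as np.var.
    df_out = df.groupby('Describe_File')['numbers'].var(ddof=0).to_frame('variance')
    # The standard deviation is the square root of the variance.
    df_out['standard_deviation'] = np.sqrt(df_out['variance'])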
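The same per-group table can also be built in a single `groupby` call with named aggregation and then written out as the output file. A sketch, assuming population statistics (`ddof=0`) to match `np.var`. The helper names `pop_var` and `pop_std` are hypothetical, and the CSV is written to an in-memory buffer to keep the example self-contained; pass a file path to `to_csv` instead for a real file.

```python
import io

import pandas as pd

# Illustrative data (hypothetical), same shape as the question's frame.
df = pd.DataFrame({
    "Describe_File": ["This is the start", "Ending is coming", "This is the start",
                      "Ending is coming", "Middle of the story", "Middle of the story"],
    "numbers": [25, 42, 65, 24, 525, 35],
})


def pop_var(s):
    # Population variance, matching np.var's default ddof=0.
    return s.var(ddof=0)


def pop_std(s):
    # Population standard deviation (square root of pop_var).
    return s.std(ddof=0)


# Named aggregation: output column name = (source column, function).
df_out = df.groupby("Describe_File").agg(
    variance=("numbers", pop_var),
    standard_deviation=("numbers", pop_std),
)

# Write the result as CSV; the group labels become the first column.
buffer = io.StringIO()
df_out.to_csv(buffer)
print(buffer.getvalue())
```

`groupby` sorts the group labels by default, so the rows come out alphabetically rather than in order of first appearance.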
    

A similar question, "python - Filter by string and then find variance of another column", is on Stack Overflow: https://stackoverflow.com/questions/60466076/

    10-09 05:44