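I am trying to create a pandas DataFrame by iteratively computing statistics from another DataFrame, column by column (the columns are selected with a regular expression). How do I create the result DataFrame?
Input DataFrame: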

    In [4]: control.head()
    Out[4]:
      Patient Gender  Age  Left-Lateral-Ventricle_NVoxels  Left-Inf-Lat-Vent_NVoxels  ...  supramarginal_CurvInd_lh
    0    P008      M   30                            9414                        311  ...                       7.5
    1    P013      F   35                            7668                         85  ...                      10.4
    2    P018      F   27                            7350                        202  ...                       8.0
    3    P033      F   55                            7548                        372  ...                       9.2
    4    P036      F   31                            8598                         48  ...                       8.0

    [5 rows x 930 columns]


I wrote code that computes the statistics, but I am stuck on creating the result pandas DataFrame:

def select_volumes(group_c, group_k):
    Select_list = ["Amygdala", "Hippocampus", "Lateral-Ventricle",
                   "Pallidum", "Putamen", "Thalamus"]
    Side = ["Left", "Right"]
    for s in Side:
        for struct in Select_list:
            # keep the columns whose name contains the side, the structure and "Volume"
            pattern = "^(?=.*" + s + ")(?=.*" + struct + ")(?=.*Volume)"
            volumes_c = group_c.filter(regex=pattern)
            volumes_k = group_k.filter(regex=pattern)
            k = cohens_d(volumes_c, volumes_k)
            meand = volumes_c.mean()
            # this is where I am stuck: how do I fill the result DataFrame?
            result_df = pd.DataFrame(
                {
                    "Cohen's norm": ...,  # placeholder ("some result" in my draft)
                    "Mean Value": meand,
                }
            )
            return k


The function select_volumes gives me these results:

Left-Amygdala_Volume_mm3   -0.29729
dtype: float64
Left-Hippocampus_Volume_mm3    0.33139
dtype: float64
Left-Lateral-Ventricle_Volume_mm3   -0.111853
dtype: float64
Left-Pallidum_Volume_mm3    0.28857
dtype: float64
Left-Putamen_Volume_mm3    0.696645
dtype: float64
Left-Thalamus-Proper_Volume_mm3    0.772492
dtype: float64
Right-Amygdala_Volume_mm3   -0.358333
dtype: float64
Right-Hippocampus_Volume_mm3    0.275668
dtype: float64
Right-Lateral-Ventricle_Volume_mm3   -0.092283
dtype: float64
Right-Pallidum_Volume_mm3    0.279258
dtype: float64
Right-Putamen_Volume_mm3    0.484879
dtype: float64
Right-Thalamus-Proper_Volume_mm3    0.809775
dtype: float64


I want Left-Amygdala_Volume_mm3 ... to be the rows, with the value -0.29729 in a column named Cohen's d, one row for each entry of Select_list:
[image: example of how the DataFrame should look]

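Best answer

I still cannot really see how and where it happens, but you have shown that somewhere in your function you can build a float64 Series that has, for example, Left-Amygdala_Volume_mm3 as its index and -0.29729 as its value. I assume that at the same point you also have the value of meand for the same index label.

More precisely, I will assume: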

k = pd.Series([-0.29729], dtype=np.float64,index=['Left-Amygdala_Volume_mm3'])


because it prints as:

print(k)

Left-Amygdala_Volume_mm3   -0.29729
dtype: float64


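I also assume that meand is a similar Series, so its value can be accessed as meand.iloc[0] (say the value is 9174.1).

You should combine the two to build the contents of one row: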

row = k.reset_index().iloc[0].tolist() + [meand.iloc[0]]


In this example, row is ['Left-Amygdala_Volume_mm3', -0.29729, 9174.1].

So now you need to build a list of such rows, one per loop iteration:
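
As a quick check of this step, here is a minimal sketch that rebuilds the row from the values assumed above. Both Series are illustrative stand-ins (-0.29729 and 9174.1 are the assumed example values, not real data):

```python
import numpy as np
import pandas as pd

# Assumed stand-ins for what the loop body computes for one structure
k = pd.Series([-0.29729], dtype=np.float64, index=['Left-Amygdala_Volume_mm3'])
meand = pd.Series([9174.1], index=['Left-Amygdala_Volume_mm3'])

# reset_index() turns the Series into a one-row DataFrame (label column + value column);
# iloc[0] picks that row and tolist() yields [label, value]
row = k.reset_index().iloc[0].tolist() + [meand.iloc[0]]
print(row)
```

The first element is the column label, followed by the Cohen's d value and the mean, which is exactly the layout needed for one row of the result.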

import pandas as pd

def select_volumes(group_c, group_k):
    Select_list = ["Amygdala", "Hippocampus", "Lateral-Ventricle",
                   "Pallidum", "Putamen", "Thalamus"]
    Side = ["Left", "Right"]
    data = []
    for s in Side:
        for struct in Select_list:
            # keep the columns whose name contains the side, the structure and "Volume"
            pattern = "^(?=.*" + s + ")(?=.*" + struct + ")(?=.*Volume)"
            volumes_c = group_c.filter(regex=pattern)
            volumes_k = group_k.filter(regex=pattern)
            k = cohens_d(volumes_c, volumes_k)  # cohens_d is your own function
            meand = volumes_c.mean()

            # build one row of the result df: [column label, Cohen's d, mean]
            data.append(k.reset_index().iloc[0].tolist() + [meand.iloc[0]])

    # after the loop, combine the rows into a dataframe and return it
    result = pd.DataFrame(data, columns=['index', "Cohen's d", 'Mean']).set_index('index')
    return result
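
A variant of the same idea, offered only as a sketch and not as the method above: since each iteration already yields two Series indexed by column label, they can be collected and joined with pd.concat instead of being unpacked into lists. This also keeps every column if the regex matches more than one. Here cohens_d is a hypothetical stand-in (pooled standard deviation), select_volumes_concat is an illustrative name, and the toy data is random:

```python
import numpy as np
import pandas as pd

def cohens_d(a, b):
    # Hypothetical stand-in for the asker's function: column-wise Cohen's d
    # using the pooled sample standard deviation.
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * a.var() + (n_b - 1) * b.var()) / (n_a + n_b - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

def select_volumes_concat(group_c, group_k):
    structures = ["Amygdala", "Hippocampus", "Lateral-Ventricle",
                  "Pallidum", "Putamen", "Thalamus"]
    d_parts, mean_parts = [], []
    for side in ["Left", "Right"]:
        for struct in structures:
            pattern = "^(?=.*" + side + ")(?=.*" + struct + ")(?=.*Volume)"
            volumes_c = group_c.filter(regex=pattern)
            volumes_k = group_k.filter(regex=pattern)
            d_parts.append(cohens_d(volumes_c, volumes_k))  # Series indexed by column label
            mean_parts.append(volumes_c.mean())             # Series indexed by column label
    # Join the per-structure Series; both columns share the same labels
    return pd.DataFrame({"Cohen's d": pd.concat(d_parts),
                         "Mean": pd.concat(mean_parts)})

# Toy data: 20 random "subjects" per group, one volume column per side/structure
rng = np.random.default_rng(0)
names = ["Amygdala", "Hippocampus", "Lateral-Ventricle",
         "Pallidum", "Putamen", "Thalamus-Proper"]
cols = [f"{side}-{name}_Volume_mm3" for side in ("Left", "Right") for name in names]
control = pd.DataFrame(rng.normal(1000, 50, size=(20, len(cols))), columns=cols)
patients = pd.DataFrame(rng.normal(1020, 50, size=(20, len(cols))), columns=cols)

result = select_volumes_concat(control, patients)
print(result)
```

Either way, the DataFrame is built once after the loop rather than being grown row by row inside it.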

This question, "python - How to iteratively fill a pandas DataFrame by columns", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/56235427/
