Subtract group-specific values from rows in pandas

Problem description

In Pandas I have a data frame consisting of two groups with several samples in each group. Each group has an internal reference value that I want to subtract from all the sample values within that group.

import io
import pandas as pd

s = """Group    sample    value
group1    ref1    18.1
group1    smp1    NaN
group1    smp2    20.3
group1    smp3    30.0
group2    ref2    16.1
group2    smp4    29.2
group2    smp5    19.9
group2    smp6    28.9
"""
df = pd.read_csv(io.StringIO(s), sep=r'\s+')
df = df.set_index(['Group', 'sample'])
df

Out[82]:

                 value
Group    sample
group1   ref1    18.1
         smp1    NaN
         smp2    20.3
         smp3    30.0
group2   ref2    16.1
         smp4    29.2
         smp5    19.9
         smp6    28.9

What I want to do is add a new column in which the reference (ref) has been subtracted from all samples (smp) within each respective group, like this:

                 value   deltaValue
Group    sample
group1   ref1    18.1    0.0
         smp1    NaN     NaN
         smp2    20.3    2.2
         smp3    30.0    11.9
group2   ref2    16.1    0.0
         smp4    29.2    13.1
         smp5    19.9    3.8
         smp6    28.9    12.8

Does anyone know how this can be done? Thanks!

Recommended answer

Here's one way to do it without writing an explicit loop.

First, create a function func that finds the sample in a group whose name starts with ref and subtracts its value from every value in that group, storing the result in a new delta column.

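In [33]: def func(grp):
    # .ix was removed in pandas 1.0; .loc selects 'value' on the row whose sample starts with 'ref'
    ref = grp.loc[grp['sample'].str.startswith('ref'), 'value']
    grp = grp.copy()  # avoid writing the new column into a view of dff
    grp['delta'] = grp['value'] - ref.values[0]
    return grp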

Then apply this function to each group of dff.groupby('Group'):

In [34]: dff.groupby('Group', group_keys=False).apply(func)  # group_keys=False keeps the flat 0..7 index on pandas >= 2.0
Out[34]:
    Group sample  value  delta
0  group1   ref1   18.1    0.0
1  group1   smp1    NaN    NaN
2  group1   smp2   20.3    2.2
3  group1   smp3   30.0   11.9
4  group2   ref2   16.1    0.0
5  group2   smp4   29.2   13.1
6  group2   smp5   19.9    3.8
7  group2   smp6   28.9   12.8
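
For reference, here is a self-contained sketch of the same approach that should run on current pandas releases. It rebuilds the sample data from the question, uses .loc in place of the removed .ix indexer, copies each group before adding the column, and passes group_keys=False so the result keeps the flat 0..7 index shown above. It assumes each group contains exactly one sample whose name starts with ref. Depending on the pandas version, apply may also warn about (or exclude) the Group grouping column; the delta values are unaffected.

```python
import io

import pandas as pd

s = """Group    sample    value
group1    ref1    18.1
group1    smp1    NaN
group1    smp2    20.3
group1    smp3    30.0
group2    ref2    16.1
group2    smp4    29.2
group2    smp5    19.9
group2    smp6    28.9
"""

# Build the MultiIndexed frame from the question, then flatten it back into columns.
df = pd.read_csv(io.StringIO(s), sep=r"\s+").set_index(["Group", "sample"])
dff = df.reset_index()


def func(grp):
    # Work on a copy so the new column is not written into a view of dff.
    grp = grp.copy()
    # Assumption: exactly one row per group has a sample name starting with 'ref'.
    ref = grp.loc[grp["sample"].str.startswith("ref"), "value"]
    # Subtract the reference from every row; NaN values stay NaN.
    grp["delta"] = grp["value"] - ref.values[0]
    return grp


# group_keys=False stops pandas from prefixing the result index with the group key.
result = dff.groupby("Group", group_keys=False).apply(func)
print(result)
```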

To begin with, dff should look like the frame below; it can be created from the question's df with dff = df.reset_index():

In [35]: dff
Out[35]:
    Group sample  value
0  group1   ref1   18.1
1  group1   smp1    NaN
2  group1   smp2   20.3
3  group1   smp3   30.0
4  group2   ref2   16.1
5  group2   smp4   29.2
6  group2   smp5   19.9
7  group2   smp6   28.9
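
If you would rather skip groupby.apply and keep the MultiIndex layout from the question (with a deltaValue column), one vectorized alternative, which is not part of the answer above, is to blank out every non-reference value with Series.where and then broadcast each group's remaining reference value with groupby(...).transform('first'). This is a sketch that assumes exactly one sample per group whose name starts with ref. GroupBy.first skips missing values, so it picks up that single reference value.

```python
import io

import pandas as pd

s = """Group    sample    value
group1    ref1    18.1
group1    smp1    NaN
group1    smp2    20.3
group1    smp3    30.0
group2    ref2    16.1
group2    smp4    29.2
group2    smp5    19.9
group2    smp6    28.9
"""
df = pd.read_csv(io.StringIO(s), sep=r"\s+").set_index(["Group", "sample"])

# Boolean mask over the 'sample' index level: True on the reference rows.
is_ref = df.index.get_level_values("sample").str.startswith("ref")

# Keep only the reference values (all other rows become NaN), then give every
# row its group's first non-missing value, i.e. the reference.
ref = df["value"].where(is_ref).groupby(level="Group").transform("first")

# Both Series share df's index, so the subtraction lines up row by row.
df["deltaValue"] = df["value"] - ref
print(df)
```

Because transform returns a Series aligned with df's index, the NaN in smp1 simply propagates to deltaValue, and no reset_index step is needed.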
