DataFrame中的成对的列的减法

DataFrame中的成对的列的减法

本文介绍了计算pandas DataFrame中的成对的列的减法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我使用尺寸很大(48K行,最多数十列)的DataFrame.在操作它们的某个时刻,我需要对列值进行成对减法,我想知道是否有比我正在做的方法更有效的方法(见下文).

I work with significantly sized (48K rows, up to tens of columns) DataFrames. At a certain point in their manipulation, I need to do pair-wise subtractions of column values and I was wondering if there is a more efficient way to do so rather than the one I'm doing (see below).

我当前的代码:

 # Matrix is the pandas DataFrame containing all the data
 comparison_df = pandas.DataFrame(index=matrix.index)
 combinations = itertools.product(group1, group2)

 for observed, reference in combinations:

     observed_data = matrix[observed]
     reference_data = matrix[reference]

     comparison = observed_data - reference_data
     name = observed + "_" + reference
     comparison_df[name] = comparison

由于数据可能很大(我也在置换测试中也在使用这段代码),所以我想知道是否可以对其进行一些优化.

Since the data can be large (I'm using this piece of code also during a permutation test), I'm interested in knowing if it can be optimized a bit.

根据要求,以下是典型数据集的示例

As requested, here's a sample of a typical data set

ID                    A1      A2       A3       B1       B2       B3
Ku8QhfS0n_hIOABXuE    6.343   6.304    6.410    6.287    6.403    6.279
fqPEquJRRlSVSfL.8A    6.752   6.681    6.680    6.677    6.525    6.739
ckiehnugOno9d7vf1Q    6.297   6.248    6.524    6.382    6.316    6.453
x57Vw5B5Fbt5JUnQkI    6.268   6.451    6.379    6.371    6.458    6.333

典型的结果是,如果"A"组是 group1 和"B" group2 ,则对于每个ID行,每列都有一对与上面生成的配对相对应的(例如,A1_B1,A2_B1,A3_B1 ...),其中包含每个行ID的减法.

And a typical result would be, if the "A" group is group1 and "B" group2, for each ID row, to have for each column a pair (e.g., A1_B1, A2_B1, A3_B1...) corresponding to the pairings generated above, containing the subtraction for each row ID.

推荐答案

使用 itertools.combinations() DataFrame 列上

您可以使用 itertools.combinations创建列的组合() 并根据这些对评估减法以及新名称:

Using itertools.combinations() on DataFrame columns

You can create combinations of columns with itertools.combinations() and evaluate subtractions along with new names based on these pairs:

import pandas as pd
from cStringIO import StringIO
import itertools as iter

matrix = pd.read_csv(StringIO('''ID,A1,A2,A3,B1,B2,B3
Ku8QhfS0n_hIOABXuE,6.343,6.304,6.410,6.287,6.403,6.279
fqPEquJRRlSVSfL.8A,6.752,6.681,6.680,6.677,6.525,6.739
ckiehnugOno9d7vf1Q,6.297,6.248,6.524,6.382,6.316,6.453
x57Vw5B5Fbt5JUnQkI,6.268,6.451,6.379,6.371,6.458,6.333''')).set_index('ID')

print 'Original DataFrame:'
print matrix
print

# Create DataFrame to fill with combinations
comparison_df = pd.DataFrame(index=matrix.index)
# Create combinations of columns
for a, b in iter.combinations(matrix.columns, 2):
    # Subtract column combinations
    comparison_df['{}_{}'.format(a, b)] = matrix[a] - matrix[b]

print 'Combination DataFrame:'
print comparison_df


Original DataFrame:
                       A1     A2     A3     B1     B2     B3
ID
Ku8QhfS0n_hIOABXuE  6.343  6.304  6.410  6.287  6.403  6.279
fqPEquJRRlSVSfL.8A  6.752  6.681  6.680  6.677  6.525  6.739
ckiehnugOno9d7vf1Q  6.297  6.248  6.524  6.382  6.316  6.453
x57Vw5B5Fbt5JUnQkI  6.268  6.451  6.379  6.371  6.458  6.333

Combination DataFrame:
                    A1_A2  A1_A3  A1_B1  A1_B2  A1_B3  A2_A3  A2_B1  A2_B2  A2_B3  A3_B1  A3_B2  A3_B3  B1_B2  B1_B3  B2_B3
ID
Ku8QhfS0n_hIOABXuE  0.039 -0.067  0.056 -0.060  0.064 -0.106  0.017 -0.099  0.025  0.123  0.007  0.131 -0.116  0.008  0.124
fqPEquJRRlSVSfL.8A  0.071  0.072  0.075  0.227  0.013  0.001  0.004  0.156 -0.058  0.003  0.155 -0.059  0.152 -0.062 -0.214
ckiehnugOno9d7vf1Q  0.049 -0.227 -0.085 -0.019 -0.156 -0.276 -0.134 -0.068 -0.205  0.142  0.208  0.071  0.066 -0.071 -0.137
x57Vw5B5Fbt5JUnQkI -0.183 -0.111 -0.103 -0.190 -0.065  0.072  0.080 -0.007  0.118  0.008 -0.079  0.046 -0.087  0.038  0.125

这篇关于计算pandas DataFrame中的成对的列的减法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-11 15:27