Computing a row-wise t-test across DataFrames

Question

I have the following two DataFrames:

import pandas as pd
import scipy.stats
import numpy as np
df_a = pd.DataFrame({
    's1': [10, 10, 12, 13, 14, 15],
    's2': [13, 200, 10, 13, 14.5, 10.5],
    'gene_symbol': ['a', 'b', 'c', 'd', 'e', 'f'],
})

df_b = pd.DataFrame({
    's1': [15, 20, 123, 13, 14, 15, 1],
    's2': [213, 200, 35.4, 13, 414.5, 130.5, 3],
    'gene_symbol': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
})

df_a.set_index('gene_symbol', inplace=True)
df_b.set_index('gene_symbol', inplace=True)

They look like this:

             s1     s2
gene_symbol
a            10   13.0
b            10  200.0
c            12   10.0
d            13   13.0
e            14   14.5
f            15   10.5

In [51]: df_b
Out[51]:
              s1     s2
gene_symbol
a             15  213.0
b             20  200.0
c            123   35.4
d             13   13.0
e             14  414.5
f             15  130.5
g              1    3.0

What I want to do is compute a t-test p-value gene by gene. For example, for gene a we would have:

In [47]: scipy.stats.ttest_ind([ 10,13.0],[15,213.0])
Out[47]: Ttest_indResult(statistic=-1.0352347135782713, pvalue=0.4093249100598676)

How can I apply that to every row whose gene is present in both DataFrames (e.g. ignoring gene g, which only appears in df_b)?

I tried this, but it failed:

scipy.stats.ttest_ind(df_a, df_b,axis=1)

Recommended answer

You can drop the g row by matching the two DataFrames on their gene_symbol index.

You can use pandas.merge() to join the two DataFrames on matching columns or indexes, and then pass the columns of the merged DataFrame to ttest_ind:

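# default join is inner, so genes missing from either frame are dropped
df_m = pd.merge(df_a, df_b, left_index=True, right_index=True)
# first two columns come from df_a, the remaining ones from df_b
scipy.stats.ttest_ind(df_m.iloc[:, :2], df_m.iloc[:, 2:], axis=1)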
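
To make the merge approach easier to read back, here is a minimal sketch that labels the two sample groups and stores the result per gene. The suffixes "_a" and "_b" and the names cols_a, cols_b and result are illustrative choices, not part of the original answer; the data is the same as in the question.

```python
import pandas as pd
from scipy import stats

df_a = pd.DataFrame(
    {"s1": [10, 10, 12, 13, 14, 15], "s2": [13, 200, 10, 13, 14.5, 10.5]},
    index=pd.Index(list("abcdef"), name="gene_symbol"),
)
df_b = pd.DataFrame(
    {"s1": [15, 20, 123, 13, 14, 15, 1], "s2": [213, 200, 35.4, 13, 414.5, 130.5, 3]},
    index=pd.Index(list("abcdefg"), name="gene_symbol"),
)

# Inner join on the index; the suffixes (assumed names) mark which frame a column came from.
df_m = pd.merge(df_a, df_b, left_index=True, right_index=True, suffixes=("_a", "_b"))

# Select each group by suffix instead of by position.
cols_a = [c for c in df_m.columns if c.endswith("_a")]
cols_b = [c for c in df_m.columns if c.endswith("_b")]

# axis=1 runs one independent two-sample t-test per row (per gene).
res = stats.ttest_ind(df_m[cols_a].to_numpy(), df_m[cols_b].to_numpy(), axis=1)

# Collect statistic and p-value in a frame indexed by gene_symbol.
result = pd.DataFrame(
    {"statistic": res.statistic, "pvalue": res.pvalue}, index=df_m.index
)
print(result)
```

Selecting columns by suffix rather than by position keeps the call correct if the two frames ever have a different number of sample columns. Gene d has identical values in both groups, so its statistic and p-value are expected to come out as NaN.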

Alternatively, you can find the intersection of the two indexes and use it to slice both DataFrames:

idx = df_a.index.intersection(df_b.index)
scipy.stats.ttest_ind(df_a.loc[idx], df_b.loc[idx], axis=1)
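
The intersection approach can also be wrapped in a small helper that returns the p-values as a Series keyed by gene. This is only a sketch: the function name rowwise_ttest_pvalues is hypothetical, and the data is the same as in the question.

```python
import numpy as np
import pandas as pd
from scipy import stats


def rowwise_ttest_pvalues(left, right):
    """Hypothetical helper: per-row t-test p-values for index labels shared by both frames."""
    idx = left.index.intersection(right.index)  # genes present in both frames
    res = stats.ttest_ind(
        left.loc[idx].to_numpy(), right.loc[idx].to_numpy(), axis=1
    )
    return pd.Series(res.pvalue, index=idx, name="pvalue")


df_a = pd.DataFrame(
    {"s1": [10, 10, 12, 13, 14, 15], "s2": [13, 200, 10, 13, 14.5, 10.5]},
    index=pd.Index(list("abcdef"), name="gene_symbol"),
)
df_b = pd.DataFrame(
    {"s1": [15, 20, 123, 13, 14, 15, 1], "s2": [213, 200, 35.4, 13, 414.5, 130.5, 3]},
    index=pd.Index(list("abcdefg"), name="gene_symbol"),
)

pvals = rowwise_ttest_pvalues(df_a, df_b)

# Cross-check gene a against the single-gene call shown in the question.
single = stats.ttest_ind([10, 13.0], [15, 213.0]).pvalue
print(pvals)
print(np.isclose(pvals["a"], single))
```

Because the result keeps gene_symbol as its index, it can be joined back onto either DataFrame or filtered by a significance threshold directly.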
