Pandas set value if most columns are equal in a dataframe


Question

This follows on from a question I asked yesterday: Pandas set value if all columns are equal in a dataframe

Starting from @anky_91's solution, I'm working on something similar. Instead of setting 1 or -1 only when all columns are equal, I want something more flexible: 1 if (for example) more than 70% of the columns are 1, -1 if more than 70% of the columns are 0, and 0 otherwise.

This is what I wrote:

# Instead of using .all, use .sum to count the occurrences of 1 and 0 in each row
m1 = local_df.eq(1).sum(axis=1)
m2 = local_df.eq(0).sum(axis=1)

# Debug print, this works
print(m1)
print(m2)

But I don't know how to change this part:

local_df['enseamble'] = np.select([m1, m2], [1, -1], 0)
m = local_df.drop(local_df.columns.difference(['enseamble']), axis=1)

Here is what I want, in pseudocode:

tot = m1 + m2

if m1 > m2
    if m1 / tot > 0.7   # share of 1s in the row
        df['enseamble'] = 1

else if m2 > m1
    if m2 / tot > 0.7   # share of 0s in the row
        df['enseamble'] = -1

else:
    df['enseamble'] = 0

Thanks

Here is an example of the expected output:

 NET_0  NET_1  NET_2  NET_3  NET_4  NET_5  NET_6   
date
2009-08-02      0     1    1    1    0    1
2009-08-03      1     0    0    0    1    0
2009-08-04      1     1    1    0    0    0


 date    enseamble
 2009-08-02     1 # because 1 is more than 70%
 2009-08-03     -1 # because 0 is more than 70%
 2009-08-04     0 # because 0 and 1 are 50-50

Recommended answer

You could obtain the specified output from the following conditions:

thr = 0.7
c1 = (df.eq(1).sum(1)/df.shape[1]).gt(thr)
c2 = (df.eq(0).sum(1)/df.shape[1]).gt(thr)
c2.astype(int).mul(-1).add(c1)

Output

2009-08-02    0
2009-08-03    0
2009-08-04    0
2009-08-05    0
2009-08-06   -1
2009-08-07    1
dtype: int64
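
To try the conditions end to end, here is a minimal self-contained sketch. The sample data, the dates and the intermediate names (share_ones, share_zeros, n_cols) are made up for illustration and are not from the original post. It assumes every cell is either 0 or 1 and writes the final step as a subtraction, which should be equivalent to the mul/add chain above.

```python
import pandas as pd

# Hypothetical sample data: 7 binary columns, 4 dates (not from the original post).
df = pd.DataFrame(
    [
        [1, 1, 1, 1, 1, 0, 1],  # 6 of 7 are ones  -> expect 1
        [0, 0, 0, 0, 0, 0, 1],  # 6 of 7 are zeros -> expect -1
        [1, 1, 1, 0, 0, 0, 1],  # 4 of 7 are ones  -> neither share passes 0.7
        [1, 1, 1, 1, 1, 0, 0],  # 5 of 7 are ones (about 0.714) -> expect 1
    ],
    index=pd.to_datetime(["2009-08-02", "2009-08-03", "2009-08-04", "2009-08-05"]),
    columns=[f"NET_{i}" for i in range(7)],
)

thr = 0.7
n_cols = df.shape[1]

# Share of ones and zeros in each row.
share_ones = df.eq(1).sum(axis=1) / n_cols
share_zeros = df.eq(0).sum(axis=1) / n_cols

# Boolean masks: True where the share strictly exceeds the threshold.
c1 = share_ones.gt(thr)
c2 = share_zeros.gt(thr)

# +1 where ones dominate, -1 where zeros dominate, 0 otherwise.
result = c1.astype(int) - c2.astype(int)
print(result)
```

Because the comparison is strict (gt), a row sitting exactly on the threshold gets 0; swapping in ge would include it.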

Or using np.select:

pd.DataFrame(np.select([c1,c2], [1,-1], 0), index=df.index, columns=['result'])

              result
2009-08-02       0
2009-08-03       0
2009-08-04       0
2009-08-05       0
2009-08-06      -1
2009-08-07       1
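
To connect this back to the local_df['enseamble'] assignment in the question, the np.select approach could be wrapped in a small helper. This is only a sketch: the function name majority_vote and the sample data are hypothetical, and it uses .mean(axis=1) on the boolean frame as a shortcut for dividing the row sum by the number of columns.

```python
import numpy as np
import pandas as pd


def majority_vote(frame: pd.DataFrame, thr: float = 0.7) -> pd.Series:
    """Return 1, -1 or 0 per row depending on the share of 1s and 0s."""
    # The mean of a boolean frame along axis=1 is the fraction of True cells per row.
    share_ones = frame.eq(1).mean(axis=1)
    share_zeros = frame.eq(0).mean(axis=1)
    # First matching condition wins; rows matching neither get the default 0.
    votes = np.select([share_ones.gt(thr), share_zeros.gt(thr)], [1, -1], default=0)
    return pd.Series(votes, index=frame.index, name="enseamble")


# Hypothetical sample data: 5 binary columns, 3 dates.
local_df = pd.DataFrame(
    [[1, 1, 1, 1, 0], [0, 0, 0, 0, 1], [1, 1, 0, 0, 1]],
    index=pd.to_datetime(["2009-08-02", "2009-08-03", "2009-08-04"]),
    columns=[f"NET_{i}" for i in range(5)],
)

# Compute the vote before adding the column, so the new column
# is not counted as one of the inputs.
vote = majority_vote(local_df)
local_df["enseamble"] = vote
print(local_df)
```

Computing the vote on the frame before the new column is attached matters: calling the helper again on local_df after the assignment would include enseamble itself in the shares.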
