Problem description
This follows on from another question I asked yesterday: Pandas set value if all columns are equal in a dataframe
Starting from @anky_91's solution, I'm working on something similar. Instead of assigning 1 or -1 only when all columns are equal, I want something more flexible: 1 if (for example) more than 70% of the columns are 1, -1 if more than 70% of the columns are 0, and 0 otherwise.
This is what I wrote:
# Instead of .all, use .sum to count the occurrences of 1 and 0 in each row
m1 = local_df.eq(1).sum(axis=1)
m2 = local_df.eq(0).sum(axis=1)
# Debug print; this works
print(m1)
print(m2)
But I don't know how to change this part:
local_df['enseamble'] = np.select([m1, m2], [1, -1], 0)
m = local_df.drop(local_df.columns.difference(['enseamble']), axis=1)
Here is what I want, in pseudocode:
tot = m1 + m2
if m1 / tot > 0.7:      # share of 1s in the row exceeds the threshold
    df['enseamble'] = 1
elif m2 / tot > 0.7:    # share of 0s in the row exceeds the threshold
    df['enseamble'] = -1
else:
    df['enseamble'] = 0
Thanks.
Here is an example of the expected output:
NET_0 NET_1 NET_2 NET_3 NET_4 NET_5 NET_6
date
2009-08-02 0 1 1 1 0 1
2009-08-03 1 0 0 0 1 0
2009-08-04 1 1 1 0 0 0
date enseamble
2009-08-02 1 # because 1 is more than 70%
2009-08-03 -1 # because 0 is more than 70%
2009-08-04 0 # because 0 and 1 are 50-50
Recommended answer
You can obtain the specified output with the following conditions:
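thr = 0.7
# Per row: does the fraction of columns equal to 1 (c1) or to 0 (c2) exceed the threshold?
c1 = (df.eq(1).sum(1)/df.shape[1]).gt(thr)
c2 = (df.eq(0).sum(1)/df.shape[1]).gt(thr)
# 1 where c1 holds, -1 where c2 holds, 0 otherwise
c2.astype(int).mul(-1).add(c1)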
Output
2009-08-02 0
2009-08-03 0
2009-08-04 0
2009-08-05 0
2009-08-06 -1
2009-08-07 1
dtype: int64
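For anyone who wants to try the approach end to end, here is a minimal self-contained sketch. The sample frame is hypothetical (it is not the data from the question), and the names `frac_ones`, `frac_zeros` and `result` are chosen only for illustration. It writes the final step as a difference of two integer masks, which should be equivalent to the `mul(-1).add(c1)` expression above.

```python
import pandas as pd

# Hypothetical sample data (not from the original post): 10 binary columns.
df = pd.DataFrame(
    [
        [1, 1, 1, 1, 1, 1, 1, 1, 0, 0],  # 80% ones
        [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],  # 80% zeros
        [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],  # 50/50 split
    ],
    index=pd.to_datetime(["2009-08-02", "2009-08-03", "2009-08-04"]),
    columns=[f"NET_{i}" for i in range(10)],
)

thr = 0.7

# Fraction of columns in each row that equal 1 or 0.
frac_ones = df.eq(1).sum(axis=1) / df.shape[1]
frac_zeros = df.eq(0).sum(axis=1) / df.shape[1]

# Boolean masks: True where the fraction is strictly above the threshold.
c1 = frac_ones.gt(thr)
c2 = frac_zeros.gt(thr)

# +1 where c1 holds, -1 where c2 holds, 0 otherwise.
# With thr >= 0.5 the two masks cannot both be True in the same row.
result = c1.astype(int) - c2.astype(int)
print(result)
```

Because `gt` is a strict comparison, a row sitting exactly at the threshold is mapped to 0.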
Or using np.select:
pd.DataFrame(np.select([c1,c2], [1,-1], 0), index=df.index, columns=['result'])
result
2009-08-02 0
2009-08-03 0
2009-08-04 0
2009-08-05 0
2009-08-06 -1
2009-08-07 1
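The `np.select` variant can also be wrapped in a small helper. The sketch below is one possible way to do that, not part of the original answer: the function name `majority_vote` and the sample frame are hypothetical. It also assumes you want the denominator from the question's pseudocode (`tot = m1 + m2`) rather than `df.shape[1]`; the two agree whenever every cell is 0 or 1, and differ only when a row contains other values such as NaN.

```python
import numpy as np
import pandas as pd


def majority_vote(frame, thr=0.7):
    """Hypothetical helper: 1 / -1 / 0 per row based on the share of 1s and 0s."""
    ones = frame.eq(1).sum(axis=1)
    zeros = frame.eq(0).sum(axis=1)
    tot = ones + zeros  # cells that are neither 0 nor 1 are ignored

    # A row with no 0s or 1s gives NaN shares; NaN > thr is False, so it maps to 0.
    share_ones = ones / tot
    share_zeros = zeros / tot

    # np.select takes boolean conditions; the first matching condition wins.
    votes = np.select([share_ones > thr, share_zeros > thr], [1, -1], default=0)
    return pd.Series(votes, index=frame.index, name="enseamble")


# Hypothetical sample data, including a missing value in the first row.
sample = pd.DataFrame(
    [
        [1, 1, 1, np.nan],  # 3 of 3 valid cells are 1
        [0, 0, 0, 1],       # 75% zeros
        [1, 1, 0, 0],       # 50/50 split
    ],
    index=pd.to_datetime(["2009-08-02", "2009-08-03", "2009-08-04"]),
    columns=["NET_0", "NET_1", "NET_2", "NET_3"],
)

votes = majority_vote(sample)
print(votes)
```

This also shows why the original `np.select([m1, m2], [1, -1], 0)` attempt needs changing: `m1` and `m2` are row counts, whereas `np.select` expects boolean conditions such as `share_ones > thr`.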