Problem description
I need to compare the values in a matrix with a per-row threshold and create a table that contains not only the row indexes but also the names of the columns where a value exceeds the threshold.
For example.
Original table:
I need to create a list of Id_Class pairs that exceed the threshold. However, I would like to have an intermediate binary matrix before sending them to the list.
Like this:
And the final list:
I've tried the following code to create the binary matrix, but it doesn't work.
import pandas as pd
df = pd.DataFrame({'id':[1,2,3],
'region':['a','b','c'],
'threshold':[0.4, 0.5, 0.3],
'class_1':[0.2, 0.3, 0.3],
'class_2':[0.6, 0.2, 0.1],
'class_3':[0.4, 0.6, 0.1]})
df1 = df.set_index(['id', 'region', 'threshold'])
df1=df1.where(df1 >=df['threshold'] , 1, 0).reset_index()
Thanks for your help.
Recommended answer
Compare the NumPy arrays using broadcasting, then convert the resulting boolean mask to integers:
df.iloc[:, 3:] = (df.iloc[:, 3:].values >= df['threshold'].values[:, None]).astype(int)
print(df)
   id region  threshold  class_1  class_2  class_3
0   1      a        0.4        0        1        1
1   2      b        0.5        0        0        1
2   3      c        0.3        1        0        0
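To see why the `[:, None]` is needed, here is a minimal sketch of the same comparison on plain NumPy arrays. The variable names (`values`, `threshold`, `column`, `mask`, `binary`) are illustrative and not part of the original answer; the numbers are copied from the question's DataFrame.

```python
import numpy as np

# Class scores, one row per id (same numbers as in the question).
values = np.array([[0.2, 0.6, 0.4],
                   [0.3, 0.2, 0.6],
                   [0.3, 0.1, 0.1]])    # shape (3, 3)

# One threshold per row.
threshold = np.array([0.4, 0.5, 0.3])   # shape (3,)

# Adding a trailing axis turns the thresholds into a (3, 1) column,
# so NumPy broadcasts each threshold across its own row.
column = threshold[:, None]

mask = values >= column                 # boolean array, shape (3, 3)
binary = mask.astype(int)               # True/False -> 1/0
print(binary)
```

Without the extra axis, a shape `(3,)` array would be broadcast along the columns instead, so each threshold would be compared against a column rather than a row.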
Another solution:
arr = (df.iloc[:, 3:].values >= df['threshold'].values[:, None]).astype(int)
print(arr)
[[0 1 1]
 [0 0 1]
 [1 0 0]]
df = df.iloc[:, :3].join(pd.DataFrame(arr, columns=df.columns[3:], index=df.index))
print(df)
   id region  threshold  class_1  class_2  class_3
0   1      a        0.4        0        1        1
1   2      b        0.5        0        0        1
2   3      c        0.3        1        0        0
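The same row-wise comparison can probably also be written without dropping to NumPy, using `DataFrame.ge` with `axis=0` so that the threshold Series is aligned on the row index. This is a sketch of an alternative, not the approach from the answer above; `class_cols`, `binary` and `result` are names introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'region': ['a', 'b', 'c'],
                   'threshold': [0.4, 0.5, 0.3],
                   'class_1': [0.2, 0.3, 0.3],
                   'class_2': [0.6, 0.2, 0.1],
                   'class_3': [0.4, 0.6, 0.1]})

class_cols = ['class_1', 'class_2', 'class_3']

# axis=0 matches the Series to the DataFrame's row index, so every row
# is compared against its own threshold.
binary = df[class_cols].ge(df['threshold'], axis=0).astype(int)

# Put the untouched key columns back in front of the binary matrix.
result = df[['id', 'region', 'threshold']].join(binary)
print(result)
```

Selecting the class columns by name rather than by position keeps the code working if the column order changes.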
To get the columns that hold the value 1, reshape with DataFrame.stack:
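df2 = (df.set_index('id')
         .iloc[:, 2:]                     # keep only the class columns
         .stack()                         # long format: one row per (id, class)
         .rename_axis(('id', 'class'))
         .reset_index(name='a')
         .query('a == 1')                 # keep pairs that met the threshold
         .drop(columns='a'))              # positional axis (drop('a', 1)) was removed in pandas 2.0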
print(df2)
   id    class
1   1  class_2
2   1  class_3
5   2  class_3
6   3  class_1
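If one list of classes per id is more convenient than one row per pair, the binary matrix could also be reshaped with `DataFrame.melt` and then grouped. The sketch below starts from a hand-written copy of the binary matrix; `binary`, `long_df`, `pairs`, `per_id` and the helper column `flag` are hypothetical names chosen for this example.

```python
import pandas as pd

# Binary matrix as produced by the comparison step (region/threshold omitted).
binary = pd.DataFrame({'id': [1, 2, 3],
                       'class_1': [0, 0, 1],
                       'class_2': [1, 0, 0],
                       'class_3': [1, 1, 0]})

# Long format: one row per (id, class) with its 0/1 flag.
long_df = binary.melt(id_vars='id', var_name='class', value_name='flag')

# Keep only the pairs that met the threshold.
pairs = (long_df[long_df['flag'] == 1]
         .drop(columns='flag')
         .sort_values(['id', 'class'])
         .reset_index(drop=True))

# Optional: collapse to one list of class names per id.
per_id = pairs.groupby('id')['class'].agg(list)
print(pairs)
print(per_id)
```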