Problem description
I have one df with a user_id and a category. I'd like to transform this into a truth table showing whether or not each user has at least one entry for each category. However, the final table should also include columns for all categories that appear in df_list, which may not appear at all in df.
Right now I create the truth table with groupby + size, then check whether any columns are missing and manually set those columns to False, but I was wondering if there is a way to accomplish this in the initial groupby step.
Here is an example:
import pandas as pd
df = pd.DataFrame({'user_id': [1,1,1,2,2],
                   'category': ['A', 'B', 'D', 'A', 'F']})
df_list = pd.DataFrame({'category': ['A', 'B', 'C', 'D', 'E', 'F']})
df_truth = df.groupby(['user_id', 'category']).size().unstack(fill_value=0).astype(bool)
#category      A      B      D      F
#user_id
#1          True   True   True  False
#2          True  False  False   True
To get to the desired output, I then do:
missing_vals = df_list.category.unique()[~pd.Series(df_list.category.unique()).isin(df_truth.columns)]
for element in missing_vals:
    df_truth.loc[:,element] = False
#category      A      B      D      F      C      E
#user_id
#1          True   True   True  False  False  False
#2          True  False  False   True  False  False
Recommended answer
Option 1: crosstab
I'd recommend converting that column to a categorical dtype. crosstab/pivot will then handle the rest.
i = df.user_id
j = pd.Categorical(df.category, categories=df_list.category)
pd.crosstab(i, j).astype(bool)
col_0         A      B      C      D      E      F
user_id
1          True   True  False   True  False  False
2          True  False  False  False  False   True
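A caveat worth checking on your own installation: the pandas documentation for crosstab shows that, with the default dropna=True, columns whose entries are all empty can be dropped, which would remove the unobserved categories C and E again. The sketch below is a self-contained variant of Option 1 that passes dropna=False explicitly; it assumes the same df and df_list as in the question, and the name truth is only used here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'user_id': [1, 1, 1, 2, 2],
                   'category': ['A', 'B', 'D', 'A', 'F']})
df_list = pd.DataFrame({'category': ['A', 'B', 'C', 'D', 'E', 'F']})

i = df.user_id
# Declare every expected category up front, including ones absent from df.
j = pd.Categorical(df.category, categories=df_list.category)

# dropna=False asks crosstab to keep columns that have no entries at all,
# i.e. the categories that never occur in df (C and E here).
truth = pd.crosstab(i, j, dropna=False).astype(bool)
print(truth)
```

If the default call already gives you all six columns, the extra argument should be harmless.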
Option 2: unstack + reindex
To fix your existing code, you can simplify the second step using reindex:
(df.groupby(['user_id', 'category'])
   .size()
   .unstack(fill_value=0)
   .reindex(df_list.category, axis=1, fill_value=0)
   .astype(bool)
)
category      A      B      C      D      E      F
user_id
1          True   True  False   True  False  False
2          True  False  False  False  False   True
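Since the question asks whether the missing columns can be handled in the initial groupby step, here is a third possibility, not part of the answer above: make category categorical first and group with observed=False, so that size reports a count of 0 for categories that never occur. This is a minimal sketch assuming the same df and df_list as in the question; the names cat and truth are only used here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'user_id': [1, 1, 1, 2, 2],
                   'category': ['A', 'B', 'D', 'A', 'F']})
df_list = pd.DataFrame({'category': ['A', 'B', 'C', 'D', 'E', 'F']})

# Make `category` categorical so groupby knows about every expected category.
cat = pd.Categorical(df['category'], categories=df_list['category'])

truth = (df.assign(category=cat)
           # observed=False keeps categories with no rows; size() counts them as 0.
           .groupby(['user_id', 'category'], observed=False)
           .size()
           .unstack(fill_value=0)
           .astype(bool))
print(truth)
```

The column order follows the order of the categories in df_list, so no separate reindex should be needed.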