Problem description
So this is a common question, but I can't find an answer that fits this particular scenario.
So I have a DataFrame with a Genre column holding strings such as "Drama, Western", plus one-hot-encoded versions of the genres. For a Drama/Western film there is a 1 in both the Drama and Western columns, but where the genre is just Western, the Western column is 1 and the Drama column is 0.
I want a filtered DataFrame containing only the rows whose genre is Western and nothing else. I'm trying to oversample for a model because Western is a minority class, but I don't want to inflate the counts of other genres as a by-product.
There are multiple rows, so I can't use the index, and there are multiple genres, so I can't use a condition like df[(df['Western']==1) & (df['Drama']==0)] without having to account for all 24 genres.
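| Index | Genre           | Drama | Western | Action | genre 4 |
|-------|-----------------|-------|---------|--------|---------|
| 0     | Drama, Western  | 1     | 1       | 0      | 0       |
| 1     | Western         | 0     | 1       | 0      | 0       |
| 3     | Action, Western | 0     | 1       | 1      | 0       |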
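The sample table can be rebuilt as a DataFrame for experimentation. This sketch includes only the four genre columns shown (the real frame reportedly has 24), and it also illustrates the problem with the pairwise condition from the question: it excludes Drama but still lets the "Action, Western" row through.

```python
import pandas as pd

# Sample data from the table above; only the genre columns shown there are
# included (the real frame reportedly has 24 one-hot genre columns).
df = pd.DataFrame(
    {
        "Genre": ["Drama, Western", "Western", "Action, Western"],
        "Drama": [1, 0, 0],
        "Western": [1, 1, 1],
        "Action": [0, 0, 1],
        "genre 4": [0, 0, 0],
    },
    index=[0, 1, 3],
)

# The pairwise condition from the question only rules out Drama,
# so the "Action, Western" row (index 3) still slips through.
pairwise = df[(df["Western"] == 1) & (df["Drama"] == 0)]
print(pairwise.index.tolist())  # [1, 3]
```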
Recommended answer
If I understand your question correctly, you want those rows where only 'Western' is 1, i.e. the genre is only Western, nothing else.
Why do you have to use the encoded columns then? Just use the original 'Genre' column where the data is in string format. No need to overcomplicate things.
new_df = df[df['Genre'] == 'Western']  # exact match: keeps only rows whose Genre is just 'Western'
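If the one-hot columns are to be used after all (for example, because the Genre strings are inconsistent), the 24-genre problem can be sidestepped without writing 24 conditions: a row is Western-only when its Western flag is 1 and the flags across the row sum to exactly 1. The sketch below assumes every column other than Genre is a 0/1 genre flag. It also shows a whitespace-tolerant version of the string match, and a hypothetical oversampling step that duplicates only Western-only rows, so other genre counts stay unchanged (the factor of 3 is arbitrary).

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Genre": ["Drama, Western", "Western", "Action, Western"],
        "Drama": [1, 0, 0],
        "Western": [1, 1, 1],
        "Action": [0, 0, 1],
        "genre 4": [0, 0, 0],
    },
    index=[0, 1, 3],
)

# Assumption: every column except the raw 'Genre' string is a 0/1 genre flag.
genre_cols = df.columns.drop("Genre")

# Western-only means the Western flag is set and no other flag is,
# i.e. the flags sum to exactly 1. This works for any number of genres.
western_only = df[(df["Western"] == 1) & (df[genre_cols].sum(axis=1) == 1)]

# String-based variant of the answer that tolerates stray whitespace
# such as "Western " or " Western"; it is still an exact genre match.
western_only_str = df[df["Genre"].str.strip() == "Western"]

print(western_only.index.tolist())      # [1]
print(western_only_str.index.tolist())  # [1]

# Hypothetical oversampling step: resample the Western-only rows with
# replacement and append them, so no other genre's count grows.
extra = western_only.sample(n=3 * len(western_only), replace=True, random_state=0)
oversampled = pd.concat([df, extra])
```

Because only Western-only rows are duplicated, the Drama and Action totals in oversampled match those in df, while the Western total grows.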