How to find the total length of column values that have multiple values in different rows of another column

Problem description

Is there a way to find the IDs that have both Apple and Strawberry, and then find how many such IDs there are? And likewise the IDs that have only Apple, and the IDs that have only Strawberry?

df:

        ID           Fruit
0       ABC          Apple        <-ABC has Apple and Strawberry
1       ABC          Strawberry   <-ABC has Apple and Strawberry
2       EFG          Apple        <-EFG has Apple only
3       XYZ          Apple        <-XYZ has Apple and Strawberry
4       XYZ          Strawberry   <-XYZ has Apple and Strawberry
5       CDF          Strawberry   <-CDF has Strawberry
6       AAA          Apple        <-AAA has Apple only

Desired output:

Length of IDs that has Apple and Strawberry: 2
Length of IDs that has Apple only: 2
Length of IDs that has Strawberry: 1

Thanks!

Recommended answer

If the Fruit column only ever contains Apple or Strawberry, you can build the set of fruits for each ID group, compare it with the target set, and count the matching IDs by summing the True values:

v = ['Apple', 'Strawberry']
# True for every ID whose set of fruits equals the target set; sum() counts the Trues
out = df.groupby('ID')['Fruit'].apply(lambda x: set(x) == set(v)).sum()
print(out)
2
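
The same set comparison can be reused to get all three counts from the question. The following is a minimal, self-contained sketch: the DataFrame is rebuilt from the sample data above, and `fruit_sets` and `count_ids_with_exactly` are illustrative names that do not appear in the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": ["ABC", "ABC", "EFG", "XYZ", "XYZ", "CDF", "AAA"],
    "Fruit": ["Apple", "Strawberry", "Apple", "Apple",
              "Strawberry", "Strawberry", "Apple"],
})

# One set of fruits per ID, e.g. ABC -> {"Apple", "Strawberry"}.
fruit_sets = df.groupby("ID")["Fruit"].apply(set)


def count_ids_with_exactly(fruits):
    """Count the IDs whose set of fruits equals `fruits` exactly (hypothetical helper)."""
    target = set(fruits)
    return sum(1 for s in fruit_sets if s == target)


both = count_ids_with_exactly(["Apple", "Strawberry"])
apple_only = count_ids_with_exactly(["Apple"])
strawberry_only = count_ids_with_exactly(["Strawberry"])

print("IDs with Apple and Strawberry:", both)
print("IDs with Apple only:", apple_only)
print("IDs with Strawberry only:", strawberry_only)
```

Because the comparison is on sets, duplicate rows for the same ID and fruit do not affect the counts.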

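If there are many possible values, aggregate each ID's fruits into a frozenset and count how often each combination occurs: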

# frozenset is hashable, so value_counts can count each distinct combination
s = df.groupby('ID')['Fruit'].agg(frozenset).value_counts()
print(s)
{Apple}                2
{Strawberry, Apple}    2
{Strawberry}           1
Name: Fruit, dtype: int64
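
To read individual counts out of that result, one option is to convert it to a plain dictionary keyed by frozenset. This is a sketch under the same assumptions as the sample data above; `combo_counts` and `counts_by_combo` are illustrative names, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": ["ABC", "ABC", "EFG", "XYZ", "XYZ", "CDF", "AAA"],
    "Fruit": ["Apple", "Strawberry", "Apple", "Apple",
              "Strawberry", "Strawberry", "Apple"],
})

# Count how many IDs share each distinct combination of fruits.
combo_counts = df.groupby("ID")["Fruit"].agg(frozenset).value_counts()

# A plain dict keyed by frozenset makes lookups explicit and
# lets us default to 0 for combinations that never occur.
counts_by_combo = combo_counts.to_dict()

both = counts_by_combo.get(frozenset({"Apple", "Strawberry"}), 0)
apple_only = counts_by_combo.get(frozenset({"Apple"}), 0)
strawberry_only = counts_by_combo.get(frozenset({"Strawberry"}), 0)
banana_only = counts_by_combo.get(frozenset({"Banana"}), 0)  # absent combination

print(both, apple_only, strawberry_only, banana_only)
```

This approach needs no fixed list of fruits up front, so any new combination that appears in the data gets its own entry automatically.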

