How to count the IDs that have multiple values in different rows of another column
Problem description
Is there a way to find the IDs that have both Apple and Strawberry, and then count how many there are? And likewise for the IDs that have only Apple, and the IDs that have only Strawberry?
df:
ID Fruit
0 ABC Apple <-ABC has Apple and Strawberry
1 ABC Strawberry <-ABC has Apple and Strawberry
2 EFG Apple <-EFG has Apple only
3 XYZ Apple <-XYZ has Apple and Strawberry
4 XYZ Strawberry <-XYZ has Apple and Strawberry
5 CDF Strawberry <-CDF has Strawberry
6 AAA Apple <-AAA has Apple only
Desired output:
Length of IDs that has Apple and Strawberry: 2
Length of IDs that has Apple only: 2
Length of IDs that has Strawberry: 1
Thanks!
Recommended answer
If the Fruit column only ever contains Apple or Strawberry, you can compare each ID's set of fruits with the target set, then count the matching IDs by taking the sum of the resulting True values:
v = ['Apple','Strawberry']
out = df.groupby('ID')['Fruit'].apply(lambda x: set(x) == set(v)).sum()
print (out)
2
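The same set comparison can be reused for the other two counts the question asks for. The following is a minimal sketch, assuming the sample DataFrame from the question; the helper name count_ids_with_exactly is hypothetical and not part of the original answer.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "ID": ["ABC", "ABC", "EFG", "XYZ", "XYZ", "CDF", "AAA"],
    "Fruit": ["Apple", "Strawberry", "Apple", "Apple",
              "Strawberry", "Strawberry", "Apple"],
})


def count_ids_with_exactly(frame, fruits):
    """Count the IDs whose set of fruits equals `fruits` exactly (hypothetical helper)."""
    target = set(fruits)
    # One boolean per ID: True when that ID's fruits match the target set.
    matches = frame.groupby("ID")["Fruit"].apply(lambda x: set(x) == target)
    return int(matches.sum())


both = count_ids_with_exactly(df, ["Apple", "Strawberry"])
apple_only = count_ids_with_exactly(df, ["Apple"])
strawberry_only = count_ids_with_exactly(df, ["Strawberry"])

print("IDs with Apple and Strawberry:", both)
print("IDs with Apple only:", apple_only)
print("IDs with Strawberry only:", strawberry_only)
```

Because the comparison is on sets, an ID that lists the same fruit more than once is still treated as having that fruit only.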
If the column can contain many different values, count the IDs for every combination of values instead:
s = df.groupby('ID')['Fruit'].agg(frozenset).value_counts()
print (s)
{Apple} 2
{Strawberry, Apple} 2
{Strawberry} 1
Name: Fruit, dtype: int64
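To read individual counts out of that result, the combinations can be looked up by frozenset. This is a rough sketch, assuming the sample DataFrame from the question; converting to a plain dict named counts is an illustrative choice here, not something the original answer does.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "ID": ["ABC", "ABC", "EFG", "XYZ", "XYZ", "CDF", "AAA"],
    "Fruit": ["Apple", "Strawberry", "Apple", "Apple",
              "Strawberry", "Strawberry", "Apple"],
})

# Number of IDs per distinct combination of fruits, as in the answer.
s = df.groupby("ID")["Fruit"].agg(frozenset).value_counts()

# A plain dict keyed by frozenset keeps the lookups simple.
counts = s.to_dict()

# .get(..., 0) returns 0 for a combination that no ID has.
both = counts.get(frozenset({"Apple", "Strawberry"}), 0)
apple_only = counts.get(frozenset({"Apple"}), 0)
strawberry_only = counts.get(frozenset({"Strawberry"}), 0)

print(both, apple_only, strawberry_only)
```

A frozenset is used rather than a set because it is hashable, which both value_counts and dictionary keys require.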