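I have one category defined by certain features (height and weight, selected with np.where) and another category defined by other features (whether someone is a twin and how many siblings they have, also selected with np.where). I want to know how many people fall into both categories at once (for example, if you drew a Venn diagram, how many would be in the center?).
I am importing the columns from a CSV file.
The table looks like this: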
  Child  Inches  Weight Twin  Siblings
0     A      53     100    Y         3
1     B      54     110    N         4
2     C      56     120    Y         2
3     D      58     165    Y         1
4     E      60     150    N         1
5     F      62     160    N         1
6     H      65     165    N         3
import pandas as pd
import numpy as np
file = pd.read_csv(r'~/Downloads/Test3 CVS_Sheet1.csv')
#%%
height = file["Inches"]
weight = file["Weight"]
twin = file["Twin"]
siblings = file["Siblings"]
#%%
area1 = np.where((height <= 60) & (weight <= 150))[0]
#%%
#has two or more siblings (and is a twin)
group_a = np.where((siblings >= 2) & (twin == 'Y'))[0]
#has two or more siblings (and is not a twin)
group_b = np.where((siblings >= 2) & (twin == 'N'))[0]
#has only one sibling (and is twin)
group_c = np.where((siblings == 1) & (twin == 'Y'))[0]
#has only one sibling (and is not a twin)
group_d = np.where((siblings == 1) & (twin == 'N'))[0]
#%%
for i in area1:
    if group_a==True:
        print("in area1 there are", len(i), "children in group_a")
    elif group_b==True:
        print("in area1 there are", len(i), "children in group_b")
    elif group_c==True:
        print("in area1 there are", len(i), "children in group_c")
    elif group_d==True:
        print("in area1 there are", len(i), "children in group_d")
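I get the error message: "ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()"
I would like the output to look like this: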
"in area1 there are 2 children in group_a"
"in area1 there are 1 children in group_b"
"in area1 there are 0 children in group_c"
"in area1 there are 1 children in group_d"
Thanks in advance!
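Best answer
For your example I would use a slightly different design: store each condition as a 0/1 column on the DataFrame (df below is the DataFrame loaded from the CSV, called file in the question). You can do it like this: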
df['area1'] = np.where((df.Inches <= 60) & (df.Weight <= 150),1,0)
df['group_a'] = np.where((df.Siblings >= 2) & (df.Twin == 'Y'),1,0)
df['group_b'] = np.where((df.Siblings >= 2) & (df.Twin == 'N'),1,0)
df['group_c'] = np.where((df.Siblings == 1) & (df.Twin == 'Y'),1,0)
df['group_d'] = np.where((df.Siblings == 1) & (df.Twin == 'N'),1,0)
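The DataFrame then has five additional 0/1 indicator columns: area1, group_a, group_b, group_c, and group_d.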
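To make the resulting table concrete, here is a minimal sketch that rebuilds the question's sample data inline and adds the indicator columns. The inline DataFrame is an assumption that stands in for the CSV file; the column logic is the same as in the lines above.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's table (stands in for the CSV).
df = pd.DataFrame({
    "Child": list("ABCDEFH"),
    "Inches": [53, 54, 56, 58, 60, 62, 65],
    "Weight": [100, 110, 120, 165, 150, 160, 165],
    "Twin": ["Y", "N", "Y", "Y", "N", "N", "N"],
    "Siblings": [3, 4, 2, 1, 1, 1, 3],
})

# 1 where the condition holds, 0 otherwise.
df["area1"] = np.where((df.Inches <= 60) & (df.Weight <= 150), 1, 0)
df["group_a"] = np.where((df.Siblings >= 2) & (df.Twin == "Y"), 1, 0)
df["group_b"] = np.where((df.Siblings >= 2) & (df.Twin == "N"), 1, 0)
df["group_c"] = np.where((df.Siblings == 1) & (df.Twin == "Y"), 1, 0)
df["group_d"] = np.where((df.Siblings == 1) & (df.Twin == "N"), 1, 0)

# Show only the child label and the new indicator columns.
print(df[["Child", "area1", "group_a", "group_b", "group_c", "group_d"]])
```

With this sample, children A, B, C, and E end up with area1 equal to 1, and every child has exactly one of the four group columns set.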
From here you can build queries. For example, to look at group_b:
df.groupby(['area1'])['group_b'].sum()[1]
This gives the desired result: 1. You can use sum or count to adjust the table.
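If you only need a single overlap count, the same number can be obtained without groupby by combining the two conditions as boolean masks and summing. This is a sketch of an alternative, not the approach used above; the inline DataFrame and the names in_area1, in_group_b, and overlap_b are assumptions made for illustration.

```python
import pandas as pd

# Sample data copied from the question's table (stands in for the CSV).
df = pd.DataFrame({
    "Child": list("ABCDEFH"),
    "Inches": [53, 54, 56, 58, 60, 62, 65],
    "Weight": [100, 110, 120, 165, 150, 160, 165],
    "Twin": ["Y", "N", "Y", "Y", "N", "N", "N"],
    "Siblings": [3, 4, 2, 1, 1, 1, 3],
})

# Boolean masks, one per category.
in_area1 = (df["Inches"] <= 60) & (df["Weight"] <= 150)
in_group_b = (df["Siblings"] >= 2) & (df["Twin"] == "N")

# True counts as 1, so summing the combined mask counts the overlap.
overlap_b = int((in_area1 & in_group_b).sum())
print("in area1 there are", overlap_b, "children in group_b")
```

The `&` of two masks is the intersection of the two categories, which is the center of the Venn diagram the question describes.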
Finally:
# df.columns[6:] are the four group columns (group_a .. group_d)
for col in df.columns[6:]:
    r = df.groupby(['area1'])[col].sum()[1]
    print("in area1 there are", r, "children in", col)
This produces:
in area1 there are 2 children in group_a
in area1 there are 1 children in group_b
in area1 there are 0 children in group_c
in area1 there are 1 children in group_d
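As for the error in the question: `group_a == True` compares a whole index array with `True`, and `if` cannot reduce a multi-element array to a single truth value, hence the ValueError (`len(i)` would also fail, because `i` is a single integer). If you would rather keep the index arrays returned by `np.where`, one option is to intersect them with `np.intersect1d`. A minimal sketch, assuming the sample table is rebuilt inline in place of the CSV; the `groups` and `counts` dictionaries are illustrative names, not part of the original code.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's table (stands in for the CSV).
df = pd.DataFrame({
    "Child": list("ABCDEFH"),
    "Inches": [53, 54, 56, 58, 60, 62, 65],
    "Weight": [100, 110, 120, 165, 150, 160, 165],
    "Twin": ["Y", "N", "Y", "Y", "N", "N", "N"],
    "Siblings": [3, 4, 2, 1, 1, 1, 3],
})
height, weight = df["Inches"], df["Weight"]
twin, siblings = df["Twin"], df["Siblings"]

# Row positions of the children in each category, as in the question.
area1 = np.where((height <= 60) & (weight <= 150))[0]
groups = {
    "group_a": np.where((siblings >= 2) & (twin == "Y"))[0],
    "group_b": np.where((siblings >= 2) & (twin == "N"))[0],
    "group_c": np.where((siblings == 1) & (twin == "Y"))[0],
    "group_d": np.where((siblings == 1) & (twin == "N"))[0],
}

# intersect1d returns the row positions present in both arrays.
counts = {name: len(np.intersect1d(area1, idx)) for name, idx in groups.items()}

for name, n in counts.items():
    print("in area1 there are", n, "children in", name)
```

This keeps the question's variables intact and replaces only the loop, so no extra columns are added to the DataFrame.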
Source: the Stack Overflow question on finding how many objects are in the overlap of two different subsets in Python: https://stackoverflow.com/questions/57065716/