Problem description
I have a large pandas DataFrame with fictional person data. Below is a small example; each person is identified by a number.
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'Number': ["5569", "3385", "9832", "6457", "5346", "5462", "9873", "2366"],
    'Gender': ['Male', 'Male', 'Female', 'Male', 'Female', 'Female', 'Male', 'Female'],
    'Children': [np.nan, "5569 6457", "5569", np.nan, "6457", "2366", "2366", np.nan],
})
df
  Number  Gender   Children
0   5569    Male        NaN
1   3385    Male  5569 6457
2   9832  Female       5569
3   6457    Male        NaN
4   5346  Female       6457
5   5462  Female       2366
6   9873    Male       2366
7   2366  Female        NaN
Some of the people are children of some of the others. Now I want to add two columns, "Mother" and "Father", and fill them with the relevant numbers. For each person, the father is the male whose "Children" entry contains that person's number, and the mother is the female whose "Children" entry contains it. However, some of the values are NaN, and some people have multiple children (they can have more than 4 in the actual dataset).
I've been trying with .isin and similar methods, but I simply can't get it to work.
The expected output for this example would look like this:
df = pd.DataFrame({
    'Number': ["5569", "3385", "9832", "6457", "5346", "5462", "9873", "2366"],
    'Gender': ['Male', 'Male', 'Female', 'Male', 'Female', 'Female', 'Male', 'Female'],
    'Children': [np.nan, "5569 6457", "5569", np.nan, "6457", "2366", "2366", np.nan],
    # Parent numbers are strings, matching the 'Number' column
    'Mother': ["9832", np.nan, np.nan, "5346", np.nan, np.nan, np.nan, "5462"],
    'Father': ["3385", np.nan, np.nan, "3385", np.nan, np.nan, np.nan, "9873"],
})
df
  Number  Gender   Children Mother Father
0   5569    Male        NaN   9832   3385
1   3385    Male  5569 6457    NaN    NaN
2   9832  Female       5569    NaN    NaN
3   6457    Male        NaN   5346   3385
4   5346  Female       6457    NaN    NaN
5   5462  Female       2366    NaN    NaN
6   9873    Male       2366    NaN    NaN
7   2366  Female        NaN   5462   9873
Recommended answer
This works well for me (only 2 lines :D)
Note: for a string containing a space (several children), I ignored the space and treated it as one large number.
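# Assumes df already has the 'Mother' and 'Father' columns (the expected-output frame above).
# Rows with a non-null 'Children' value and the matching gender are set to NaN; other rows copy the existing value.
df['MotherNumber'] = np.where(pd.notna(df['Children'].str.strip()) & (df['Gender'] == 'Female'), float('nan'), df['Mother'])
df['FatherNumber'] = np.where(pd.notna(df['Children'].str.strip()) & (df['Gender'] == 'Male'), float('nan'), df['Father'])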
print(df)
  Number  Gender   Children Mother Father MotherNumber FatherNumber
0   5569    Male        NaN   9832   3385         9832         3385
1   3385    Male  5569 6457    NaN    NaN          NaN          NaN
2   9832  Female       5569    NaN    NaN          NaN          NaN
3   6457    Male        NaN   5346   3385         5346         3385
4   5346  Female       6457    NaN    NaN          NaN          NaN
5   5462  Female       2366    NaN    NaN          NaN          NaN
6   9873    Male       2366    NaN    NaN          NaN          NaN
7   2366  Female        NaN   5462   9873         5462         9873
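As an alternative, here is a minimal sketch that derives the two columns from "Children" alone, without needing "Mother" and "Father" to exist beforehand. It assumes child numbers are separated by whitespace and that each child has at most one mother and one father listed (if more are listed, the first one is kept). The names `pairs`, `Child`, `mothers`, and `fathers` are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Number': ["5569", "3385", "9832", "6457", "5346", "5462", "9873", "2366"],
    'Gender': ['Male', 'Male', 'Female', 'Male', 'Female', 'Female', 'Male', 'Female'],
    'Children': [np.nan, "5569 6457", "5569", np.nan, "6457", "2366", "2366", np.nan],
})

# One row per (parent, child) pair: split on whitespace, then explode the lists.
# Rows without children (NaN) are dropped.
pairs = (
    df.assign(Child=df['Children'].str.split())
      .explode('Child')
      .dropna(subset=['Child'])
)

# Child -> parent lookups per gender (assumption: keep the first parent if duplicated)
mothers = (pairs[pairs['Gender'] == 'Female']
           .drop_duplicates('Child')
           .set_index('Child')['Number'])
fathers = (pairs[pairs['Gender'] == 'Male']
           .drop_duplicates('Child')
           .set_index('Child')['Number'])

# Look up each person's own number among the children; unmatched rows become NaN
df['Mother'] = df['Number'].map(mothers)
df['Father'] = df['Number'].map(fathers)
print(df)
```

On the sample data this fills "Mother" and "Father" for persons 5569, 6457, and 2366 and leaves the other rows as NaN, which matches the expected output in the question. Because the split produces a list of any length, the same approach should carry over to people with more than 4 children.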