How to search one column and then fill another column with what is found?

Problem description

I have a large pandas DataFrame with fictional person data. Below is a small example; each person is identified by a number.

import pandas as pd
import numpy as np
df = pd.DataFrame({ 'Number':["5569", "3385", "9832", "6457", "5346", "5462", "9873", "2366"] , 'Gender': ['Male', 'Male', 'Female', 'Male', 'Female', 'Female', 'Male', 'Female'], 'Children': [np.nan, "5569 6457", "5569", np.nan, "6457", "2366", "2366", np.nan]})

df
   Number  Gender   Children
0    5569    Male        NaN
1    3385    Male  5569 6457
2    9832  Female       5569
3    6457    Male        NaN
4    5346  Female       6457
5    5462  Female       2366
6    9873    Male       2366
7    2366  Female        NaN

Some of the people are children of some of the others. Now I want to create two columns, "Mother" and "Father", and fill them with the relevant numbers. To get those, I would look at the "Children" column: a person is added as the father if they are male and have the child's number in "Children", and likewise a female with the child's number is added as the mother. However, some of the values are NaN, and some people have multiple children (they can have more than 4 in the actual dataset).

I've been trying with .isin and similar but I simply can't get it to work.

The expected output for this example would look like this:

df = pd.DataFrame({ 'Number':["5569", "3385", "9832", "6457", "5346", "5462", "9873", "2366"] , 'Gender': ['Male', 'Male', 'Female', 'Male', 'Female', 'Female', 'Male', 'Female'], 'Children': [np.nan, "5569 6457", "5569", np.nan, "6457", "2366", "2366", np.nan], 'Mother':["9832", np.nan, np.nan,"5346", np.nan, np.nan, np.nan, "5462"], 'Father':["3385", np.nan, np.nan, "3385", np.nan, np.nan, np.nan, "9873"]})

df
  Number  Gender   Children Mother Father
0   5569    Male        NaN   9832   3385
1   3385    Male  5569 6457    NaN    NaN
2   9832  Female       5569    NaN    NaN
3   6457    Male        NaN   5346   3385
4   5346  Female       6457    NaN    NaN
5   5462  Female       2366    NaN    NaN
6   9873    Male       2366    NaN    NaN
7   2366  Female        NaN   5462   9873

Recommended answer

This works for me (only 2 lines :D)

Note: where the "Children" string contains a space, I ignored the space and treated the value as one large number.

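# Note: this reads df['Mother'] and df['Father'], so it only runs on the expected-output frame above.
# Rows where a female (or male) has a non-NaN 'Children' value get NaN; all other rows copy the existing value.
df['MotherNumber'] =  np.where(pd.notna(df['Children'].str.strip()) & (df['Gender'] == 'Female'),  float('nan'), df['Mother'])
df['FatherNumber'] =  np.where(pd.notna(df['Children'].str.strip()) & (df['Gender'] == 'Male'),  float('nan'), df['Father'])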


print(df)
  Number  Gender   Children Mother Father MotherNumber FatherNumber
0   5569    Male        NaN   9832   3385         9832         3385
1   3385    Male  5569 6457    NaN    NaN          NaN          NaN
2   9832  Female       5569    NaN    NaN          NaN          NaN
3   6457    Male        NaN   5346   3385         5346         3385
4   5346  Female       6457    NaN    NaN          NaN          NaN
5   5462  Female       2366    NaN    NaN          NaN          NaN
6   9873    Male       2366    NaN    NaN          NaN          NaN
7   2366  Female        NaN   5462   9873         5462         9873
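
The two lines above read the existing "Mother" and "Father" columns, so they do not build those columns from "Children" alone. As a possible alternative, here is a minimal sketch that derives both columns from the original three-column frame. It is not the answerer's method: the names `pairs`, `Child`, `mothers` and `fathers` are illustrative, and it assumes each child has at most one mother and one father listed (if there are duplicates, the first match is kept).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Number": ["5569", "3385", "9832", "6457", "5346", "5462", "9873", "2366"],
    "Gender": ["Male", "Male", "Female", "Male", "Female", "Female", "Male", "Female"],
    "Children": [np.nan, "5569 6457", "5569", np.nan, "6457", "2366", "2366", np.nan],
})

# Build one row per (parent, child) pair: drop people without children,
# split the space-separated string into a list, then explode the list.
pairs = (
    df.dropna(subset=["Children"])
      .assign(Child=lambda d: d["Children"].str.split())
      .explode("Child")[["Number", "Gender", "Child"]]
)

# Lookup tables: child number -> parent number, one per gender.
# Assumption: at most one mother/father per child; duplicates keep the first.
mothers = (pairs[pairs["Gender"] == "Female"]
           .drop_duplicates("Child").set_index("Child")["Number"])
fathers = (pairs[pairs["Gender"] == "Male"]
           .drop_duplicates("Child").set_index("Child")["Number"])

# People who never appear as someone's child get NaN.
df["Mother"] = df["Number"].map(mothers)
df["Father"] = df["Number"].map(fathers)
print(df)
```

Because `str.split()` splits on whitespace, a parent with any number of children is handled the same way as a parent with one child, and NaN rows are dropped before the split.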

08-18 18:55