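In a string column of a pandas DataFrame, I want to derive a new column from the value in one row and carry that value down until the next such value appears. What is the most efficient way to do this?
Input DataFrame: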
import pandas as pd
df = pd.DataFrame({'neighborhood':['Chicago City', 'Wicker Park', 'Bucktown','Lincoln Park','West Loop','River North','Milwaukee City','Bay View','East Side','South Side','Bronzeville','North Side','New York City','Harlem','Midtown','Chinatown']})
The DataFrame output I want is:
neighborhood city
0 Chicago City Chicago
1 Wicker Park Chicago
2 Bucktown Chicago
3 Lincoln Park Chicago
4 West Loop Chicago
5 River North Chicago
6 Milwaukee City Milwaukee
7 Bay View Milwaukee
8 East Side Milwaukee
9 South Side Milwaukee
10 Bronzeville Milwaukee
11 North Side Milwaukee
12 New York City New York
13 Harlem New York
14 Midtown New York
15 Chinatown New York
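Best answer
1) If the first column contains "City", copy the value to the second column with the " City" part cut off; otherwise leave the second column empty (None).
2) Fill the NA values using the forward-fill method.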
import numpy as np
# Step 1: on rows that contain 'City', store the name without ' City'; elsewhere store None.
df['city'] = np.where(
    df.neighborhood.str.contains('City'),
    df.neighborhood.str.replace(' City', '', case=False),
    None)
Result:
neighborhood city
0 Chicago City Chicago
1 Wicker Park None
2 Bucktown None
3 Lincoln Park None
4 West Loop None
5 River North None
6 Milwaukee City Milwaukee
7 Bay View None
8 East Side None
9 South Side None
10 Bronzeville None
11 North Side None
12 New York City New York
13 Harlem None
14 Midtown None
15 Chinatown None
# Step 2: forward-fill the missing values (fillna(method='ffill') is deprecated since pandas 2.1).
df['city'] = df['city'].ffill()
Result:
neighborhood city
0 Chicago City Chicago
1 Wicker Park Chicago
2 Bucktown Chicago
3 Lincoln Park Chicago
4 West Loop Chicago
5 River North Chicago
6 Milwaukee City Milwaukee
7 Bay View Milwaukee
8 East Side Milwaukee
9 South Side Milwaukee
10 Bronzeville Milwaukee
11 North Side Milwaukee
12 New York City New York
13 Harlem New York
14 Midtown New York
15 Chinatown New York
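For what it's worth, the two steps can also be chained without NumPy. The following is only a sketch, not the answer's original code: it uses a shortened copy of the sample data and assumes that every city row ends in the literal suffix " City" (the `is_city` name is introduced here for illustration).

```python
import pandas as pd

# Shortened copy of the question's sample data.
df = pd.DataFrame({'neighborhood': ['Chicago City', 'Wicker Park', 'Bucktown',
                                    'Milwaukee City', 'Bay View',
                                    'New York City', 'Harlem']})

# True only on rows that name a city (assumption: such rows end in ' City').
is_city = df['neighborhood'].str.endswith(' City')

df['city'] = (
    df['neighborhood']
    .where(is_city)                           # non-city rows become NaN
    .str.replace(r' City$', '', regex=True)   # drop the trailing ' City'
    .ffill()                                  # carry each city down to the next one
)
print(df)
```

Matching on the suffix rather than on "contains" avoids tagging a neighborhood that merely has "City" somewhere in its name, but that is a judgment call that depends on the real data.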
This question, "python - Derive a new pandas column based on a certain value of a row and apply it until the next value appears again", comes from Stack Overflow: https://stackoverflow.com/questions/55112419/