I have a DataFrame of about 20k rows that looks like this:

import pandas as pd
import numpy as np
df = pd.DataFrame({'Car_ID': ['B332', 'B332', 'B332', 'C315', 'C315', 'C315', 'C315', 'C315', 'F310', 'F310'], \
                    'Date': ['2018-03-12', '2018-03-14', '2018-03-15', '2018-03-17', '2018-03-13', '2018-03-15', \
                             '2018-03-18', '2018-03-21', '2018-03-10', '2018-03-13'], \
                    'Driver': ['Alex', 'Alex', 'Mick', 'Sara', 'Sara', 'Jean', 'Sara', 'Sara', 'Franck','Michel']})
df

Out:
    Car_ID  Date        Driver
0   B332    2018-03-12  Alex
1   B332    2018-03-14  Alex
2   B332    2018-03-15  Mick
3   C315    2018-03-17  Sara
4   C315    2018-03-13  Sara
5   C315    2018-03-15  Jean
6   C315    2018-03-18  Sara
7   C315    2018-03-21  Sara
8   F310    2018-03-10  Franck
9   F310    2018-03-13  Michel

I create a new Event column for the DataFrame, like this:
df["Event"] = np.where(df.Car_ID.str.contains('B', case=True, na=False), 'Rent_Car_B', \
                    np.where(df.Car_ID.str.contains('C', case=True, na=False), 'Rent_Car_C', \
                    np.where(df.Car_ID.str.contains('F', case=True, na=False), 'Rent_Car_F', df.Car_ID)))
df

Out:
    Car_ID  Date        Driver  Event
0   B332    2018-03-12  Alex    Rent_Car_B
1   B332    2018-03-14  Alex    Rent_Car_B
2   B332    2018-03-15  Mick    Rent_Car_B
3   C315    2018-03-17  Sara    Rent_Car_C
4   C315    2018-03-13  Sara    Rent_Car_C
5   C315    2018-03-15  Jean    Rent_Car_C
6   C315    2018-03-18  Sara    Rent_Car_C
7   C315    2018-03-21  Sara    Rent_Car_C
8   F310    2018-03-10  Franck  Rent_Car_F
9   F310    2018-03-13  Michel  Rent_Car_F
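As an aside, the nested np.where calls can also be written with np.select, which keeps the conditions and labels in two parallel lists. This is only a sketch on a reduced frame; the names letters, conditions and labels are illustrative and not part of the original post, and the 'X100' row is made up to show the fallback.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Car_ID': ['B332', 'C315', 'F310', 'X100']})

# One boolean condition per letter, checked in order (first match wins),
# mirroring the nested np.where calls.
letters = ['B', 'C', 'F']
conditions = [df['Car_ID'].str.contains(letter, case=True, na=False) for letter in letters]
labels = ['Rent_Car_' + letter for letter in letters]

# Rows matching no condition keep their Car_ID, like the original fallback.
df['Event'] = np.select(conditions, labels, default=df['Car_ID'].to_numpy())
```

Adding another car type then only means appending one letter to the list.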

In my Event column, I would like to add a new row for each driver change, like this:
Out:
    Car_ID  Date        Driver  Event
0   B332    2018-03-12  Alex    Rent_Car_B
1   B332    2018-03-14  Alex    Rent_Car_B
2   B332    2018-03-15  Mick    Rent_Car_B
3   B332    2018-03-15          Alex to Mick
4   C315    2018-03-17  Sara    Rent_Car_C
5   C315    2018-03-13  Sara    Rent_Car_C
6   C315    2018-03-15  Jean    Rent_Car_C
7   C315    2018-03-15          Sara to Jean
8   C315    2018-03-18  Sara    Rent_Car_C
9   C315    2018-03-18          Jean to Sara
10  C315    2018-03-21  Sara    Rent_Car_C
11  F310    2018-03-10  Franck  Rent_Car_F
12  F310    2018-03-13  Michel  Rent_Car_F
13  F310    2018-03-13          Franck to Michel

I'm not sure whether there is a trick to make this work.
I would appreciate your suggestions!

Best answer

Use the shift method to first create a helper column holding the previous row's driver; we will use it afterwards:

df['Driver_shift'] = df['Driver'].shift()

Use a mask to select the rows where the driver actually changes while the Car_ID stays the same:
mask = (df['Driver'] != df['Driver_shift'])&(df['Car_ID'] == df['Car_ID'].shift())
df_change = df[mask].copy()  # copy so later assignments do not trigger SettingWithCopyWarning
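The mask above relies on the rows of each car being contiguous. A possible variant, not from the original answer, is to shift within each Car_ID group, so the first row of every car has no previous driver at all. The sketch below uses a small made-up frame, and prev_driver is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'Car_ID': ['B332', 'B332', 'B332', 'C315', 'C315', 'F310'],
    'Driver': ['Alex', 'Alex', 'Mick', 'Mick', 'Sara', 'Sara'],
})

# Previous driver within the same car; the first row of each car gets NaN.
prev_driver = df.groupby('Car_ID')['Driver'].shift()

# A change is a row that has a previous driver for this car and a different name.
mask = prev_driver.notna() & (prev_driver != df['Driver'])
```

Here only rows 2 and 4 are flagged: the driver differs from the previous row of the same car.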

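Now add 0.5 to the index, so that after concatenating and sorting each new row lands right after the row it describes, and change the values of two columns: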
df_change = df_change.set_index(df_change.index+0.5)
df_change.loc[:,'Event'] = df_change['Driver_shift'] + ' to ' + df_change['Driver']
df_change['Driver'] = '' # to replace the value

Now you can concatenate, sort, reset_index and drop the helper column:
pd.concat([df,df_change]).sort_index().reset_index(drop=True).drop(columns='Driver_shift')

You will get:
   Car_ID        Date  Driver             Event
0    B332  2018-03-12    Alex        Rent_Car_B
1    B332  2018-03-14    Alex        Rent_Car_B
2    B332  2018-03-15    Mick        Rent_Car_B
3    B332  2018-03-15              Alex to Mick
4    C315  2018-03-17    Sara        Rent_Car_C
5    C315  2018-03-13    Sara        Rent_Car_C
6    C315  2018-03-15    Jean        Rent_Car_C
7    C315  2018-03-15              Sara to Jean
8    C315  2018-03-18    Sara        Rent_Car_C
9    C315  2018-03-18              Jean to Sara
10   C315  2018-03-21    Sara        Rent_Car_C
11   F310  2018-03-10  Franck        Rent_Car_F
12   F310  2018-03-13  Michel        Rent_Car_F
13   F310  2018-03-13          Franck to Michel
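For reuse, the steps above could be wrapped in one function. This is a sketch rather than the answer's own code: the function name add_driver_change_rows is hypothetical, it assumes a default integer index and rows already grouped by car, and it keeps the previous driver in a local Series instead of a helper column.

```python
import pandas as pd

def add_driver_change_rows(df):
    """Return a new frame with an extra row after each driver change per car."""
    prev_driver = df['Driver'].shift()
    # Driver differs from the previous row and the car is the same.
    mask = (df['Driver'] != prev_driver) & (df['Car_ID'] == df['Car_ID'].shift())

    changes = df[mask].copy()
    changes['Event'] = prev_driver[mask] + ' to ' + changes['Driver']
    changes['Driver'] = ''
    # A half-step index places each new row directly after the row it describes.
    changes.index = changes.index + 0.5

    return pd.concat([df, changes]).sort_index().reset_index(drop=True)

df = pd.DataFrame({
    'Car_ID': ['B332', 'B332', 'B332', 'F310', 'F310'],
    'Date': ['2018-03-12', '2018-03-14', '2018-03-15', '2018-03-10', '2018-03-13'],
    'Driver': ['Alex', 'Alex', 'Mick', 'Franck', 'Michel'],
    'Event': ['Rent_Car_B'] * 3 + ['Rent_Car_F'] * 2,
})
result = add_driver_change_rows(df)
```

The input frame is left unmodified, so the function can be called repeatedly on the same data.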

EDIT: to add a row before each driver and date:
df1 = df.copy()
df1.index = df1.index +0.5
df2 = pd.concat([df.drop(columns='Event'),df1]).sort_index().reset_index(drop=True)
df2['Event'] = df2['Event'].fillna(df2['Driver'])

The result is in df2.
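To make the effect of the EDIT concrete, here is the same idea on a two-row frame (a sketch with made-up sample rows). Each original row is preceded by a copy that has no Event, and that gap is then filled with the driver's name.

```python
import pandas as pd

df = pd.DataFrame({
    'Car_ID': ['B332', 'B332'],
    'Date': ['2018-03-12', '2018-03-15'],
    'Driver': ['Alex', 'Mick'],
    'Event': ['Rent_Car_B', 'Rent_Car_B'],
})

# Copy of the original rows, pushed half a step later in the index.
df1 = df.copy()
df1.index = df1.index + 0.5

# The rows without Event sort first; their missing Event is filled with the driver name.
df2 = pd.concat([df.drop(columns='Event'), df1]).sort_index().reset_index(drop=True)
df2['Event'] = df2['Event'].fillna(df2['Driver'])
```

The Event column of df2 then alternates between a driver name and the rental event for that row.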

Regarding "python - How to insert rows in a DataFrame based on column values with a condition?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50910334/
