python - Drop duplicate rows under a condition in pandas

I have a DataFrame like this:

  NoDemande   NoUsager  Sens  IdVehiculeUtilise  Fait  HeurePrevue  HeureDebutTrajet
0 42191000823  001208    +         246Véh         1    08:20:04     08:22:26
1 42191000822  001208    +         246Véh         1    08:20:04     08:18:56
2 42191000822  001208    -         246Véh        -99   09:05:03     08:56:26
3 42191000823  001208    -         246Véh         1    09:05:03     08:56:26
4 42191000834  001208    +         246Véh         1    16:50:04     16:39:26
5 42191000834  001208    -         246Véh         1    17:45:03     17:25:10
6 42192000761  001208    +         246Véh        -1    08:20:04     08:15:07
7 42192000762  001208    +         246Véh         1    08:20:04     08:18:27
8 42192000762  001208    -         246Véh        -99   09:05:03     08:58:29
9 42192000761  001208    -         246Véh        -11   09:05:03     08:58:29

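I got this DataFrame from df[df.duplicated(['NoUsager','NoDemande'],keep=False)], which guarantees that my rows come in pairs. I want to drop a pair of rows when its NoDemande is consecutive with another pair's (e.g. 42191000822 and 42191000823, or 42192000761 and 42192000762) and the HeurePrevue values are the same, since that means the record was entered twice. Only one of the two pairs should be dropped, and I'd like to keep the one with more positive numbers in the Fait column (at least one number greater than 0).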

So my result should look like this:

  NoDemande   NoUsager  Sens  IdVehiculeUtilise  Fait  HeurePrevue  HeureDebutTrajet
0 42191000823  001208    +         246Véh         1    08:20:04     08:22:26
3 42191000823  001208    -         246Véh         1    09:05:03     08:56:26
4 42191000834  001208    +         246Véh         1    16:50:04     16:39:26
5 42191000834  001208    -         246Véh         1    17:45:03     17:25:10
7 42192000762  001208    +         246Véh         1    08:20:04     08:18:27
8 42192000762  001208    -         246Véh        -99   09:05:03     08:58:29

I know this has something to do with OR logic, but I don't know how to implement it.

Any help would be greatly appreciated~
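
For reference, the pairing filter mentioned in the question can be sketched on a tiny frame. The `raw` data below is a hypothetical miniature, not the asker's real table; it only illustrates that `keep=False` flags every member of a duplicated group rather than just the later repeats.

```python
import pandas as pd

# Hypothetical miniature of the unfiltered data (values are made up)
raw = pd.DataFrame({
    'NoUsager': ['001208', '001208', '001208', '001209'],
    'NoDemande': [42191000822, 42191000822, 42191000899, 42191000822],
})

# keep=False marks all rows that share a (NoUsager, NoDemande) combination
mask = raw.duplicated(['NoUsager', 'NoDemande'], keep=False)
paired = raw[mask]  # only rows that have at least one partner remain
print(paired)
```

Rows 2 and 3 are filtered out here because no other row shares both their NoUsager and NoDemande.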

Best answer

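My approach to this problem is to make two columns that hold the conditions to check (same HeurePrevue as the previous row, and consecutive NoDemande). Then I loop over the DataFrame and drop the unwanted pair based on the Fait column.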

The code is a bit hacky, but it seems to do the trick:

import pandas as pd

# Recreate the DataFrame with simplified values
df = pd.DataFrame({
    'NoDemande': [23, 22, 22, 23, 34, 34, 61, 62, 62, 61],
    'HeurePrevue': [84, 84, 93, 93, 64, 73, 84, 84, 93, 93],
    'Fait': [1, 1, -99, 1, 1, 1, -1, 1, -99, -11]
    }, columns=['NoDemande', 'Fait', 'HeurePrevue'])

# Helper columns holding the conditions to inspect
df['sameHeure'] = df.HeurePrevue == df.HeurePrevue.shift()  # same HeurePrevue as the previous row
df['cont'] = df.NoDemande.diff()  # difference from the previous row's NoDemande

# Walk over consecutive row pairs
for prev_row, row in zip(df.iloc[:-1].itertuples(), df.iloc[1:].itertuples()):
    if row.sameHeure and (row.cont == 1):  # consecutive NoDemande with the same HeurePrevue: drop one pair
        pair_1 = df.loc[df.NoDemande == row.NoDemande]
        pair_2 = df.loc[df.NoDemande == prev_row.NoDemande]
        if sum(pair_1.Fait > 0) < sum(pair_2.Fait > 0):  # drop the pair with fewer positive Fait values
            df.drop(pair_1.index, inplace=True)
        else:
            df.drop(pair_2.index, inplace=True)

df.drop(columns=['cont', 'sameHeure'], inplace=True)  # throw away the helper columns

Result:

print(df)

   NoDemande  Fait  HeurePrevue
0         23     1           84
3         23     1           93
4         34     1           64
5         34     1           73
7         62     1           84
8         62   -99           93
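
The same rule can also be expressed per NoDemande group instead of per adjacent row, which avoids relying on row order. The following is only a sketch of that variant, not the answer's method: `drop_duplicate_pairs` is a hypothetical helper name, it assumes NoDemande is stored as an integer, and it treats two groups as duplicates when their numbers differ by exactly 1 and their HeurePrevue values match. On ties it drops the smaller NoDemande, mirroring the `else` branch of the loop.

```python
import pandas as pd


def drop_duplicate_pairs(df):
    """Return df without the weaker of each duplicated NoDemande pair (hypothetical helper)."""
    # Per NoDemande: the sorted HeurePrevue values and the count of positive Fait values
    heures = df.groupby('NoDemande')['HeurePrevue'].apply(lambda s: tuple(sorted(s)))
    positives = (df['Fait'] > 0).groupby(df['NoDemande']).sum()

    to_drop = set()
    for demande in heures.index:
        twin = demande + 1  # candidate duplicate: the next consecutive number
        if demande in to_drop or twin not in heures.index:
            continue
        if heures.loc[demande] != heures.loc[twin]:
            continue  # different scheduled times, so not the same record
        # Drop the group with fewer positive Fait values (ties drop the smaller number)
        if positives.loc[twin] < positives.loc[demande]:
            to_drop.add(twin)
        else:
            to_drop.add(demande)
    return df[~df['NoDemande'].isin(to_drop)]


# Same simplified sample data as in the answer
df = pd.DataFrame({
    'NoDemande': [23, 22, 22, 23, 34, 34, 61, 62, 62, 61],
    'HeurePrevue': [84, 84, 93, 93, 64, 73, 84, 84, 93, 93],
    'Fait': [1, 1, -99, 1, 1, 1, -1, 1, -99, -11],
}, columns=['NoDemande', 'Fait', 'HeurePrevue'])

result = drop_duplicate_pairs(df)
print(result)
```

On the sample data this keeps rows 0, 3, 4, 5, 7 and 8, the same rows as the loop-based version. Unlike the loop, it returns a new DataFrame and leaves the input untouched.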

Regarding "python - Drop duplicate rows under a condition in pandas", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39323225/