I have a DataFrame like the one below.

I am trying to remove from the Main column the string that appears in the substring column.

Main                     substring
Sri playnig well cricket cricket
sri went out             NaN
Ram is in                NaN
Ram went to UK,US        UK,US


My expected output is:

Main                     substring
Sri playnig well         cricket
sri went out             NaN
Ram is in                NaN
Ram went to              UK,US


I tried df["Main"].str.reduce(df["substring"]), but it did not work. Please help.

Best answer

Here is one approach using pd.DataFrame.apply. Note that np.nan == np.nan evaluates to False; we can use this trick inside the function to decide when to apply the removal logic.
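As a quick standalone check of that NaN behavior (an illustration added here, not part of the original answer), the snippet below compares a NaN value and an ordinary string with themselves. The variable names `value` and `text` are only for illustration; `pd.isna` is shown as the more explicit equivalent of the self-comparison test.

```python
import numpy as np
import pandas as pd

value = np.nan
text = "cricket"

# NaN is never equal to itself, so "x != x" is True only for NaN
nan_differs_from_itself = value != value
text_differs_from_itself = text != text

print(nan_differs_from_itself)   # True
print(text_differs_from_itself)  # False

# pd.isna expresses the same check more explicitly
print(pd.isna(value))            # True
```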

import numpy as np
import pandas as pd

df = pd.DataFrame({'Main': ['Sri playnig well cricket', 'sri went out',
                            'Ram is in', 'Ram went to UK,US'],
                   'substring': ['cricket', np.nan, np.nan, 'UK,US']})

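def remover(row):
    sub = row['substring']
    # NaN is the only value that is not equal to itself
    if sub != sub:
        return row['Main']
    else:
        # drop every whitespace-separated token equal to the substring
        lst = row['Main'].split()
        return ' '.join([i for i in lst if i != sub])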

df['Main'] = df.apply(remover, axis=1)

print(df)

               Main substring
0  Sri playnig well   cricket
1      sri went out       NaN
2         Ram is in       NaN
3       Ram went to     UK,US
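Note that the function above only removes whitespace-separated tokens that match the substring exactly, so a substring containing spaces would not be removed. As a hedged variant (not from the original answer), the sketch below uses `pd.isna` for the missing-value check and Python's built-in `str.replace` to remove the substring wherever it occurs. The helper name `remove_substring` is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Main': ['Sri playnig well cricket', 'sri went out',
             'Ram is in', 'Ram went to UK,US'],
    'substring': ['cricket', np.nan, np.nan, 'UK,US'],
})

def remove_substring(row):
    """Hypothetical helper: remove row['substring'] from row['Main']."""
    sub = row['substring']
    # Leave the row untouched when there is nothing to remove
    if pd.isna(sub):
        return row['Main']
    # str.replace removes every occurrence, including multi-word substrings
    cleaned = row['Main'].replace(sub, '')
    # Collapse any leftover runs of whitespace and trim the ends
    return ' '.join(cleaned.split())

df['Main'] = df.apply(remove_substring, axis=1)
print(df)
```

The trade-off is that plain replacement also matches inside longer words (for example, "US" inside "USA"), so the token-based version may be preferable when only whole words should be removed.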

Regarding "python - how to remove part of a DataFrame column's values based on another column", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50230233/
