python - How to replace substrings within strings in a Pandas DataFrame

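I have a DataFrame and a list of strings that I want to remove from one of its columns. But when I use the replace function, those characters are still there. Can someone explain why this happens?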

bad_chars = ['?', '!', ',', ';', "'", '|', '-', '--', '(', ')',
             '[', ']', '{', '}', ':', '&', '\n']

And the replacement:

df2['page'] = df2['page'].replace(bad_chars, '')

When I print it out:

for index, row in df2.iterrows():
    print( row['project'] + '\t' + '(' + row['page'] + ',' + str(row['viewCount']) + ')' + '\n'  )

English (US season 14, 613)

Best answer

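Series.replace with a list only replaces cells whose entire value equals one of the listed strings, so it never touches substrings inside a longer string. One approach is to escape the characters with re.escape, join them into a single pattern, and pass that pattern to pd.Series.str.replace as a regular expression.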

import pandas as pd
import re

bad_chars = ['?', '!', ',', ';', "'", '|', '-', '--', '(', ')',
             '[', ']', '{', '}', ':', '&', '\n']

df = pd.DataFrame({'page': ['hello?', 'problems|here', 'nothingwronghere', 'nobrackets[]']})

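# regex=True is needed in pandas 2.0+, where str.replace treats the pattern as a literal string by default.
df['page'] = df['page'].str.replace('|'.join([re.escape(s) for s in bad_chars]), '', regex=True)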

print(df)

#                page
# 0             hello
# 1      problemshere
# 2  nothingwronghere
# 3        nobrackets
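
To see why the call in the question left the characters in place, it may help to compare whole-value and substring replacement side by side. The following is a minimal sketch, not the asker's actual data: the Series contents and the shortened bad_chars list are made up for illustration, and only the column name 'page' comes from the question.

```python
import re

import pandas as pd

# Illustrative data only; the name 'page' follows the question.
s = pd.Series(['hello?', '?', 'a-b'], name='page')
bad_chars = ['?', '-']  # shortened list, for illustration

# Without regex=True, Series.replace only replaces cells whose entire
# value equals one of the listed strings; substrings are left untouched.
whole_value = s.replace(bad_chars, '')

# With regex=True, the escaped pattern is applied inside each string.
pattern = '|'.join(re.escape(c) for c in bad_chars)
substring = s.replace(pattern, '', regex=True)

print(whole_value.tolist())  # ['hello?', '', 'a-b']
print(substring.tolist())    # ['hello', '', 'ab']
```

Only the second cell, which is exactly '?', changes in the first call. That matches the behaviour described in the question, where the column values were longer strings that merely contained the unwanted characters.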