I have a DataFrame like the following:
column1 column2 column3 column4
6,546 543,254,32 (443,326) (32,000)
4,554 432,885 (88,974) 77,332
n.a - 5,332 -
... ... ... ...
# this df stretches for over 500 rows, and all columns could potentially have
# values within brackets, 'n.a', '-'
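What I'm having trouble with is converting every value written in parentheses with commas, e.g.
(443,326)
into -443326, i.e. removing the parentheses and commas and making the number negative. I know I can do
df.replace('n.a', numpy.nan, inplace=True)
and it will replace the values that match. However,
df.replace('(', numpy.nan, inplace=True)
does not work the same way. I tried using a loop to solve the problem: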
for col in df.columns:  # df.columns is an attribute, not a method
    df[col] = df[col].str.replace('(', '-', regex=False)
    df[col] = df[col].str.replace(')', '', regex=False)
    df[col] = df[col].str.replace(',', '', regex=False)
This seems to work, but it gives me a warning:
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
What is the proper way to do this?
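A common cause of this warning is that df was itself created by selecting from a larger DataFrame. The question doesn't show how df was built, so that is an assumption here, and the frame raw below is hypothetical. A minimal sketch of the loop under that assumption: take an explicit .copy() of the selection so df is an independent frame, pass regex=False so "(" is matched literally, and finish each column with pd.to_numeric.

```python
import pandas as pd

# Hypothetical larger frame that df might have been sliced from.
raw = pd.DataFrame({
    "column1": ["6,546", "4,554", "n.a"],
    "column3": ["(443,326)", "(88,974)", "5,332"],
    "other": ["a", "b", "c"],
})

# .copy() makes df an independent frame, so the column assignments below
# write into df itself rather than into something derived from raw.
df = raw[["column1", "column3"]].copy()

for col in df.columns:
    # regex=False: treat "(" and ")" as literal characters, not regex syntax
    s = df[col].str.replace("(", "-", regex=False)
    s = s.str.replace(")", "", regex=False).str.replace(",", "", regex=False)
    # errors="coerce" turns anything non-numeric (e.g. "n.a") into NaN
    df[col] = pd.to_numeric(s, errors="coerce")

print(df)
print(raw)  # the source frame keeps its original strings
```

Replacing whole columns with df[col] = ... on an explicit copy is the usual way to keep pandas from treating df as a possible view of another frame.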
Best answer
Here is a slightly different approach:
In [89]: df.replace(r'[^\d\.]+', '', regex=True).apply(pd.to_numeric, errors='coerce')
Out[89]:
column1 column2 column3 column4
0 6546.0 54325432.0 443326 32000.0
1 4554.0 432885.0 88974 77332.0
2 NaN NaN 5332 NaN
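In the output above, column3 shows 443326 rather than -443326: the pattern [^\d\.]+ deletes every character that is not a digit or a dot, including the parentheses, so the sign the question asks for is lost. Both approaches depend on regex=True; without it, replace() only matches cells whose entire value equals the pattern, which is why df.replace('(', numpy.nan) in the question changed nothing. A minimal sketch that keeps the sign, assuming parentheses always mark a negative amount: rewrite (x) as -x with a regex backreference, strip the commas, then coerce to numbers.

```python
import pandas as pd

# The first rows of the question's data, stored as strings.
df = pd.DataFrame({
    "column1": ["6,546", "4,554", "n.a"],
    "column2": ["543,254,32", "432,885", "-"],
    "column3": ["(443,326)", "(88,974)", "5,332"],
    "column4": ["(32,000)", "77,332", "-"],
})

cleaned = (
    df.replace(r"^\((.*)\)$", r"-\1", regex=True)  # "(443,326)" -> "-443,326"
      .replace(",", "", regex=True)                # "-443,326"  -> "-443326"
      .apply(pd.to_numeric, errors="coerce")       # "n.a" and "-" -> NaN
)
print(cleaned)
```

Each step returns a new DataFrame and nothing is assigned back into df, so this chain does not trigger SettingWithCopyWarning.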