My dataset has many columns that contain $ values with commas, e.g. $150,000.50. After importing the dataset:
datasets = pd.read_csv('salaries-by-college-type.csv')
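Because many of the values in these columns are $ strings, they are not treated as numbers and later processing fails. How can I correct this in a Python program?
This is my dataset. Apart from "School Type", all the other columns hold $ values with commas. Is there a generic way to remove the $ signs and commas from these column values?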
School Type 269 non-null object
Starting Median Salary 269 non-null float64
Mid-Career Median Salary 269 non-null float64
Mid-Career 10th Percentile Salary 231 non-null float64
Mid-Career 25th Percentile Salary 269 non-null float64
Mid-Career 75th Percentile Salary 269 non-null float64
Mid-Career 90th Percentile Salary 231 non-null float64
Here is a sample of my dataset:
School Type  Starting Median Salary  Mid-Career Median Salary  Mid-Career 10th Percentile Salary  Mid-Career 25th Percentile Salary  Mid-Career 75th Percentile Salary  Mid-Career 90th Percentile Salary
Engineering  $72,200.00              $126,000.00               $76,800.00                         $99,200.00                         $168,000.00                        $220,000.00
Engineering  $75,500.00              $123,000.00               N/A                                $104,000.00                        $161,000.00                        N/A
Engineering  $71,800.00              $122,000.00               N/A                                $96,000.00                         $180,000.00                        N/A
Engineering  $62,400.00              $114,000.00               $66,800.00                         $94,300.00                         $143,000.00                        $190,000.00
Engineering  $62,200.00              $114,000.00               N/A                                $80,200.00                         $142,000.00                        N/A
Engineering  $61,000.00              $114,000.00               $80,000.00                         $91,200.00                         $137,000.00                        $180,000.00
Best answer
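Suppose you have a csv that looks like this.
Note: I don't really know what your csv looks like. Make sure to adjust the read_csv parameters accordingly, most specifically the sep parameter.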
h1|h2
a|$1,000.99
b|$500,000.00
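Use the converters parameter of pd.read_csv: pass a dictionary whose keys are the names of the columns to convert and whose values are the functions that do the conversion.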
pd.read_csv(
'salaries-by-college-type.csv', sep='|',
converters=dict(h2=lambda x: float(x.strip('$').replace(',', '')))
)
h1 h2
0 a 1000.99
1 b 500000.00
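For the salary file in the question, the same converters idea could be applied to every money column at once. The following is only a minimal sketch: it assumes a comma-separated file with quoted fields and missing values written as the literal text N/A, and both the helper name parse_money and the inline sample data are made up for illustration. Note that a converter receives the raw cell text, so N/A has to be handled explicitly; a bare float(...) call would raise on it.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical inline sample standing in for the real file. Fields are
# quoted because the values themselves contain commas.
csv_text = '''School Type,Starting Median Salary,Mid-Career 10th Percentile Salary
Engineering,"$72,200.00","$76,800.00"
Engineering,"$75,500.00",N/A
'''


def parse_money(text):
    """Turn '$72,200.00' into 72200.0; return NaN for blank or N/A cells."""
    cleaned = text.strip().replace('$', '').replace(',', '')
    if cleaned in ('', 'N/A'):
        return np.nan
    return float(cleaned)


# Read only the header row to discover the column names.
columns = pd.read_csv(io.StringIO(csv_text), nrows=0).columns
money_columns = [c for c in columns if c != 'School Type']

# Apply the same converter to every column except "School Type".
df = pd.read_csv(
    io.StringIO(csv_text),
    converters={c: parse_money for c in money_columns},
)
print(df)
```

With a real file, the io.StringIO(csv_text) arguments would be replaced by the file path, and sep adjusted to match the file.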
Or, suppose you have already imported the dataframe:
df = pd.read_csv(
'salaries-by-college-type.csv', sep='|'
)
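Then use pd.Series.str.replace:
# regex=True is needed on pandas 2.0 and later, where str.replace matches literally by default
df.h2 = df.h2.str.replace(r'[^\d.]', '', regex=True).astype(float)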
df
h1 h2
0 a 1000.99
1 b 500000.00
Or use pd.DataFrame.replace:
df.replace(dict(h2=r'[^\d.]'), '', regex=True).astype(dict(h2=float))
h1 h2
0 a 1000.99
1 b 500000.00
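To answer the "generic way" part of the question for a dataframe that is already loaded, one option is to loop over every column except "School Type". This is a rough sketch rather than a definitive recipe: the inline sample data is made up, the file is assumed to be comma-separated with quoted fields, and pd.to_numeric with errors='coerce' is used in place of astype(float) so that missing or unparseable cells become NaN instead of raising.

```python
import io

import pandas as pd

# Hypothetical inline sample standing in for the real file.
csv_text = '''School Type,Starting Median Salary,Mid-Career 10th Percentile Salary
Engineering,"$72,200.00","$76,800.00"
Engineering,"$75,500.00",N/A
'''
df = pd.read_csv(io.StringIO(csv_text))

# Every column except "School Type" is assumed to hold money strings.
money_columns = df.columns.drop('School Type')

for col in money_columns:
    # Remove dollar signs and thousands separators, then convert to numbers.
    stripped = df[col].str.replace(r'[$,]', '', regex=True)
    df[col] = pd.to_numeric(stripped, errors='coerce')

print(df.dtypes)
```

The .str accessor only works on text columns, so a column that pandas has already parsed as numeric would need to be skipped or left out of money_columns.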
Regarding "python - How to remove the $ sign from column values in Python", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46596599/