Problem description
import pandas as pd
path1 = "/home/supertramp/Desktop/100&life_180_data.csv"
mydf = pd.read_csv(path1)
# Map each category label to a numeric weight
numcigar = {"Never": 0, "1-5 Cigarettes/day": 1, "10-20 Cigarettes/day": 4}
print(mydf['Cigarettes'])
mydf['CigarNum'] = mydf['Cigarettes'].apply(numcigar.get).astype(float)
print(mydf['CigarNum'])
mydf.to_csv('/home/supertramp/Desktop/powerRangers.csv')
The CSV file "100&life_180_data.csv" contains columns such as Age, BMI, Cigarettes, Alcohol, etc.
No int64
Age int64
BMI float64
Alcohol object
Cigarettes object
dtype: object
The Cigarettes column contains "Never", "1-5 Cigarettes/day" and "10-20 Cigarettes/day". I want to assign weights to these objects (Never, 1-5 Cigarettes/day, ...).
The expected output is a new appended column, CigarNum, which consists only of numbers (0, 1, 2). CigarNum is as expected for the first 8 rows or so, and then shows NaN through to the last row.
0 Never
1 Never
2 1-5 Cigarettes/day
3 Never
4 Never
5 Never
6 Never
7 Never
8 Never
9 Never
10 Never
11 Never
12 10-20 Cigarettes/day
13 1-5 Cigarettes/day
14 Never
...
167 Never
168 Never
169 10-20 Cigarettes/day
170 Never
171 Never
172 Never
173 Never
174 Never
175 Never
176 Never
177 Never
178 Never
179 Never
180 Never
181 Never
Name: Cigarettes, Length: 182, dtype: object
The output I get shouldn't show NaN after the first few rows:
0 0
1 0
2 1
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 NaN
11 NaN
12 NaN
13 NaN
14 0
...
167 NaN
168 NaN
169 NaN
170 NaN
171 NaN
172 NaN
173 NaN
174 NaN
175 NaN
176 NaN
177 NaN
178 NaN
179 NaN
180 NaN
181 NaN
Name: CigarNum, Length: 182, dtype: float64
Recommended answer
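OK, the first problem is that you have extra spaces embedded in the values, so they don't match the dictionary keys and the function is not applied correctly.
Fix this using the vectorised str accessor. Note that the dictionary keys themselves contain spaces, so strip only the leading and trailing whitespace rather than removing every space:
mydf['Cigarettes'] = mydf['Cigarettes'].str.strip()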
Now creating your new column should just work:
mydf['CigarNum'] = mydf['Cigarettes'].apply(numcigar.get).astype(float)
Update
Thanks to @Jeff, as always, for pointing out superior ways to do things:
So you can call replace instead of calling apply:
mydf['CigarNum'] = mydf['Cigarettes'].replace(numcigar)
# now convert the types (convert_objects is deprecated since 0.17.0; see to_numeric below)
mydf['CigarNum'] = mydf['CigarNum'].convert_objects(convert_numeric=True)
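One behavioural difference worth keeping in mind: replace with a dict leaves values that have no matching key unchanged, rather than turning them into NaN. A small sketch on hypothetical data (the "Sometimes" label is invented for the example):

```python
import pandas as pd

# Hypothetical labels; "Sometimes" is deliberately absent from the dict
labels = pd.Series(
    ["Never", "1-5 Cigarettes/day", "Sometimes", "10-20 Cigarettes/day"],
    dtype=object,
)
numcigar = {"Never": 0, "1-5 Cigarettes/day": 1, "10-20 Cigarettes/day": 4}

# Matching labels are swapped for their weights; the rest pass through
replaced = labels.replace(numcigar)
print(replaced.tolist())
```

Because unmatched strings survive, the resulting column can stay as object dtype, which is why a numeric conversion step may still be needed afterwards.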
You can also use the factorize method.
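As a rough sketch of what factorize does (on hypothetical data): it assigns integer codes in order of first appearance, so the codes are 0, 1, 2, ... rather than the custom weights from numcigar.

```python
import pandas as pd

# Hypothetical labels for illustration
labels = pd.Series(
    ["Never", "1-5 Cigarettes/day", "Never", "10-20 Cigarettes/day"],
    dtype=object,
)

# codes: one integer per row; uniques: the label behind each code
codes, uniques = pd.factorize(labels)

print(list(codes))
print(list(uniques))
```

This suits the case where any consistent numbering will do; if specific weights matter (such as 4 for "10-20 Cigarettes/day"), an explicit dict is still needed.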
Thinking about it, why not just set the dict values to floats in the first place, so that you avoid the type conversion altogether?
So:
numcigar = {"Never":0.0 ,"1-5 Cigarettes/day" :1.0,"10-20 Cigarettes/day":4.0}
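A quick check of that idea on hypothetical data, using Series.map for the lookup (an illustrative choice, not the call from the answer above): with float values in the dict, the result is already float64 and needs no further conversion.

```python
import pandas as pd

# Hypothetical, already-clean labels
labels = pd.Series(
    ["Never", "1-5 Cigarettes/day", "10-20 Cigarettes/day"], dtype=object
)
numcigar = {"Never": 0.0, "1-5 Cigarettes/day": 1.0, "10-20 Cigarettes/day": 4.0}

# The mapped column takes its dtype from the float dict values
cigar_num = labels.map(numcigar)
print(cigar_num.dtype)
```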
Version 0.17.0 or newer
convert_objects is deprecated since 0.17.0; it has been replaced with to_numeric:
mydf['CigarNum'] = pd.to_numeric(mydf['CigarNum'], errors='coerce')
Here errors='coerce' will return NaN where the values cannot be converted to a numeric value; without it, an exception is raised.
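The difference between the two modes can be seen in a small sketch on hypothetical data, where one entry ("Never") is not numeric:

```python
import pandas as pd

# Hypothetical column mixing numeric strings with a leftover label
mixed = pd.Series(["0", "1", "4", "Never"], dtype=object)

# errors='coerce': unparseable entries become NaN
coerced = pd.to_numeric(mixed, errors="coerce")
print(coerced)

# Default behaviour (errors='raise'): the same input raises ValueError
raised = False
try:
    pd.to_numeric(mixed)
except ValueError as exc:
    raised = True
    print("conversion failed:", exc)
```

Coercing is convenient, but it can hide unexpected labels, so counting the NaN values afterwards is a sensible sanity check.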