I'm trying to read some population data as an exercise in learning pandas:
>>> countries = pd.read_table('country_data.txt',
                              thousands=',',
                              header=None,
                              names=["Country Name", "Area (km^2)", "Areami2",
                                     "Population", "Densitykm2", "Densitymi2",
                                     "Date", "Source"],
                              usecols=["Country Name", "Area (km^2)", "Population"],
                              index_col="Country Name"
                              )
>>> countries.head()
which gives:
              Area (km^2) Population
Country Name
Monaco                  2     36,136
Singapore             716  5,399,200
Vatican City         0.44        800
Bahrain               757  1,234,571
Malta                 315    416,055
Even though I specified thousands=',', the Population column still seems to be read in as strings:
>>> countries.ix["Singapore"]["Population"]
'5,399,200'
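I tried moving the thousands=',' argument around within the read_table call, and I also checked the data for errors, but it contains only numeric values. I don't know where else to look...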
Best answer
This is a bug in pandas 0.12, and it has been fixed in the (soon to be released) 0.13.
Until then, I'd recommend converting the column manually:
In [11]: df['Population'].str.replace(',', '').astype(int) # or float
Out[11]:
0 36136
1 5399200
2 800
3 1234571
4 416055
Name: Population, dtype: int64
In [12]: df['Population'] = df['Population'].str.replace(',', '').astype(int)
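For anyone who wants to try the workaround without the original file, here is a minimal self-contained sketch. The inline tab-separated sample is an assumption standing in for country_data.txt (it only reuses the five rows shown in the question), and Population is deliberately read as strings to mimic the buggy behaviour. The regex=False argument is also an assumption about your pandas version; it is only available in more recent releases than the ones discussed here.

```python
import io

import pandas as pd

# Hypothetical sample standing in for country_data.txt (tab-separated,
# only the three columns the question keeps).
raw = (
    "Monaco\t2\t36,136\n"
    "Singapore\t716\t5,399,200\n"
    "Vatican City\t0.44\t800\n"
    "Bahrain\t757\t1,234,571\n"
    "Malta\t315\t416,055\n"
)

# Force Population to be read as strings, reproducing what the bug did.
df = pd.read_csv(
    io.StringIO(raw),
    sep="\t",
    header=None,
    names=["Country Name", "Area (km^2)", "Population"],
    dtype={"Population": str},
)

# Strip the thousands separators as literal text, then convert to integers.
df["Population"] = (
    df["Population"].str.replace(",", "", regex=False).astype(int)
)

# Index by country so values can be looked up by name.
population = df.set_index("Country Name")["Population"]
print(population["Singapore"])
```

After the conversion the column holds integers, so arithmetic and sorting on Population behave numerically rather than lexically.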