I'm trying to read in some population data as an exercise in learning pandas:

>>> countries = pd.read_table('country_data.txt',
                             thousands=',',
                             header=None,
                             names=["Country Name", "Area (km^2)", "Areami2",
                                    "Population", "Densitykm2", "Densitymi2",
                                    "Date", "Source"],
                             usecols=["Country Name", "Area (km^2)", "Population"],
                             index_col="Country Name"
                             )
>>> countries.head()




              Area (km^2) Population
Country Name
Monaco                  2     36,136
Singapore             716  5,399,200
Vatican City         0.44        800
Bahrain               757  1,234,571
Malta                 315    416,055


Even though I specified thousands=',', the Population column still appears to be read in as strings:

>>> countries.ix["Singapore"]["Population"]
'5,399,200'


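I tried moving the thousands=',' argument to different positions in the read_table call, and I checked the data for errors, but it contains only numeric values. I don't know where else to look...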

Best answer

This is a bug in 0.12, and it has been fixed in the (upcoming) 0.13 release.

Until then, I'd recommend cleaning up the column by hand:

In [11]: df['Population'].str.replace(',', '').astype(int)  # or float
Out[11]:
0      36136
1    5399200
2        800
3    1234571
4     416055
Name: Population, dtype: int64

In [12]: df['Population'] = df['Population'].str.replace(',', '').astype(int)
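
For anyone who wants to try the workaround end to end, here is a minimal self-contained sketch. It assumes a recent pandas; the inline tab-separated sample is made up to mirror the question's data (it is not the real country_data.txt), and dtype=str is passed for the Population column only so that the values arrive as strings, as they did under the 0.12 bug:

```python
import io

import pandas as pd

# Hypothetical tab-separated sample standing in for country_data.txt.
raw = (
    "Monaco\t2\t36,136\n"
    "Singapore\t716\t5,399,200\n"
    "Vatican City\t0.44\t800\n"
)

countries = pd.read_csv(
    io.StringIO(raw),
    sep="\t",
    header=None,
    names=["Country Name", "Area (km^2)", "Population"],
    index_col="Country Name",
    # Assumption: force strings to reproduce the situation in the question.
    dtype={"Population": str},
)

# Strip the thousands separators, then convert to integers (or float).
countries["Population"] = (
    countries["Population"].str.replace(",", "", regex=False).astype(int)
)

print(countries.loc["Singapore", "Population"])
```

On a pandas version that includes the fix, passing thousands=',' to the reader should make this manual step unnecessary.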
