python - Pandas read_table() thousands=',' not working

I'm trying to read in some population data as an exercise in learning pandas:

>>> countries = pd.read_table('country_data.txt',
                             thousands=',',
                             header=None,
                             names=["Country Name", "Area (km^2)", "Areami2",
                                    "Population", "Densitykm2", "Densitymi2",
                                    "Date", "Source"],
                             usecols=["Country Name", "Area (km^2)", "Population"],
                             index_col="Country Name"
                             )
>>> countries.head()

This gives:

                Area (km^2) Population
Country Name
Monaco             2     36,136
Singapore        716     5,399,200
Vatican City     0.44    800
Bahrain          757     1,234,571
Malta            315     416,055

Even though I specified thousands=',', the population values seem to have been read in as strings:

>>> countries.ix["Singapore"]["Population"]
'5,399,200'

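I tried moving the thousands=',' argument to different positions in the read_table call, and I also checked the data for errors, but it contains only numeric values. I don't know where else to look...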

Best answer

This is a bug in 0.12, and it has been fixed in the (soon to be released) 0.13.

Until then, I'd recommend converting the column manually:

In [11]: df['Population'].str.replace(',', '').astype(int)  # or float
Out[11]:
0      36136
1    5399200
2        800
3    1234571
4     416055
Name: Population, dtype: int64

In [12]: df['Population'] = df['Population'].str.replace(',', '').astype(int)
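
For anyone who wants to try this without the original file, here is a minimal self-contained sketch. It assumes the file is tab-separated and uses only the five rows and three columns shown in the question; the names `raw`, `names` and `fixed` are just for illustration. It forces Population to be read as text to mimic the buggy behaviour, applies the manual cleanup, and then reads the same data with thousands=',' the way a version with the fix should handle it.

```python
import io

import pandas as pd

# Rows copied from the question; the tab-separated layout is an assumption.
raw = (
    "Monaco\t2\t36,136\n"
    "Singapore\t716\t5,399,200\n"
    "Vatican City\t0.44\t800\n"
    "Bahrain\t757\t1,234,571\n"
    "Malta\t315\t416,055\n"
)
names = ["Country Name", "Area (km^2)", "Population"]

# Read Population as text on purpose, to mimic what the buggy version produced.
df = pd.read_csv(io.StringIO(raw), sep="\t", header=None, names=names,
                 index_col="Country Name", dtype={"Population": str})

# Manual workaround: strip the thousands separators, then convert to integers.
df["Population"] = df["Population"].str.replace(",", "", regex=False).astype(int)

# On a pandas version that includes the fix, thousands=',' parses the column directly.
fixed = pd.read_csv(io.StringIO(raw), sep="\t", header=None, names=names,
                    index_col="Country Name", thousands=",")

print(df["Population"])
print(fixed.dtypes)
```

If some values might not be clean numbers, pd.to_numeric(..., errors='coerce') could be used in place of astype(int) so that bad entries become NaN instead of raising an error.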