Problem description
I have a #-separated file with three columns: the first is an integer, the second looks like a float but isn't, and the third is a string. I try to load it directly into Python with pandas.read_csv:
In [149]: d = pandas.read_csv('resources/names/fos_names.csv', sep='#', header=None, names=['int_field', 'floatlike_field', 'str_field'])
In [150]: d
Out[150]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1673 entries, 0 to 1672
Data columns:
int_field 1673 non-null values
floatlike_field 1673 non-null values
str_field 1673 non-null values
dtypes: float64(1), int64(1), object(1)
pandas tries to be smart and automatically converts fields to a useful type. The issue is that I don't actually want it to do so (if I did, I'd have used the converters argument). How can I prevent pandas from converting types automatically?
Recommended answer
I think your best bet is to read the data in as a record array first using numpy, and then build the DataFrame from that array:
# what you described:
In [15]: import numpy as np
In [16]: import pandas
In [17]: x = pandas.read_csv('weird.csv')
In [19]: x.dtypes
Out[19]:
int_field int64
floatlike_field float64 # what you don't want?
str_field object
In [20]: datatypes = [('int_field','i4'),('floatlike','S10'),('strfield','S10')]
In [21]: y_np = np.loadtxt('weird.csv', dtype=datatypes, delimiter=',', skiprows=1)
In [22]: y_np
Out[22]:
array([(1, '2.31', 'one'), (2, '3.12', 'two'), (3, '1.32', 'three ')],
dtype=[('int_field', '<i4'), ('floatlike', '|S10'), ('strfield', '|S10')])
In [23]: y_pandas = pandas.DataFrame.from_records(y_np)
In [25]: y_pandas.dtypes
Out[25]:
int_field int64
floatlike object # better?
strfield object
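As an alternative to going through numpy, read_csv itself accepts a dtype argument in reasonably recent pandas versions, which may be enough to switch off the inference. The following is only a minimal sketch: the inline sample data and the variable names (raw, all_text, mixed) are made up for illustration and stand in for the real file from the question.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the '#'-separated file in the question.
raw = "1#2.31#one\n2#3.10#two\n3#1.32#three\n"
columns = ["int_field", "floatlike_field", "str_field"]

# dtype=str asks read_csv to keep every column as text instead of inferring types.
all_text = pd.read_csv(
    io.StringIO(raw), sep="#", header=None, names=columns, dtype=str
)

# A per-column mapping keeps only the float-like column as text;
# the remaining columns are still inferred as usual.
mixed = pd.read_csv(
    io.StringIO(raw), sep="#", header=None, names=columns,
    dtype={"floatlike_field": str},
)

print(all_text["floatlike_field"].tolist())
print(mixed.dtypes)
```

Keeping the column as text preserves details such as the trailing zero in "3.10", which would be lost if the value were parsed as a float.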