Converting DataFrame columns to native Python data types


Problem description

I have a dataframe whose column data types need to be mapped to native Python data types.

I want to be able to take a dictionary of numpy types and convert each column to its native type.

For example:

{numpy.object_: object,
 numpy.bool_: bool,
 numpy.string_: str,
 numpy.unicode_: unicode,
 numpy.int64: int,
 numpy.float64: float,
 numpy.complex128: complex}

I tried both astype and pd.to_numeric, but neither downcasts the column sufficiently.

df['source'] = df['source'].astype(int)

Most of the comments question the wisdom of doing this. networkx reads dataframes and accepts numpy data types. However, the graph cannot be written with json.dumps because of this well-documented error: TypeError: Object of type 'int64' is not JSON serializable
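
The error described above is easy to reproduce outside networkx. The following is a minimal sketch, not the asker's actual code: the `weight` key and the value 42 are made up for illustration. It shows that a numpy scalar is rejected by `json.dumps`, while the result of its `.item()` method is a plain Python `int` that serializes without trouble.

```python
import json

import numpy as np

# A numpy scalar like the ones pandas hands back from an int64 column.
value = np.int64(42)

try:
    json.dumps({"weight": value})  # "weight" is a hypothetical key
    error_message = None
except TypeError as exc:
    # json does not know how to encode numpy.int64
    error_message = str(exc)

print(error_message)

# .item() unwraps the numpy scalar into the matching native Python object.
native = value.item()
print(type(native), json.dumps({"weight": native}))
```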

Thanks

Recommended answer

To pandas (or to numpy), a "native Python type" is simply object. That's the extent of it. Pandas only knows that the value is a Python object and acts accordingly. Beyond that, you cannot have columns whose dtype is string, unicode, integer, etc. in the native Python sense.

You can, however, have object columns and store whatever you want inside them. Pandas will handle most of the conversion for you at this stage.

import pandas as pd

df = pd.DataFrame({'A': [1, 2],
                   'B': [1., 2.],
                   'C': [1 + 2j, 3 + 4j],
                   'D': [True, False],
                   'E': ['a', 'b'],
                   'F': [b'a', b'b']})

df.dtypes
Out[71]:
A         int64
B       float64
C    complex128
D          bool
E        object
F        object
dtype: object

for col in df:
    print(type(df.loc[0, col]))

<class 'numpy.int64'>
<class 'numpy.float64'>
<class 'numpy.complex128'>
<class 'numpy.bool_'>
<class 'str'>
<class 'bytes'>


df = df.astype('object')

for col in df:
    print(type(df.loc[0, col]))

<class 'int'>
<class 'float'>
<class 'complex'>
<class 'bool'>
<class 'str'>
<class 'bytes'>
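
Applied to the JSON problem from the question, the same idea might look like the sketch below. It is only an illustration under assumptions: the column names (`source`, `weight`, `active`, `label`) and the helper `to_native` are hypothetical, and only JSON-friendly columns are used, since complex and bytes values cannot be written by `json.dumps` either way. Route 1 casts to object first, as in the answer; route 2 leaves the dtypes alone and unwraps numpy scalars through the `default` hook of `json.dumps`.

```python
import json

import numpy as np
import pandas as pd

# Hypothetical edge-list-like frame; the column names are made up.
df = pd.DataFrame({"source": [1, 2],
                   "weight": [0.5, 1.5],
                   "active": [True, False],
                   "label": ["a", "b"]})

# Route 1: cast every column to object so the cells hold native Python values.
native_df = df.astype(object)
records = native_df.to_dict(orient="records")
payload = json.dumps(records)
print(payload)


# Route 2: keep the numpy dtypes and convert only at serialization time.
def to_native(obj):
    """Fallback for json.dumps: unwrap numpy scalars, reject anything else."""
    if isinstance(obj, np.generic):
        return obj.item()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


raw_value = df["source"].to_numpy()[0]  # still a numpy.int64
hook_payload = json.dumps({"source": raw_value}, default=to_native)
print(hook_payload)
```

Route 1 gives up the memory and speed benefits of typed columns, so it is probably best done on a copy right before export. Route 2 leaves the dataframe untouched, which may suit cases where the graph is still being processed numerically.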

