This article shows how to handle errors when converting a pandas Series to a NumPy array.

Problem description

I have a pandas Series with the following value_counts() output:

NaN     2741
 197    1891
 127     188
 194      42
 195      24
 122      21

When I perform describe() on this series, I get:

df[col_name].describe()
count    2738.000000
mean      172.182250
std        47.387496
min         0.000000
25%       171.250000
50%       197.000000
75%       197.000000
max       197.000000
Name: SS_D_1, dtype: float64

However, if I try to find the minimum and maximum, I get nan as the answer:

numpy.min(df[col_name].values)
nan

Also, when I try to convert it to a NumPy array, I seem to get an array containing only NaNs:

numpy.array(df[col_name])

Any suggestions on how to convert a pandas Series to a NumPy array successfully?

Recommended answer

Both the function np.min and the method np.ndarray.min always return NaN for any array that contains one or more NaN values (this is standard IEEE 754 floating-point behaviour).
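
A minimal sketch of this propagation, assuming a small hypothetical float array in place of the question's column:

```python
import numpy as np

# Hypothetical data: a float array with one missing value.
arr = np.array([197.0, 127.0, np.nan, 0.0])

# Both the function and the method propagate NaN instead of skipping it.
fn_result = np.min(arr)
method_result = arr.min()

print(fn_result, method_result)  # nan nan
```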

You could use np.nanmin, which ignores NaN values when computing the min, e.g.:

np.nanmin(df[col_name].values)
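
As a sketch, assuming a hypothetical Series `s` that stands in for df[col_name], np.nanmin and its counterpart np.nanmax skip the NaNs in the underlying array:

```python
import numpy as np
import pandas as pd

# Hypothetical column mimicking the question's data: a few values plus NaNs.
s = pd.Series([197.0, np.nan, 127.0, 194.0, np.nan, 0.0], name="SS_D_1")

values = s.values        # float64 ndarray that still contains the NaNs
lo = np.nanmin(values)   # NaNs ignored
hi = np.nanmax(values)   # NaNs ignored
print(lo, hi)            # 0.0 197.0
```

Note that np.nanmin still returns NaN (with a RuntimeWarning) if every element is NaN.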

An even simpler option is just to use the pd.Series.min() method, which already ignores NaN values, i.e.:

df[col_name].min()
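
The pandas reductions skip NaN because their skipna parameter defaults to True. A short sketch on a hypothetical Series shows both the default and the opt-out:

```python
import numpy as np
import pandas as pd

# Hypothetical Series with a missing value.
s = pd.Series([197.0, np.nan, 127.0, 0.0])

print(s.min(), s.max())        # NaNs skipped by default: 0.0 197.0
print(s.min(skipna=False))     # NaN propagates again, like np.min: nan
```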

I have no idea why numpy.array(df[col_name]) would return an array containing only NaNs, unless df[col_name] already contained only NaNs to begin with. I assume this must be due to some other mistake in your code.
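
One way to check whether the converted array really is all NaN, rather than merely looking that way in a printout, is to count the NaNs directly. The sketch below assumes a hypothetical Series `s` in place of df[col_name] and uses Series.to_numpy(), which returns the same values as numpy.array(s) with the NaNs kept as they were:

```python
import numpy as np
import pandas as pd

# Hypothetical column: NaNs at both ends, real values in the middle.
s = pd.Series([np.nan, 197.0, 127.0, 194.0, np.nan])

arr = s.to_numpy()  # conversion keeps existing NaNs; it does not create new ones
n_nan = int(np.isnan(arr).sum())
n_valid = int(np.count_nonzero(~np.isnan(arr)))
print(arr.dtype, n_nan, n_valid)  # float64 2 3

# If only the non-missing values are wanted, drop the NaNs before converting.
clean = s.dropna().to_numpy()
print(clean)                      # [197. 127. 194.]
print(clean.min(), clean.max())   # 127.0 197.0
```

Dropping NaNs first means plain np.min and np.max work on the result, at the cost of losing the original positions.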

10-29 08:25