Problem description
I have a pandas Series with the following value_counts() output:
NaN 2741
197 1891
127 188
194 42
195 24
122 21
When I call describe() on this series, I get:
df[col_name].describe()
count 2738.000000
mean 172.182250
std 47.387496
min 0.000000
25% 171.250000
50% 197.000000
75% 197.000000
max 197.000000
Name: SS_D_1, dtype: float64
However, if I try to find the minimum and maximum, I get nan as the answer:
numpy.min(df[col_name].values)
nan
Also, when I try to convert it to a numpy array, I seem to get an array containing only NaNs:
numpy.array(df[col_name])
Any suggestions on how to convert a pandas Series to a numpy array successfully?
Recommended answer
Both the function np.min and the method np.ndarray.min will always return NaN for any array that contains one or more NaN values (this is standard IEEE 754 floating-point behaviour).
You could use np.nanmin, which ignores NaN values when computing the min, e.g.:
np.nanmin(df[col_name].values)
An even simpler option is just to use the pd.Series.min() method, which already ignores NaN values, i.e.:
df[col_name].min()
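To make the difference concrete, here is a minimal sketch that runs all three calls on a small made-up Series. The values are illustrative stand-ins for df[col_name], not the asker's actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df[col_name]: floats with some missing values.
s = pd.Series([197.0, np.nan, 127.0, 0.0, np.nan, 194.0])

plain_min = np.min(s.values)    # NaN propagates, so the result is NaN
nan_min = np.nanmin(s.values)   # NaNs are skipped
series_min = s.min()            # pandas skips NaNs by default (skipna=True)

print(plain_min, nan_min, series_min)
```

The same pattern applies to the maximum, using np.nanmax or Series.max().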
I have no idea why numpy.array(df[col_name]) would return an array containing only NaNs, unless df[col_name] already contained only NaNs to begin with. I assume this must be due to some other mistake in your code.
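One way to check whether the converted array really holds only NaNs is to count them directly, rather than relying on how the array prints. The sketch below uses a made-up Series in place of df[col_name]. Keep in mind that NumPy abbreviates the printed form of a long array to its first and last few elements, so an array can look all-NaN on screen when it is not; that is only a guess at the cause here, not something the question confirms.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df[col_name]: valid values surrounded by NaNs.
s = pd.Series([np.nan] * 5 + [197.0, 127.0, 194.0] + [np.nan] * 5)

arr = s.to_numpy()                # convert the Series to a numpy array
n_nan = int(np.isnan(arr).sum())  # number of NaN entries
n_valid = arr.size - n_nan        # number of real values

# Drop the missing values first if only the valid entries are wanted.
clean = s.dropna().to_numpy()

print(n_nan, n_valid, clean)
```

If n_valid comes out greater than zero, the conversion itself worked and the NaNs are simply part of the data.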