DataFrame correlation produces NaN even though all the values are integers

Problem description

I have a DataFrame df:

import pandas as pd

df = pd.read_csv(loggerfile, header=2)
values = df.to_numpy()  # DataFrame.as_matrix() was removed in pandas 1.0
df2 = pd.DataFrame(values, index=datetimeIdx, columns=Columns)

Now I read the data this way, as suggested:

df2 = pd.read_csv(loggerfile, header=None, skiprows=[0, 1, 2])

Sample:

                         0              1       2   3   4   5   6   7   8   \
0  2014-03-19T12:44:32.695Z  1395233072695  703425   0   2   1  13   5  21
1  2014-03-19T12:44:32.727Z  1395233072727  703425   0   2   1  13   5  21

   9   10  11   12  13   14  15  16
0  25   0  25  209   0  145   0   0
1  25   0  25  209   0  146   0   0

The columns are all of type int (except the first one):

print(df2.dtypes)

0     object
1      int64
2      int64
3      int64
4      int64
5      int64
6      int64
7      int64
8      int64
9      int64
10     int64
11     int64
12     int64
13     int64
14     int64
15     int64
16     int64

But in the correlation matrix, some columns come out as NaN:

df2.corr(numeric_only=True)  # pandas >= 2.0 no longer skips the string column 0 by default

     1          2    3          4           5   6   7            8           ...
1    1.000000   NaN  0.018752   -0.550307   NaN NaN 0.075191     0.775725
2    NaN        NaN  NaN         NaN        NaN NaN NaN          NaN
3    0.018752   NaN  1.000000   -0.067293   NaN NaN -0.579651    0.004593
...

Recommended answer

As Joris points out, you would expect NaN if the values do not vary. To see why, look at the correlation formula:

cor(i,j) = cov(i,j)/[stdev(i)*stdev(j)]
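To make the formula concrete, here is a minimal sketch (not part of the original answer) that evaluates it by hand and compares the result with pandas' `Series.corr`. The helper name `pearson` and the sample columns `a` and `b` are made up for illustration; sample statistics (ddof=1) are assumed, although the ddof factor cancels in the ratio anyway.

```python
import pandas as pd


def pearson(x: pd.Series, y: pd.Series) -> float:
    """Hypothetical helper: cor(i,j) = cov(i,j) / [stdev(i) * stdev(j)]."""
    # Sample covariance: mean product of deviations, divided by n - 1.
    cov = ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1)
    # Sample standard deviations (pandas uses ddof=1 by default).
    return cov / (x.std(ddof=1) * y.std(ddof=1))


df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 5, 9]})
print(pearson(df["a"], df["b"]))
print(df["a"].corr(df["b"]))  # should agree with the hand-computed value
```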

If the values of the i-th or j-th variable do not vary, that variable's standard deviation is zero, and so is the denominator of the fraction. Its covariance with any other variable is also zero, so the ratio is 0/0. The correlation is therefore NaN.
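The effect can be reproduced with a small made-up frame. In this sketch, column `c` never changes (presumably what happens with columns 2, 5 and 6 in the question), so its standard deviation is 0 and every correlation involving it is NaN. Dropping columns with `nunique() <= 1` before calling `corr()` is one possible workaround. It is an assumption about what the asker wants and is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [2, 4, 5, 9],
    "c": [0, 0, 0, 0],  # constant column: zero variance
})

print(df["c"].std())  # 0.0 -> zero denominator in the correlation formula
corr = df.corr()
print(corr)  # the "c" row and column are NaN, including the diagonal

# Find columns that hold a single distinct value and drop them before correlating.
constant = df.columns[df.nunique() <= 1]
print(list(constant))  # ['c']
print(df.drop(columns=constant).corr())
```

Keep in mind that a constant column carries no information about co-variation, so dropping it loses nothing from the correlation analysis.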


08-21 12:37