Problem description
I have a DataFrame df1:
df1.head() =
               wght  num_links
id_y id_x
3    133   0.000203          2
     186   0.000203          2
5    6     0.000203          2
     98    0.000203          2
     184   0.000203          2
I need to calculate a variable called thr,
thr = N*(N-1)*2,
where N is the number of rows of df1.
The problem is that when I calculate thr, Python returns a negative value (although all of the inputs are positive):
ipdb> df1['wght'].count()*(df1['wght'].count()-1)*2
-712569744
Possible hint
The number of rows N is
ipdb> df1['wght'].count()
137736
and therefore the expected value is
ipdb> 137736*137735*2
37942135920.
Taking into account that the maximum value that can be assigned to an int32 is 2147483647, I suspect that NumPy treats thr as an int32 when it should be an int64. Does this make sense?
Please note that I have not included the code that generates df1. However, if it is needed to reproduce the error, let me know.
Thanks.
Recommended answer
You are experiencing np.int32 overflow, so just use len(df) instead of df.column.count().
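To see why the overflow produces exactly -712569744, the wraparound can be reproduced with plain Python integers. This is only a sketch of the arithmetic; wrap_int32 is a hypothetical helper written for illustration, not a NumPy or pandas function.

```python
INT32_BITS = 32


def wrap_int32(value: int) -> int:
    """Reduce an exact integer to the value a signed 32-bit integer would hold."""
    modulus = 1 << INT32_BITS      # 2**32 distinct bit patterns
    half = 1 << (INT32_BITS - 1)   # 2**31, one past the largest int32
    # Shift into the unsigned range, wrap, then shift back to the signed range.
    return (value + half) % modulus - half


n = 137736                      # row count reported in the question
exact = n * (n - 1) * 2         # Python ints have arbitrary precision
wrapped = wrap_int32(exact)     # what a 32-bit signed result would look like

print(exact)    # 37942135920
print(wrapped)  # -712569744
```

The exact product is larger than 2147483647, so a 32-bit result keeps only the low 32 bits, which read back as a negative number.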
Here is a small demo:
In [149]: x = pd.DataFrame(np.random.randint(0,100,size=(137736, 3)), columns=list('ABC'))
In [150]: x.A.count() * (x.A.count() - 1) * 2
Out[150]: -712569744
In [151]: len(x) * (len(x) - 1) * 2
Out[151]: 37942135920
In [153]: type(x.A.count())
Out[153]: numpy.int32
In [154]: type(len(x))
Out[154]: int
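If count() has to be kept (for example because the column may contain missing values, which count() excludes and len() does not), one possible workaround is to convert the result to a plain Python int before multiplying. The sketch below assumes a hypothetical helper named compute_thr; it relies only on the standard library, and operator.index accepts NumPy integer scalars as well as ordinary ints.

```python
import operator


def compute_thr(row_count) -> int:
    """Compute thr = N*(N-1)*2 using arbitrary-precision Python integers."""
    # operator.index converts any integer-like object (a plain int or a
    # NumPy integer scalar) into a plain Python int, which cannot overflow.
    n = operator.index(row_count)
    return n * (n - 1) * 2


print(compute_thr(137736))  # 37942135920
```

With the question's data this would be called as compute_thr(df1['wght'].count()) or compute_thr(len(df1)).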