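While preparing an answer to "Count how many different values a list takes in Mathematica", I came across an instability (for lack of a better term) in both DeleteDuplicates and Tally that I do not understand.
First consider: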
a = {2.2000000000000005, 2.2, 2.1999999999999999};
a // InputForm
DeleteDuplicates@a // InputForm
Union@a // InputForm
Tally@a // InputForm
{2.2000000000000006`, 2.2, 2.1999999999999997`}
{2.2000000000000006`, 2.2, 2.1999999999999997`}
{2.1999999999999997`, 2.2, 2.2000000000000006`}
{{2.2000000000000006`, 3}}
This behavior is as I expected in each case. Tally compensates for the slight numerical differences and sees each element as being equivalent. Union and DeleteDuplicates see all elements as unique. (This behavior of Tally is not documented to my knowledge, but I have made use of it before.)
Now, consider this complication:
a = {11/5, 2.2000000000000005, 2.2, 2.1999999999999997};
a // InputForm
DeleteDuplicates@a // InputForm
Union@a // InputForm
Tally@a // InputForm
{11/5, 2.2000000000000006, 2.2, 2.1999999999999997}
{11/5, 2.2000000000000006, 2.2}
{2.1999999999999997, 2.2, 11/5, 2.2000000000000006}
{{11/5, 1}, {2.2000000000000006, 1}, {2.2, 2}}
The output of Union is as anticipated, but the results from both DeleteDuplicates and Tally are surprising.
Why does DeleteDuplicates suddenly see 2.1999999999999997 as a duplicate to be eliminated?
Why does Tally suddenly see 2.2000000000000006 and 2.2 as distinct, when it did not before?
As a related point, it can be seen that packed arrays affect Tally:
a = {2.2000000000000005, 2.2, 2.1999999999999999};
a // InputForm
Tally@a // InputForm
{2.2000000000000006, 2.2, 2.1999999999999997}
{{2.2000000000000006`, 3}}
a = Developer`ToPackedArray@a;
a // InputForm
Tally@a // InputForm
{2.2000000000000006,2.2,2.1999999999999997}
{{2.2000000000000006`,1},{2.2,2}}
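Since the two results above differ only in whether the list is packed, it can help to check the packing state explicitly before comparing outputs. A minimal sketch, assuming the same three-element list; the name `b` is introduced here only for illustration:

```mathematica
a = {2.2000000000000005, 2.2, 2.1999999999999999};

(* Report whether the list as entered is stored as a packed array *)
Developer`PackedArrayQ[a]

(* Pack a copy and confirm that the copy is packed *)
b = Developer`ToPackedArray[a];
Developer`PackedArrayQ[b]

(* Compare Tally on the two representations side by side *)
{Tally[a], Tally[b]} // InputForm
```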
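Best answer
The behavior exhibited appears to be the result of the usual woes associated with floating-point arithmetic, coupled with some questionable behavior in some of the functions under discussion.
SameQ Is Not an Equivalence Relation
First, consider that SameQ is not an equivalence relation, because it is not transitive: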
In[1]:= $a = {11/5, 2.2000000000000005, 2.2, 2.1999999999999997};
In[2]:= SameQ[$a[[2]], $a[[3]]]
Out[2]= True
In[3]:= SameQ[$a[[3]], $a[[4]]]
Out[3]= True
In[4]:= SameQ[$a[[2]], $a[[4]]]
Out[4]= False (* !!! *)
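So we are faced with erratic behavior even before turning to other functions.
This behavior is due to the documented rule for SameQ, which says that two real numbers are treated as "equal" if they "differ in their last binary digit":
In[5]:= {# // InputForm, Short@RealDigits[#, 2][[1, -10;;]]} & /@ $a[[2;;4]] // TableForm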
(* showing only the last ten binary digits for each *)
Out[5]//TableForm= 2.2000000000000006 {0,1,1,0,0,1,1,0,1,1}
2.2 {0,1,1,0,0,1,1,0,1,0}
2.1999999999999997 {0,1,1,0,0,1,1,0,0,1}
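The gaps between these three reals can also be measured directly instead of being read off the binary digits. The sketch below assumes machine reals are IEEE 754 double-precision numbers, so that adjacent values in the interval [2, 4) are 2^-51 apart; the name `spacing` is introduced here only for illustration.

```mathematica
$a = {11/5, 2.2000000000000005, 2.2, 2.1999999999999997};

(* Assumed gap between adjacent machine reals in [2, 4) under IEEE 754 binary64 *)
spacing = 2.^-51;

(* Successive differences of the three reals, expressed in units of that gap *)
Differences[$a[[2 ;; 4]]]/spacing
```

A quotient of magnitude one means that the two neighbours are adjacent machine numbers, whatever their trailing binary digits look like.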
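Note that, strictly speaking, $a[[3]] and $a[[4]] differ in their last two binary digits, but the magnitude of the difference is one unit in the lowest-order bit.
DeleteDuplicates Does Not Really Use SameQ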
Next, consider that the documentation states that DeleteDuplicates[...] is equivalent to DeleteDuplicates[..., SameQ]. That is strictly true, but probably not in the way you would expect:
In[6]:= DeleteDuplicates[$a] // InputForm
Out[6]//InputForm= {11/5, 2.2000000000000006, 2.2}
In[7]:= DeleteDuplicates[$a, SameQ] // InputForm
Out[7]//InputForm= {11/5, 2.2000000000000006, 2.2}
Same as documented... but what about this:
In[8]:= DeleteDuplicates[$a, SameQ[#1, #2]&] // InputForm
Out[8]//InputForm= {11/5, 2.2000000000000006, 2.1999999999999997}
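It seems that DeleteDuplicates goes through a different branch of logic when the comparison function is manifestly SameQ, as opposed to a function whose behavior is identical to SameQ.
Tally Is... Confused
Tally shows similar, but not identical, erratic behavior:
In[9]:= Tally[$a] // InputForm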
Out[9]//InputForm= {{11/5, 1}, {2.2000000000000006, 1}, {2.2, 2}}
In[10]:= Tally[$a, SameQ] // InputForm
Out[10]//InputForm= {{11/5, 1}, {2.2000000000000006, 1}, {2.2, 2}}
In[11]:= Tally[$a, SameQ[#1, #2]&] // InputForm
Out[11]//InputForm= {{11/5, 1}, {2.2000000000000006, 1}, {2.2000000000000006, 2}}
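That last one is particularly baffling, since the same number appears twice in the list with different counts.
Equal Suffers Similar Problems
Now, back to the issue of floating-point equality. Equal fares a little better than SameQ, with the emphasis on "little". Equal looks at the last seven binary digits instead of the last one. That does not fix the problem, though, since troublesome cases can always be found:
In[12]:= $x1 = 0.19999999999999823;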
$x2 = 0.2;
$x3 = 0.2000000000000018;
In[15]:= Equal[$x1, $x2]
Out[15]= True
In[16]:= Equal[$x2, $x3]
Out[16]= True
In[17]:= Equal[$x1, $x3]
Out[17]= False (* Oops *)
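When nearly equal reals should be grouped together, one option is to state the tolerance explicitly rather than rely on the built-in defaults. The following is a minimal sketch, not part of the original answer; the tolerance `tol` is a hypothetical value chosen for illustration.

```mathematica
a = {11/5, 2.2000000000000005, 2.2, 2.1999999999999997};

(* Hypothetical tolerance; choose a value that suits the data *)
tol = 10^-12;

(* Treat two elements as duplicates when they differ by less than tol *)
DeleteDuplicates[a, Abs[#1 - #2] < tol &]

(* Count elements using the same explicit test *)
Tally[a, Abs[#1 - #2] < tol &]
```

A tolerance test of this kind is still not transitive in general, so it makes the comparison rule visible without removing the underlying problem.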
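The Villain Unmasked
The main culprit in all of this discussion is the floating-point real number format. It is simply not possible to represent arbitrary real numbers with full fidelity using a finite format. This is why Mathematica stresses symbolic form and makes every possible attempt to work with expressions in symbolic form for as long as possible. If numeric forms turn out to be unavoidable, then one must wade into that swamp called numerical analysis in order to sort out all of the corner cases involving equality and inequality.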
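One way to act on the preference for exact forms is to snap the reals onto a grid of exact rationals before counting, so that the comparisons inside Tally and DeleteDuplicates involve only exact numbers. This is a sketch under stated assumptions rather than a recommendation from the answer; the grid size `grid` and the name `exact` are hypothetical.

```mathematica
a = {11/5, 2.2000000000000005, 2.2, 2.1999999999999997};

(* Hypothetical grid size; an exact rational, so Round returns exact results *)
grid = 10^-12;

(* Replace every element by the nearest exact multiple of grid *)
exact = Round[a, grid];

(* Both functions now compare exact rationals only *)
Tally[exact]
DeleteDuplicates[exact]
```

Values that straddle a grid boundary still land in different bins, so the grid size has to be chosen with the data in mind.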
Poor SameQ, Equal, DeleteDuplicates, Tally and all their friends never stood a chance.
Source: the Stack Overflow question "Instability in DeleteDuplicates and Tally" (tagged wolfram-mathematica): https://stackoverflow.com/questions/6166895/