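While preparing an answer to "Count how many different values a list takes in Mathematica", I came across an instability (for lack of a better term) in both DeleteDuplicates and Tally that I do not understand.

First, consider: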

a = {2.2000000000000005, 2.2, 2.1999999999999999};

a // InputForm
DeleteDuplicates@a // InputForm
Union@a // InputForm
Tally@a // InputForm
   {2.2000000000000006`, 2.2, 2.1999999999999997`}
   {2.2000000000000006`, 2.2, 2.1999999999999997`}
   {2.1999999999999997`, 2.2, 2.2000000000000006`}
   {{2.2000000000000006`, 3}}

This behavior is as I expected in each case. Tally compensates for the slight numerical differences and sees each element as being equivalent. Union and DeleteDuplicates see all elements as unique. (This behavior of Tally is not documented to my knowledge, but I have made use of it before.)

Now, consider this complication:

a = {11/5, 2.2000000000000005, 2.2, 2.1999999999999997};

a // InputForm
DeleteDuplicates@a // InputForm
Union@a // InputForm
Tally@a // InputForm
   {11/5, 2.2000000000000006, 2.2, 2.1999999999999997}
   {11/5, 2.2000000000000006, 2.2}
   {2.1999999999999997, 2.2, 11/5, 2.2000000000000006}
   {{11/5, 1}, {2.2000000000000006, 1}, {2.2, 2}}

The output of Union is as anticipated, but the results from both DeleteDuplicates and Tally are surprising.

  • Why does DeleteDuplicates suddenly see 2.1999999999999997 as a duplicate to be eliminated?

  • Why does Tally suddenly see 2.2000000000000006 and 2.2 as distinct, when it did not before?


As a related point, it can be seen that packed arrays affect Tally:

a = {2.2000000000000005, 2.2, 2.1999999999999999};
a // InputForm
Tally@a // InputForm
   {2.2000000000000006, 2.2, 2.1999999999999997}
   {{2.2000000000000006`, 3}}
a = Developer`ToPackedArray@a;
a // InputForm
Tally@a // InputForm
   {2.2000000000000006, 2.2, 2.1999999999999997}
   {{2.2000000000000006`, 1}, {2.2, 2}}
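
To confirm which storage form a list actually has before comparing Tally results, a minimal sketch along the following lines may help. It relies only on Developer`PackedArrayQ and Developer`FromPackedArray in addition to the Developer`ToPackedArray call used above. The names unpacked and packed are illustrative, and no particular output is claimed, since the results may vary between versions.

```mathematica
(* Illustrative names; the list literal is the one used above *)
unpacked = {2.2000000000000005, 2.2, 2.1999999999999999};
packed = Developer`ToPackedArray[unpacked];

(* Report the storage form of each list *)
Developer`PackedArrayQ /@ {unpacked, packed}

(* Compare Tally across the storage forms, including a round trip back to unpacked *)
InputForm /@ {Tally[unpacked], Tally[packed],
  Tally[Developer`FromPackedArray[packed]]}
```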

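Best answer

The behavior exhibited appears to be the result of the usual woes associated with floating-point arithmetic, coupled with some questionable behavior in some of the functions under discussion.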

SameQ Is Not an Equivalence Relation

First, consider that SameQ is not an equivalence relation, because it is not transitive:

In[1]:= $a = {11/5, 2.2000000000000005, 2.2, 2.1999999999999997};

In[2]:= SameQ[$a[[2]], $a[[3]]]
Out[2]= True

In[3]:= SameQ[$a[[3]], $a[[4]]]
Out[3]= True

In[4]:= SameQ[$a[[2]], $a[[4]]]
Out[4]= False                     (* !!! *)

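So we are faced with erratic behavior even before turning to the other functions.

This behavior follows from the documented rule for SameQ, which says that two real numbers are treated as "equal" if they "differ in their last binary digit":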
In[5]:= {# // InputForm, Short@RealDigits[#, 2][[1, -10;;]]} & /@ $a[[2;;4]] // TableForm
(* showing only the last ten binary digits for each *)
Out[5]//TableForm= 2.2000000000000006  {0,1,1,0,0,1,1,0,1,1}
                   2.2                 {0,1,1,0,0,1,1,0,1,0}
                   2.1999999999999997  {0,1,1,0,0,1,1,0,0,1}

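Note that, strictly speaking, $a[[3]] and $a[[4]] differ in their last two binary digits, but the magnitude of the difference is one bit of the lowest order.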
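
The non-transitivity can be made more concrete by measuring how far apart the three reals are in units of the last place. The following is a minimal sketch, not part of the original answer: significand and ulpDistance are hypothetical helper names, and the subtraction is only meaningful because all three values share the same binary exponent.

```mathematica
(* Hypothetical helper: the significand bits of a machine real, read as an integer *)
significand[x_Real] := FromDigits[First[RealDigits[x, 2]], 2]

(* Hypothetical helper: distance in units of the last place.
   Assumes x and y have the same binary exponent, which holds here
   because all three values lie in the interval [2, 4). *)
ulpDistance[x_Real, y_Real] := Abs[significand[x] - significand[y]]

b = {2.2000000000000005, 2.2, 2.1999999999999997};

(* Distances for the pairs (1,2), (2,3) and (1,3) *)
{ulpDistance[b[[1]], b[[2]]],
 ulpDistance[b[[2]], b[[3]]],
 ulpDistance[b[[1]], b[[3]]]}
```

Going by the digits printed above, each adjacent pair should be one unit apart while the outer pair is two units apart, which is why SameQ accepts the first two comparisons and rejects the third.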

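DeleteDuplicates Does Not Really Use SameQ

Next, consider that the documentation states that DeleteDuplicates[...] is equivalent to DeleteDuplicates[..., SameQ]. That is strictly true, but probably not in the sense you might expect: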
In[6]:= DeleteDuplicates[$a] // InputForm
Out[6]//InputForm= {11/5, 2.2000000000000006, 2.2}

In[7]:= DeleteDuplicates[$a, SameQ] // InputForm
Out[7]//InputForm= {11/5, 2.2000000000000006, 2.2}

The same, as documented... but what about this:
In[8]:= DeleteDuplicates[$a, SameQ[#1, #2]&] // InputForm
Out[8]//InputForm= {11/5, 2.2000000000000006, 2.1999999999999997}

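It appears that DeleteDuplicates goes through a different branch of logic when the comparison function is manifestly SameQ, as opposed to a function whose behavior is identical to SameQ.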

Tally Is... Confused

Tally shows similar, but not identical, erratic behavior:
In[9]:= Tally[$a] // InputForm
Out[9]//InputForm=  {{11/5, 1}, {2.2000000000000006, 1}, {2.2, 2}}

In[10]:= Tally[$a, SameQ] // InputForm
Out[10]//InputForm= {{11/5, 1}, {2.2000000000000006, 1}, {2.2, 2}}

In[11]:= Tally[$a, SameQ[#1, #2]&] // InputForm
Out[11]//InputForm= {{11/5, 1}, {2.2000000000000006, 1}, {2.2000000000000006, 2}}

That last result is particularly baffling, since the same number appears twice in the list with different counts.

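Equal Suffers From Similar Problems

Now, back to the problem of floating-point equality. Equal fares a little better than SameQ, with the emphasis on "little". Equal looks at the last seven binary digits instead of the last one. That does not fix the problem, though: troublesome cases can always be found: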
In[12]:= $x1 = 0.19999999999999823;
         $x2 = 0.2;
         $x3 = 0.2000000000000018;

In[15]:= Equal[$x1, $x2]
Out[15]= True

In[16]:= Equal[$x2, $x3]
Out[16]= True

In[17]:= Equal[$x1, $x3]
Out[17]= False             (* Oops *)
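
The slack that Equal applies is commonly reported to be governed by the undocumented variable Internal`$EqualTolerance. The sketch below treats that name as an assumption: it is not part of the documented language, its behavior may differ between versions, and no output is claimed here. It simply evaluates the same three comparisons with the default setting and with the tolerance set to zero inside Block.

```mathematica
$x1 = 0.19999999999999823; $x2 = 0.2; $x3 = 0.2000000000000018;

(* Default tolerance: the three comparisons from above *)
{$x1 == $x2, $x2 == $x3, $x1 == $x3}

(* Assumption: Internal`$EqualTolerance is undocumented.
   Setting it to 0 inside Block is intended to make Equal
   compare machine reals without any slack. *)
Block[{Internal`$EqualTolerance = 0},
 {$x1 == $x2, $x2 == $x3, $x1 == $x3}]
```

Removing the tolerance restores transitivity only by making the comparison strict. It does not make nearly equal numbers compare as equal.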

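The Villain Unmasked

The main culprit in all of this discussion is the floating-point real number format. It is simply not possible to represent arbitrary real numbers with full fidelity using a finite format. This is why Mathematica stresses symbolic form and makes every attempt to work with expressions in symbolic form for as long as possible. If numeric forms prove unavoidable, then one must wade into that swamp called numerical analysis in order to sort out all of the corner cases involving equality and inequality.

Poor SameQ, Equal, DeleteDuplicates, Tally and all of their friends never stood a chance.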
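
One way to sidestep the tolerance rules altogether is to compare on an explicit key, so that the comparison itself is exact and therefore transitive. The following is a minimal sketch of two such keys, not something proposed in the original answer. The names c, exactKey and grid are illustrative, and the grid size is an arbitrary assumption.

```mathematica
c = {11/5, 2.2000000000000005, 2.2, 2.1999999999999997};

(* Strict comparison: SetPrecision[x, Infinity] converts a machine real
   to the exact rational it stores, so the keys are compared exactly *)
exactKey[x_] := SetPrecision[x, Infinity]
DeleteDuplicatesBy[c, exactKey] // InputForm

(* Tolerant but transitive grouping: bucket values on a grid.
   The grid size is an arbitrary choice, and values that straddle
   a bucket boundary still end up in different groups. *)
grid = 10^-9;
Map[{First[#], Length[#]} &, GatherBy[c, Round[#, grid] &]] // InputForm
```

With the exact key every distinct stored value is kept, including 11/5 alongside 2.2, since the machine real 2.2 is not exactly 11/5. With the grid key, values that round to the same grid point are counted together.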

For wolfram-mathematica - instability in DeleteDuplicates and Tally, the original question can be found on Stack Overflow: https://stackoverflow.com/questions/6166895/
