python - Operations on numpy masked arrays give masked invalid values

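From the numpy documentation on operations on masked arrays:

The numpy.ma module comes with a specific implementation of most ufuncs. Unary and binary functions that have a validity domain (such as log or divide) return the masked constant whenever the input is masked or falls outside the validity domain. For example: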

ma.log([-1, 0, 1, 2])
masked_array(data = [-- -- 0.0 0.69314718056],
             mask = [ True  True False False],
       fill_value = 1e+20)

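My problem is that, during a computation, I need to know where these invalid operations were produced. Concretely, I would like this instead: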

ma.log([-1, 0, 1, 2])
masked_array(data = [np.nan -- 0.0 0.69314718056],
             mask = [ True  True False False],
       fill_value = 1e+20)

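At the risk of making this question conversational, my main question is:

What is a good way to get a masked_array in which computed invalid values (the ones that fix_invalid would "fix", such as np.nan and np.inf) are not converted to masked values (and conflated with them)?

My current solution is to compute the function on masked_array.data and then rebuild the masked array with the original mask. However, I am writing an application that maps arbitrary user functions over many different arrays, some masked and some not, and I would like to avoid a special handler just for masked arrays. Furthermore, the distinction between MISSING, NaN, and Inf matters for these arrays, so I can't simply use arrays with np.nan in place of masked values.

Also, if anyone has insight into why this behavior exists, I would like to know. It seems strange to build this into the operation itself, since the validity of results computed from unmasked values is really the user's responsibility, and the user can choose to "clean up" with the fix_invalid function.

Additionally, if anyone knows anything about the progress of missing values in numpy, please share, since the oldest posts are from 2011-2012 and that debate never led to anything.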
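The workaround described in the previous paragraph can be sketched roughly as follows. This is only a minimal sketch: the helper name apply_keep_mask is hypothetical, and it assumes the user function accepts a plain ndarray and returns an array of the same shape.

```python
import numpy as np

def apply_keep_mask(func, arr):
    """Apply func to the raw data and carry over only the input mask (hypothetical helper)."""
    # Silence the divide/invalid warnings that e.g. np.log emits for 0 and -1.
    with np.errstate(divide="ignore", invalid="ignore"):
        if isinstance(arr, np.ma.MaskedArray):
            out = func(arr.data)
            # Rebuild with the original mask only, so computed nan/inf stay visible.
            return np.ma.masked_array(out, mask=np.ma.getmaskarray(arr))
        return func(np.asarray(arr))

arr = np.ma.masked_array([-1.0, 0.0, 1.0, 2.0], mask=[True, False, False, False])
result = apply_keep_mask(np.log, arr)
print(result.mask)     # only the originally masked slot is masked
print(result.data[1])  # the -inf from log(0) is kept as data
```

The isinstance branch is exactly the special handling for masked arrays that the question would like to avoid, which is why this is a workaround rather than a solution.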

EDIT: 2017-10-30

To add to hpaulj's answer: defining a log function with a modified domain has a side effect on the behavior of log in the numpy namespace.

In [1]: import numpy as np

In [2]: np.log(np.ma.masked_array([-1,0,1,2],[1,0,0,0]))
/home/salotz/anaconda3/bin/ipython:1: RuntimeWarning: divide by zero encountered in log
  #!/home/salotz/anaconda3/bin/python
/home/salotz/anaconda3/bin/ipython:1: RuntimeWarning: invalid value encountered in log
  #!/home/salotz/anaconda3/bin/python
Out[2]:
masked_array(data = [-- -- 0.0 0.6931471805599453],
             mask = [ True  True False False],
       fill_value = 1e+20)

In [3]: mylog = np.ma.core._MaskedUnaryOperation(np.core.umath.log)

In [4]: np.log(np.ma.masked_array([-1,0,1,2],[1,0,0,0]))
/home/salotz/anaconda3/bin/ipython:1: RuntimeWarning: divide by zero encountered in log
  #!/home/salotz/anaconda3/bin/python
/home/salotz/anaconda3/bin/ipython:1: RuntimeWarning: invalid value encountered in log
  #!/home/salotz/anaconda3/bin/python
Out[4]:
masked_array(data = [-- -inf 0.0 0.6931471805599453],
             mask = [ True False False False],
       fill_value = 1e+20)

np.log now has the same behavior as mylog, but np.ma.log is unchanged:

In [5]: np.ma.log(np.ma.masked_array([-1,0,1,2],[1,0,0,0]))
Out[5]:
masked_array(data = [-- -- 0.0 0.6931471805599453],
             mask = [ True  True False False],
       fill_value = 1e+20)

Is there a way to avoid this?

Using Python 3.6.2 :: Anaconda custom (64-bit) and numpy 1.12.1

Best answer

Just to clarify what is going on here:

np.ma.log runs np.log on the argument, but suppresses the warnings:

In [26]: np.log([-1,0,1,2])
/usr/local/bin/ipython3:1: RuntimeWarning: divide by zero encountered in log
  #!/usr/bin/python3
/usr/local/bin/ipython3:1: RuntimeWarning: invalid value encountered in log
  #!/usr/bin/python3
Out[26]: array([        nan,        -inf,  0.        ,  0.69314718])

It masks the nan and -inf values, and apparently copies the original values into those data slots:

In [27]: np.ma.log([-1,0,1,2])
Out[27]:
masked_array(data = [-- -- 0.0 0.6931471805599453],
             mask = [ True  True False False],
       fill_value = 1e+20)
In [28]: _.data
Out[28]: array([-1.        ,  0.        ,  0.        ,  0.69314718])

(Run in Py3; numpy version 1.13.1)

This masking behavior is not unique to ma.log. It is determined by its class:

In [41]: type(np.ma.log)
Out[41]: numpy.ma.core._MaskedUnaryOperation

In np.ma.core it is defined with fill and domain attributes:

log = _MaskedUnaryOperation(umath.log, 1.0,
                        _DomainGreater(0.0))

So the valid domain (the part left unmasked) is > 0; calling domain returns True where the input falls outside it:

In [47]: np.ma.log.domain([-1,0,1,2])
Out[47]: array([ True,  True, False, False], dtype=bool)

That domain mask is or-ed with

In [54]: ~np.isfinite(np.log([-1,0,1,2]))
...
Out[54]: array([ True,  True, False, False], dtype=bool)

which has the same values.
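To make the mask composition concrete, here is a small sketch using only public numpy.ma functions. It assumes that the log domain check is equivalent to values <= 0, and the variable names are illustrative. It also shows one way to recover which positions were masked by the operation itself rather than by the input.

```python
import numpy as np

values = np.array([-1.0, 0.0, 1.0, 2.0])
arr = np.ma.masked_array(values, mask=[False, False, False, True])

input_mask = np.ma.getmaskarray(arr)
# Inputs <= 0 lie outside the domain of log (assumed equivalent to _DomainGreater(0.0)).
outside_domain = values <= 0

# The result mask should be the input mask or-ed with the domain mask.
expected_mask = input_mask | outside_domain
actual_mask = np.ma.getmaskarray(np.ma.log(arr))

# Positions masked by the operation itself, not by the input.
newly_masked = actual_mask & ~input_mask
print(expected_mask, actual_mask, newly_masked)
```

Comparing the result mask with the input mask this way tells you where an invalid operation occurred, but not whether it would have produced nan or -inf.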

It looks like I can define a custom log that does not add its own domain mask:

In [58]: mylog = np.ma.core._MaskedUnaryOperation(np.core.umath.log)
In [59]: mylog([-1,0,1,2])
Out[59]:
masked_array(data = [        nan        -inf  0.          0.69314718],
             mask = False,
       fill_value = 1e+20)

In [63]: np.ma.masked_array([-1,0,1,2],[1,0,0,0])
Out[63]:
masked_array(data = [-- 0 1 2],
             mask = [ True False False False],
       fill_value = 999999)
In [64]: np.ma.log(np.ma.masked_array([-1,0,1,2],[1,0,0,0]))
Out[64]:
masked_array(data = [-- -- 0.0 0.6931471805599453],
             mask = [ True  True False False],
       fill_value = 1e+20)
In [65]: mylog(np.ma.masked_array([-1,0,1,2],[1,0,0,0]))
Out[65]:
masked_array(data = [-- -inf 0.0 0.6931471805599453],
             mask = [ True False False False],
       fill_value = 1e+20)
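Regarding the side effect on np.log reported in the question's edit: as far as I can tell from numpy/ma/core.py, the _MaskedUnaryOperation constructor records its domain and fill in the module-level dicts ufunc_domain and ufunc_fills, keyed by the underlying ufunc, and np.log consults that registry when it is applied to a masked array. Creating mylog without a domain therefore overwrites the entry that np.ma.log registered. A possible sketch that restores the entries afterwards is shown below. It relies on private internals that may change between numpy versions, so treat it as an assumption rather than a supported API.

```python
import numpy as np
import numpy.ma.core as ma_core  # private internals; may change between versions

# Remember what np.ma.log registered for the underlying ufunc.
saved_domain = ma_core.ufunc_domain[np.log]
saved_fill = ma_core.ufunc_fills[np.log]

# Creating the domain-free wrapper overwrites those registry entries ...
mylog = ma_core._MaskedUnaryOperation(np.log)

# ... so put them back to leave np.log on masked arrays unchanged.
ma_core.ufunc_domain[np.log] = saved_domain
ma_core.ufunc_fills[np.log] = saved_fill

arr = np.ma.masked_array([-1.0, 0.0, 1.0, 2.0], mask=[True, False, False, False])
with np.errstate(divide="ignore", invalid="ignore"):
    plain = np.log(arr)   # still applies the log domain
custom = mylog(arr)       # keeps -inf unmasked
print(plain.mask, custom.mask)
```

If depending on private names is not acceptable, computing on the raw data and reattaching the original mask avoids touching the registry at all.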

This page is based on a similar question on Stack Overflow about operations on numpy masked arrays giving masked invalid values: https://stackoverflow.com/questions/46983061/