Problem description
I am using np.einsum to multiply probability tables like:
np.einsum('ijk,jklm->ijklm', A, B)
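For reference, because no label is dropped on the right-hand side, this call sums over nothing: it multiplies the two tables elementwise after lining up the shared axes j and k. A small check against plain broadcasting, with shapes invented purely for illustration:

```python
import numpy as np

# Hypothetical sizes: i=2, j=3, k=4, l=5, m=6
rng = np.random.default_rng(0)
A = rng.random((2, 3, 4))        # axes i, j, k
B = rng.random((3, 4, 5, 6))     # axes j, k, l, m

via_einsum = np.einsum('ijk,jklm->ijklm', A, B)

# Same product with broadcasting: give A trailing axes for l, m
# and B a leading axis for i, then multiply elementwise.
via_broadcast = A[:, :, :, None, None] * B[None, :, :, :, :]

print(via_einsum.shape)                        # (2, 3, 4, 5, 6)
print(np.allclose(via_einsum, via_broadcast))  # True
```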
The issue is that I am dealing with more than 26 random variables (axes) overall, so if I assign each random variable a letter I run out of letters. Is there another way I can specify the above operation to avoid this issue, without resorting to a mess of np.sum and np.dot operations?
Recommended answer
The short answer is that you can use any of the 52 letters (upper and lower case); that's all the letters in the English alphabet. Any fancier axis names will have to be mapped onto those 52, or onto an equivalent set of numbers. Practically speaking, you will want to use only a fraction of those 52 in any one einsum call.
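A quick sketch (arrays made up for illustration) showing that upper-case labels behave exactly like lower-case ones in the subscripts string, so one call can draw on all of string.ascii_letters, including more than 26 distinct labels:

```python
import string
import numpy as np

letters = string.ascii_letters          # 'a'..'z' followed by 'A'..'Z'
print(len(letters))                     # 52

# Upper-case labels work like lower-case ones.
A = np.ones((2, 3))
B = np.ones((3, 4))
C = np.einsum('aZ,Zb->ab', A, B)        # contracts over the shared label 'Z'
print(C.shape, C[0, 0])                 # (2, 4) 3.0

# More than 26 distinct labels in one call: 28 length-1 axes,
# reversed by writing the labels backwards on the right-hand side.
labels = letters[:28]
x = np.ones((1,) * 28)
y = np.einsum(labels + '->' + labels[::-1], x)
print(y.ndim)                           # 28
```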
@kennytm suggests using the alternative input syntax. A few sample runs suggest that this is not a solution: 26 is still the practical limit (despite the suspicious error messages).
In [258]: np.einsum(np.ones((2,3)),[0,20],np.ones((3,4)),[20,2],[0,2])
Out[258]:
array([[ 3., 3., 3., 3.],
[ 3., 3., 3., 3.]])
In [259]: np.einsum(np.ones((2,3)),[0,27],np.ones((3,4)),[27,2],[0,2])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-259-ea61c9e50d6a> in <module>()
----> 1 np.einsum(np.ones((2,3)),[0,27],np.ones((3,4)),[27,2],[0,2])
ValueError: invalid subscript '|' in einstein sum subscripts string, subscripts must be letters
In [260]: np.einsum(np.ones((2,3)),[0,100],np.ones((3,4)),[100,2],[0,2])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-260-ebd9b4889388> in <module>()
----> 1 np.einsum(np.ones((2,3)),[0,100],np.ones((3,4)),[100,2],[0,2])
ValueError: subscript is not within the valid range [0, 52]
I'm not entirely sure why you need more than 52 letters (upper and lower case), but I'm sure you need to do some sort of mapping. You don't want to write an einsum string using more than 52 axes all at once; the resulting iterator would be too large (for memory or time).
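To put a rough number on "too large": assume, purely for illustration, that every axis is a binary random variable and the table stores float64 values. A dense array over 52 such axes would need:

```python
# Hypothetical worst case: 52 binary random variables in one dense table.
n_axes = 52
n_states = 2
bytes_per_item = 8                      # float64

n_items = n_states ** n_axes            # 2**52 entries
n_bytes = n_items * bytes_per_item      # 2**55 bytes
print(n_bytes / 2**50, "PiB")           # 32.0 PiB
```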
I'm picturing some sort of mapping function that can be used as:
astr = foo(A.names, B.names)
# foo(['i','j','k'],['j','k','l','m'])
# foo(['a1','a2','a3'],['a2','a3','b4','b5'])
np.einsum(astr, A, B)
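One possible shape for that hypothetical foo, as a sketch: it assumes each table comes with a list of hashable axis names (the A.names/B.names attributes above are themselves hypothetical), assigns letters in order of first appearance from string.ascii_letters, and writes an explicit output that keeps every axis, as in 'ijk,jklm->ijklm'.

```python
import string
import numpy as np

def foo(*name_lists):
    """Hypothetical helper: build an einsum string from per-operand axis names.

    Each distinct name gets one letter from string.ascii_letters (52 max),
    assigned in order of first appearance. The output keeps every axis.
    """
    # dict.fromkeys drops duplicates but keeps first-appearance order
    order = list(dict.fromkeys(n for names in name_lists for n in names))
    if len(order) > len(string.ascii_letters):
        raise ValueError(f"{len(order)} distinct axes; only 52 letters available")
    letter = {name: string.ascii_letters[i] for i, name in enumerate(order)}
    inputs = [''.join(letter[n] for n in names) for names in name_lists]
    output = ''.join(letter[n] for n in order)
    return ','.join(inputs) + '->' + output

print(foo(['i', 'j', 'k'], ['j', 'k', 'l', 'm']))          # abc,bcde->abcde
print(foo(['a1', 'a2', 'a3'], ['a2', 'a3', 'b4', 'b5']))  # abc,bcde->abcde

A = np.ones((2, 3, 4))
B = np.ones((3, 4, 5, 6))
astr = foo(['a1', 'a2', 'a3'], ['a2', 'a3', 'b4', 'b5'])
print(np.einsum(astr, A, B).shape)                         # (2, 3, 4, 5, 6)
```

Ordering by first appearance (rather than through a set) makes the generated string deterministic from run to run.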
https://github.com/hpaulj/numpy-einsum/blob/master/einsum_py.py is a Python version of einsum. Crudely speaking, einsum parses the subscripts string, creating an op_axes list that can be used in np.nditer to set up the required sum-of-products calculation. With this code I can look at how the translation is done.
Taking the example in the file's __name__ block:
label_str, op_axes = parse_subscripts('ik,kj->ij', Labels([A.ndim,B.ndim]))
print(op_axes)
# [[0, -1, 1], [-1, 1, 0], [0, 1, -1]] fine
# map (4,newaxis,3)(newaxis,3,2)->(4,2,newaxis)
print(sum_of_prod([A,B],op_axes))
Your example, with full diagnostic output, is:
In [275]: einsum_py.parse_subscripts('ijk,jklm->ijklm',einsum_py.Labels([3,4]))
jklm
{'counts': {105: 1, 106: 2, 107: 2, 108: 1, 109: 1},
'strides': [],
'num_labels': 5,
'min_label': 105,
'nop': 2,
'ndims': [3, 4],
'ndim_broadcast': 0,
'shapes': [],
'max_label': 109}
[('ijk', [105, 106, 107], 'NONE'),
('jklm', [106, 107, 108, 109], 'NONE')]
('ijklm', [105, 106, 107, 108, 109], 'NONE')
iter labels: [105, 106, 107, 108, 109],'ijklm'
op_axes [[0, 1, 2, -1, -1], [-1, 0, 1, 2, 3], [0, 1, 2, 3, 4]]
Out[275]:
(<einsum_py.Labels at 0xb4f80cac>,
[[0, 1, 2, -1, -1], [-1, 0, 1, 2, 3], [0, 1, 2, 3, 4]])
Using 'ajk,jkzZ->ajkzZ' changes the labels, but results in the same op_axes.
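To make the op_axes idea concrete, here is a simplified pure-NumPy re-creation. It is not the code from einsum_py.py, and the function names are hypothetical. It assumes the iteration order is the output labels followed by any summed labels (an assumption that reproduces the op_axes printed above), and it ignores ellipsis and repeated labels within one operand. Each operand is viewed in the iteration space (axes reordered, length-1 axes where op_axes is -1), the views are multiplied, and the trailing summed axes are reduced.

```python
import numpy as np

def parse_op_axes(subscripts):
    """Sketch: (iteration labels, op_axes) for an explicit 'in,in->out' string."""
    lhs, out = subscripts.split('->')
    ins = lhs.split(',')
    # Output labels first, then the labels that get summed away.
    summed = [c for c in dict.fromkeys(''.join(ins)) if c not in out]
    iter_labels = list(out) + summed
    # For each operand (inputs, then output): which of its axes feeds
    # each iteration dimension, or -1 if that label is absent.
    op_axes = [[s.index(c) if c in s else -1 for c in iter_labels]
               for s in ins + [out]]
    return iter_labels, op_axes

def sum_of_products(subscripts, *arrays):
    iter_labels, op_axes = parse_op_axes(subscripts)
    prod = 1
    for arr, axes in zip(arrays, op_axes):
        # Reorder the operand's axes into iteration order ...
        view = arr.transpose([a for a in axes if a != -1])
        # ... and insert a length-1 axis wherever its label is missing.
        view = view[tuple(None if a == -1 else slice(None) for a in axes)]
        prod = prod * view
    # Summed labels sit at the end of the iteration order.
    n_out = len(subscripts.split('->')[1])
    return prod.sum(axis=tuple(range(n_out, len(iter_labels))))

print(parse_op_axes('ik,kj->ij')[1])        # [[0, -1, 1], [-1, 1, 0], [0, 1, -1]]
print(parse_op_axes('ijk,jklm->ijklm')[1])  # [[0, 1, 2, -1, -1], [-1, 0, 1, 2, 3], [0, 1, 2, 3, 4]]
print(parse_op_axes('ajk,jkzZ->ajkzZ')[1])  # same as the line above

rng = np.random.default_rng(1)
A, B = rng.random((4, 3)), rng.random((3, 2))
print(np.allclose(sum_of_products('ik,kj->ij', A, B), A @ B))  # True
```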
Here is a first draft of a translation function. It should work for any list of lists (of hashable items):
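import string

def translate(ll):
    # Collect every distinct index object from all the sublists
    mset = set()
    for i in ll:
        mset.update(i)
    # Number the objects 0, 1, 2, ... (set order is arbitrary)
    dd = {k: v for v, k in enumerate(mset)}
    # Number -> letter: 'a'..'z', then 'A'..'Z' (52 distinct objects max)
    letters = string.ascii_lowercase + string.ascii_uppercase
    x = [''.join([letters[dd[i]] for i in l]) for l in ll]
    # e.g. ['cdb', 'dbea', 'cdbea']
    y = ','.join(x[:-1]) + '->' + x[-1]
    # e.g. 'cdb,dbea->cdbea'
    return y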
In [377]: A=np.ones((3,1,2),int)
In [378]: B=np.ones((1,2,4,3),int)
In [380]: ll=[list(i) for i in ['ijk','jklm','ijklm']]
In [381]: y=translate(ll)
In [382]: y
Out[382]: 'cdb,dbea->cdbea'
In [383]: np.einsum(y,A,B).shape
Out[383]: (3, 1, 2, 4, 3)
The use of set to map index objects means that the letters assigned to them come out in an arbitrary order. As long as you specify the RHS, that shouldn't be an issue. I also ignored ellipsis.
=================
The list version of einsum input is converted to the subscript string version in einsum_list_to_subscripts() (in numpy/core/src/multiarray/multiarraymodule.c). It replaces ELLIPSIS with '...'. It raises the [0,52] error message if ( s < 0 || s > 2*26), where s is a number in one of those sublists. It then converts s to a string character with:
if (s < 26) {
    subscripts[subindex++] = 'A' + s;
}
else {
    subscripts[subindex++] = 'a' + s;
}
But it looks like the second case is not working; for 26 I get an error like this:
ValueError: invalid subscript '{' in einstein sum subscripts string, subscripts must be letters
If s >= 26, 'a'+s is wrong:
In [424]: ''.join([chr(ord('A')+i) for i in range(0,26)])
Out[424]: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
In [425]: ''.join([chr(ord('a')+i) for i in range(0,26)])
Out[425]: 'abcdefghijklmnopqrstuvwxyz'
In [435]: ''.join([chr(ord('a')+i) for i in range(26,52)])
Out[435]: '{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94'
'a'+s is wrong; it should be 'a'+s-26:
In [436]: ''.join([chr(ord('a')+i-26) for i in range(26,52)])
Out[436]: 'abcdefghijklmnopqrstuvwxyz'
I filed https://github.com/numpy/numpy/issues/7741
The existence of this bug after all this time indicates that the sublist format is not common, and that using large numbers in that list is even less frequent.
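Since the list format is just turned into a subscripts string, one workaround for the bug described above is to do that conversion yourself and call einsum with the string. The sketch below mirrors the corrected branch in Python (0-25 to 'A'-'Z', 26-51 to 'a'-'z'); label_to_char and einsum_sublists are hypothetical names, this is not NumPy's own code, and Ellipsis is not handled.

```python
import numpy as np

def label_to_char(s):
    """Map one integer sublist label to a subscript letter (corrected logic)."""
    if not 0 <= s < 52:
        raise ValueError(f"subscript {s} is outside range(52)")
    if s < 26:
        return chr(ord('A') + s)         # 0..25  -> 'A'..'Z'
    return chr(ord('a') + s - 26)        # 26..51 -> 'a'..'z' (note the -26)

def sublist_to_str(sub):
    return ''.join(label_to_char(s) for s in sub)

def einsum_sublists(*args):
    """Hypothetical wrapper for einsum(op0, sub0, op1, sub1, ..., [out_sub])."""
    ops = list(args[0::2])
    subs = list(args[1::2])
    out = ops.pop() if len(args) % 2 == 1 else None  # odd length: output sublist
    subscripts = ','.join(sublist_to_str(sub) for sub in subs)
    if out is not None:
        subscripts += '->' + sublist_to_str(out)
    return np.einsum(subscripts, *ops)

print(''.join(label_to_char(s) for s in range(52)))
# ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz

# The call that failed earlier with label 27 becomes 'Ab,bC->AC'.
r = einsum_sublists(np.ones((2, 3)), [0, 27], np.ones((3, 4)), [27, 2], [0, 2])
print(r)                                 # 2x4 array filled with 3.0
```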