我正在以多个时间分辨率跟踪多个离散时间序列,从而得到一个SxRxB矩阵,其中S是时间序列的数量,R是不同分辨率的数量,B是缓冲区,即每个序列记住多少个值。每个系列都是离散的,并使用有限范围的自然数表示其值。我将在这里称这些“符号”。
对于每个系列,我要计算在所有测量中,先前测量的任何符号直接出现在当前测量的任何符号之前的频率。我已经通过如下所示的for循环解决了这个问题,但是出于明显的原因,我想对其进行向量化。
我不确定我的数据结构方式是否有效,因此我愿意在那里提出建议。我认为尤其是比率矩阵可以做的不同。
提前致谢!
def supports_loop(data, num_series, resolutions, buffer_size, vocab_size):
# For small test matrices we can calculate the complete matrix without problems
indices = []
indices.append(xrange(num_series))
indices.append(xrange(vocab_size))
indices.append(xrange(num_series))
indices.append(xrange(vocab_size))
indices.append(xrange(resolutions))
# This is huge! :/
# dimensions:
# series and value for which we calculate,
# series and value which precedes that measurement,
# resolution
ratios = np.full((num_series, vocab_size, num_series, vocab_size, resolutions), 0.0)
for idx in itertools.product(*indices):
s0, v0 = idx[0],idx[1] # the series and symbol for which we calculate
s1, v1 = idx[2],idx[3] # the series and symbol which should precede the we're calculating for
res = idx[4]
# Find the positions where s0==v0
found0 = np.where(data[s0, res, :] == v0)[0]
if found0.size == 0:
continue
#print('found {}={} at {}'.format(s0, v0, found0))
# Check how often s1==v1 right before s0==v0
candidates = (s1, res, (found0 - 1 + buffer_size) % buffer_size)
found01 = np.count_nonzero(data[candidates] == v1)
if found01 == 0:
continue
print('found {}={} following {}={} at {}'.format(s0, v0, s1, v1, found01))
# total01 = number of positions where either s0 or s1 is defined (i.e. >=0)
total01 = len(np.argwhere((data[s0, res, :] >= 0) & (data[s1, res, :] >= 0)))
ratio = (float(found01) / total01) if total01 > 0 else 0.0
ratios[idx] = ratio
return ratios
def stackoverflow_example(fnc):
data = np.array([
[[0, 0, 1], # series 0, resolution 0
[1, 3, 2]], # series 0, resolution 1
[[2, 1, 2], # series 1, resolution 0
[3, 3, 3]], # series 1, resoltuion 1
])
num_series = data.shape[0]
resolutions = data.shape[1]
buffer_size = data.shape[2]
vocab_size = np.max(data)+1
ratios = fnc(data, num_series, resolutions, buffer_size, vocab_size)
coordinates = np.argwhere(ratios > 0.0)
nz_values = ratios[ratios > 0.0]
print(np.hstack((coordinates, nz_values[:,None])))
print('0/0 precedes 0/0 in 1 out of 3 cases: {}'.format(np.isclose(ratios[0,0,0,0,0], 1.0/3.0)))
print('1/2 precedes 0/0 in 2 out of 3 cases: {}'.format(np.isclose(ratios[0,0,1,2,0], 2.0/3.0)))
预期的输出(21对,坐标为5列,后跟找到的计数):
[[0 0 0 0 0 1]
[0 0 0 1 0 1]
[0 0 1 2 0 2]
[0 1 0 0 0 1]
[0 1 0 2 1 1]
[0 1 1 1 0 1]
[0 1 1 3 1 1]
[0 2 0 3 1 1]
[0 2 1 3 1 1]
[0 3 0 1 1 1]
[0 3 1 3 1 1]
[1 1 0 0 0 1]
[1 1 1 2 0 1]
[1 2 0 0 0 1]
[1 2 0 1 0 1]
[1 2 1 1 0 1]
[1 2 1 2 0 1]
[1 3 0 1 1 1]
[1 3 0 2 1 1]
[1 3 0 3 1 1]
[1 3 1 3 1 3]]
在上面的示例中,在三分之二的情况下,序列0的0跟随序列1的2(因为缓冲区是圆形的),因此[0,0,1,2,0]的比率约为0.6666。同样,系列0,值0在三种情况中的一种也跟随其自身,因此[0,0,0,0,0]处的比率约为0.3333。还有其他一些> 0.0。
我正在两个数据集上测试每个答案:一个很小的数据集(如上所示)和一个更现实的数据集(100系列,5种分辨率,每个系列10个值,50个符号)。
结果
Answer Time (tiny) Time (huge) All pairs found (tiny=21)
-----------------------------------------------------------------------
Baseline ~1ms ~675s (!) Yes
Saedeas ~0.13ms ~1.4ms No (!)
Saedeas2 ~0.20ms ~4.0ms Yes, +cross resolutions
Elliot_1 ~0.70ms ~100s (!) Yes
Elliot_2 ~1ms ~21s (!) Yes
Kuppern_1 ~0.39ms ~2.4s (!) Yes
Kuppern_2 ~0.18ms ~28ms Yes
Kuppern_3 ~0.19ms ~24ms Yes
David ~0.21ms ~27ms Yes
Saedeas第二种方法无疑是赢家!非常感谢你们,大家:)
最佳答案
首先,您没有显式地嵌套for循环,这会给您带来一点麻烦。您最终需要重复很多努力,却没有节省任何内存。当循环嵌套时,您可以将某些计算从一个级别移到另一级别,并找出可以对哪些内部循环进行矢量化处理。
def supports_5_loop(data, num_series, resolutions, buffer_size, vocab_size):
ratios = np.full((num_series, vocab_size, num_series, vocab_size, resolutions), 0.0)
for res in xrange(resolutions):
for s0 in xrange(num_series):
# Find the positions where s0==v0
for v0 in np.unique(data[s0, res]):
# only need to find indices once for each series and value
found0 = np.where(data[s0, res, :] == v0)[0]
for s1 in xrange(num_series):
# Check how often s1==v1 right before s0==v0
candidates = (s1, res, (found0 - 1 + buffer_size) % buffer_size)
total01 = np.logical_or(data[s0, res, :] >= 0, data[s1, res, :] >= 0).sum()
# can skip inner loops if there are no candidates
if total01 == 0:
continue
for v1 in xrange(vocab_size):
found01 = np.count_nonzero(data[candidates] == v1)
if found01 == 0:
continue
ratio = (float(found01) / total01)
ratios[(s0, v0, s1, v1, res)] = ratio
return ratios
您会发现在计时中,大多数提速速度来自不重复的努力。
建立嵌套结构后,就可以开始研究向量化和其他优化。
def supports_4_loop(data, num_series, resolutions, buffer_size, vocab_size):
# For small test matrices we can calculate the complete matrix without problems
# This is huge! :/
# dimensions:
# series and value for which we calculate,
# series and value which precedes that measurement,
# resolution
ratios = np.full((num_series, vocab_size, num_series, vocab_size, resolutions), 0.0)
for res in xrange(resolutions):
for s0 in xrange(num_series):
# find the counts where either s0 or s1 are present
total01 = np.logical_or(data[s0, res] >= 0,
data[:, res] >= 0).sum(axis=1)
s1s = np.where(total01)[0]
# Find the positions where s0==v0
v0s, counts = np.unique(data[s0, res], return_counts=True)
# sorting before searching will show gains as the datasets
# get larger
indarr = np.argsort(data[s0, res])
i0 = 0
for v0, count in itertools.izip(v0s, counts):
found0 = indarr[i0:i0+count]
i0 += count
for s1 in s1s:
candidates = data[(s1, res, (found0 - 1) % buffer_size)]
# can replace the innermost loop with numpy functions
v1s, counts = np.unique(candidates, return_counts=True)
ratios[s0, v0, s1, v1s, res] = counts / total01[s1]
return ratios
不幸的是,我只能对最内层的循环进行矢量化处理,而这只能使速度提高10%。在最内层循环之外,您不能保证所有向量的大小都相同,因此无法构建数组。
In [121]: (np.all(supports_loop(data, num_series, resolutions, buffer_size, vocab_size) == supports_5_loop(data, num_series, resolutions, buffer_size, vocab_size)))
Out[121]: True
In [122]: (np.all(supports_loop(data, num_series, resolutions, buffer_size, vocab_size) == supports_4_loop(data, num_series, resolutions, buffer_size, vocab_size)))
Out[122]: True
In [123]: %timeit(supports_loop(data, num_series, resolutions, buffer_size, vocab_size))
2.29 ms ± 73.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [124]: %timeit(supports_5_loop(data, num_series, resolutions, buffer_size, vocab_size))
949 µs ± 5.37 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [125]: %timeit(supports_4_loop(data, num_series, resolutions, buffer_size, vocab_size))
843 µs ± 3.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
关于python - 具有相互依存值的矩阵中的向量化计算,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/51194504/