python - How to bisect with numpy searchsorted on column 0 and get the minimum value in column 1

I have a 2-column numpy array of integers, for example:

tarray = array([[ 368,  322],
       [ 433,  420],
       [ 451,  412],
       [ 480,  440],
       [ 517,  475],
       [ 541,  503],
       [ 578,  537],
       [ 607,  567],
       [ 637,  599],
       [ 666,  628],
       [ 696,  660],
       [ 726,  687],
       [ 756,  717],
       [ 785,  747],
       [ 815,  779],
       [ 845,  807],
       [ 874,  837],
       [ 905,  867],
       [ 934,  898],
       [ 969,  928],
       [ 994,  957],
       [1027,  987],
       [1057, 1017],
       [1086, 1047],
       [1117, 1079],
       [1148, 1109],
       [1177, 1137],
       [1213, 1167],
       [1237, 1197],
       [1273, 1227],
       [1299, 1261],
       [1333, 1287],
       [1357, 1317],
       [1393, 1347],
       [1416, 1377]])

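I am using np.searchsorted to bisect lower and upper bounds into column 0, so each range takes two bisections; for example, 241 and 361 are both bisected into the array.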

ranges = [array([241, 290, 350, 420, 540, 660, 780, 900]),
 array([ 361,  410,  470,  540,  660,  780,  900, 1020])]

For example: np.searchsorted(tarray[:, 0], ranges)
which results in:

array([[ 0,  0,  0,  1,  5,  9, 13, 17],
       [ 0,  1,  3,  5,  9, 13, 17, 21]])
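
For reference, the bisection step can be reproduced on a small array. This is a minimal sketch that uses only the first six column-0 values and the first four bounds from the data above; the names col0, lower and upper are made up for illustration. np.searchsorted accepts a 2-D array of values and returns insertion indices of the same shape, so row 0 holds the slice starts and row 1 the slice stops.

```python
import numpy as np

# First few column-0 values from tarray (searchsorted requires ascending order).
col0 = np.array([368, 433, 451, 480, 517, 541])

# Illustrative bounds: the first four lower and upper values from `ranges`.
lower = np.array([241, 290, 350, 420])
upper = np.array([361, 410, 470, 540])

# Stacking the bounds gives a (2, 4) array; the result has the same shape.
f = np.searchsorted(col0, np.vstack((lower, upper)))
starts, stops = f

print(f)
# A count of zero means the range selects no rows at all.
counts = stops - starts
print(counts)
```

Ranges with a zero count have no minimum, which is why the loop further down skips them.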

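Each position across the two result arrays gives the index range for one pair of bounds. What I then want is the position of the minimum value in column 1 within each resulting slice. For example, here is what I mean, written naively as a Python loop (where the searchsorted result is the 2-row array "f"):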

f = array([[ 0,  0,  0,  1,  5,  9, 13, 17],
       [ 0,  1,  3,  5,  9, 13, 17, 21]])

for i, (x, y) in enumerate(zip(*f)):
    if y - x:  # skip empty ranges
        print(ranges[1][i], tarray[x:y])

The result is:

410 [[368 322]]
470 [[368 322]
 [433 420]
 [451 412]]
540 [[433 420]
 [451 412]
 [480 440]
 [517 475]]
660 [[541 503]
 [578 537]
 [607 567]
 [637 599]]
780 [[666 628]
 [696 660]
 [726 687]
 [756 717]]
900 [[785 747]
 [815 779]
 [845 807]
 [874 837]]
1020 [[905 867]
 [934 898]
 [969 928]
 [994 957]]

Now to explain what I want: within each sliced range, I want the row that has the minimum value in column 1.

e.g. 540 [[433 420]
 [451 412]
 [480 440]
 [517 475]]

I want the final result to be 412 (as in [451 412]).
For example:

for i, (x, y) in enumerate(zip(*f)):
    if y - x:  # skip empty ranges
        print(ranges[1][i], tarray[x:y, 1].min())

410 322
470 322
540 412
660 503
780 628
900 747
1020 867

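Basically I want to vectorize this so that I get an array back without iterating, because the loop does not meet my needs. I want the minimum value in column 1 for each range bisected on column 0.
I hope that is clear!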

Best answer

The numpy_indexed package (disclaimer: I am its author) seems to achieve what you are after:

import numpy as np
import numpy_indexed as npi

# to vectorize the concatenation of the slice ranges, we construct all indices implied in the slicing
counts = f[1] - f[0]
idx = np.ones(counts.sum(), dtype=int)  # np.int was removed in NumPy 1.24; the builtin int is equivalent
idx[np.cumsum(counts)[:-1]] -= counts[:-1]
tidx = np.cumsum(idx) - 1 + np.repeat(f[0], counts)

# combined with a unique label tagging the output of each slice range, this allows us to use grouping to find the minimum in each group
label = np.repeat(np.arange(len(f.T)), counts)
subtarray = tarray[tidx]
# argmin over column 1, since the question asks for the minimum of that column within each range
ridx, sidx = npi.group_by(label).argmin(subtarray[:, 1])

print(ranges[1][ridx])
print(subtarray[sidx, 1])
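
If adding an extra package is not an option, a similar result can be sketched with plain NumPy using np.minimum.reduceat. This is an alternative technique, not the method shown above, and the helper name min_per_range is made up for illustration. It assumes the starts and stops come from np.searchsorted as in the question and that the column is non-empty. It drops empty ranges and pads the column with one extra element so that a stop equal to the array length is still a valid index.

```python
import numpy as np

def min_per_range(values, starts, stops):
    """Return (keep, mins); mins[k] is values[s:e].min() for the k-th non-empty range."""
    values = np.asarray(values)
    starts = np.asarray(starts)
    stops = np.asarray(stops)
    keep = stops > starts  # empty slices have no minimum
    # reduceat needs every index to be < len(array), so pad by one element;
    # the pad is never included in a kept result because stops <= len(values).
    padded = np.append(values, values[-1])
    # Interleave as [s0, e0, s1, e1, ...]; even positions reduce over values[s:e],
    # odd positions are by-products and are discarded by the [::2] step.
    pairs = np.column_stack((starts[keep], stops[keep])).ravel()
    mins = np.minimum.reduceat(padded, pairs)[::2]
    return keep, mins

# Columns 0 and 1 of tarray from the question.
col0 = np.array([368, 433, 451, 480, 517, 541, 578, 607, 637, 666, 696, 726,
                 756, 785, 815, 845, 874, 905, 934, 969, 994, 1027, 1057, 1086,
                 1117, 1148, 1177, 1213, 1237, 1273, 1299, 1333, 1357, 1393, 1416])
col1 = np.array([322, 420, 412, 440, 475, 503, 537, 567, 599, 628, 660, 687,
                 717, 747, 779, 807, 837, 867, 898, 928, 957, 987, 1017, 1047,
                 1079, 1109, 1137, 1167, 1197, 1227, 1261, 1287, 1317, 1347, 1377])
ranges = np.array([[241, 290, 350, 420, 540, 660, 780, 900],
                   [361, 410, 470, 540, 660, 780, 900, 1020]])

starts, stops = np.searchsorted(col0, ranges)
keep, mins = min_per_range(col1, starts, stops)

print(ranges[1][keep])  # upper bounds of the non-empty ranges
print(mins)             # column-1 minimum within each of those ranges
```

On the data from the question this yields the same pairs as the loop (410 322, 470 322, 540 412, and so on), with the empty first range dropped.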