Replace values in a NumPy array based on a dictionary and avoid overlap between new values and keys

Problem description

I want to replace values in a 2D NumPy array based on the following dictionary in Python:

code    region
334     0
4       22
8       31
12      16
16      17
24      27
28      18
32      21
36      1

I want to find the cells in the 2D NumPy array that match code and replace them with the corresponding value in the region column. The problem is that replacing sequentially, one table row at a time, turns code = 12 into region = 16, and the next row then replaces every cell with value 16 (including the ones that were just assigned 16) with 17. How do I prevent that?
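The chained replacement described above can be reproduced with a small sketch. The names `mapping`, `arr`, `naive` and `safe` are illustrative and not from the original question, and the second loop is just one simple way to sidestep the problem (the masks are always computed against the untouched input), not the approach of the answer below.

```python
import numpy as np

# Code -> region mapping from the table above.
mapping = {334: 0, 4: 22, 8: 31, 12: 16, 16: 17, 24: 27, 28: 18, 32: 21, 36: 1}

arr = np.array([[12, 16],
                [4, 36]])

# Naive approach: mutate one array key by key.
# 12 becomes 16, and the very next key (16) turns it into 17.
naive = arr.copy()
for code, region in mapping.items():
    naive[naive == code] = region

# Safer loop: build the masks from the original array, write into a copy,
# so a freshly written value can never be matched again.
safe = arr.copy()
for code, region in mapping.items():
    safe[arr == code] = region

print(naive)  # [[17 17], [22  1]]  (12 was replaced twice)
print(safe)   # [[16 17], [22  1]]
```

The loop still makes one pass over the array per key, which is why a vectorized lookup is attractive for larger inputs.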

Recommended answer

Here's a vectorized approach based on np.searchsorted: it traces back the location of each array element among the keys and then indexes into the values, so every lookup is made against the original array and a replaced value is never matched again -

import numpy as np

def replace_with_dict(ar, dic):
    # Extract keys and values as arrays (same order, so k[i] maps to v[i])
    k = np.array(list(dic.keys()))
    v = np.array(list(dic.values()))

    # Indices that would sort the keys
    sidx = k.argsort()

    # searchsorted finds, for each element of ar, its position among the
    # sorted keys (sorter=sidx is needed because k itself is not sorted).
    # Indexing sidx with those positions maps back to the original key order,
    # and indexing v with the result gives the replacement values.
    # Assumes every element of ar is a key of dic.
    return v[sidx[np.searchsorted(k, ar, sorter=sidx)]]
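To make the one-line return statement easier to follow, here is the same chain of lookups split into steps on a four-key subset of the dictionary. The intermediate names `pos` and `orig_pos` are illustrative and do not appear in the original answer.

```python
import numpy as np

k = np.array([334, 4, 8, 12])   # keys, in dictionary order (unsorted)
v = np.array([0, 22, 31, 16])   # values, aligned with k
ar = np.array([12, 334, 4])     # array whose elements should be replaced

# Step 1: indices that sort the keys; k[sidx] is [4, 8, 12, 334].
sidx = k.argsort()
print(sidx)      # [1 2 3 0]

# Step 2: position of each element of ar within the sorted keys.
pos = np.searchsorted(k, ar, sorter=sidx)
print(pos)       # [2 3 0]

# Step 3: map sorted positions back to positions in the original k.
orig_pos = sidx[pos]
print(orig_pos)  # [3 0 1]

# Step 4: pick the values stored at those positions.
out = v[orig_pos]
print(out)       # [16  0 22]
```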

Sample run -

In [82]: dic ={334:0, 4:22, 8:31, 12:16, 16:17, 24:27, 28:18, 32:21, 36:1}
    ...:
    ...: np.random.seed(0)
    ...: a = np.random.choice(dic.keys(), 20)
    ...:

In [83]: a
Out[83]:
array([ 28,  16,  32,  32, 334,  32,  28,   4,   8, 334,  12,  36,  36,
        24,  12, 334, 334,  36,  24,  28])

In [84]: replace_with_dict(a, dic)
Out[84]:
array([18, 17, 21, 21,  0, 21, 18, 22, 31,  0, 16,  1,  1, 27, 16,  0,  0,
        1, 27, 18])
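The session above passes dic.keys() straight to np.random.choice, which relies on keys() returning a list as it does on Python 2. On Python 3, keys() returns a view, so wrapping it in list() is the usual fix. The sketch below is a Python 3 variant applied to a 2D array, as in the question; the generator `rng`, the array name `a2d` and the shape (4, 5) are assumptions made for illustration, and the random draws will not match the transcript above.

```python
import numpy as np

def replace_with_dict(ar, dic):
    # Same function as in the answer, repeated so this block runs on its own.
    k = np.array(list(dic.keys()))
    v = np.array(list(dic.values()))
    sidx = k.argsort()
    return v[sidx[np.searchsorted(k, ar, sorter=sidx)]]

dic = {334: 0, 4: 22, 8: 31, 12: 16, 16: 17, 24: 27, 28: 18, 32: 21, 36: 1}

# list(...) turns the Python 3 keys view into a sequence NumPy can sample from.
rng = np.random.default_rng(0)
a2d = rng.choice(list(dic.keys()), size=(4, 5))

out = replace_with_dict(a2d, dic)

# Reference result built with plain dictionary lookups, for comparison.
expected = np.array([[dic[x] for x in row] for row in a2d.tolist()])

print(out.shape)                      # (4, 5), same shape as the input
print(np.array_equal(out, expected))  # True
```

np.searchsorted accepts an N-dimensional array of search values and returns indices of the same shape, so no reshaping is needed for the 2D case.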

Improvement

For big arrays, a faster version sorts the keys and values arrays once up front and then calls searchsorted without the sorter argument, like so -

def replace_with_dict2(ar, dic):
    # Extract out keys and values
    k = np.array(list(dic.keys()))
    v = np.array(list(dic.values()))

    # Get argsort indices
    sidx = k.argsort()

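    # Sort the keys once and reorder the values to match
    ks = k[sidx]
    vs = v[sidx]
    # ks is already sorted, so no sorter argument is needed
    return vs[np.searchsorted(ks,ar)]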

Runtime test -

In [91]: dic ={334:0, 4:22, 8:31, 12:16, 16:17, 24:27, 28:18, 32:21, 36:1}
    ...:
    ...: np.random.seed(0)
    ...: a = np.random.choice(dic.keys(), 20000)

In [92]: out1 = replace_with_dict(a, dic)
    ...: out2 = replace_with_dict2(a, dic)
    ...: print np.allclose(out1, out2)
True

In [93]: %timeit replace_with_dict(a, dic)
1000 loops, best of 3: 453 µs per loop

In [95]: %timeit replace_with_dict2(a, dic)
1000 loops, best of 3: 341 µs per loop
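The transcript above comes from a Python 2 session (print statement, %timeit magic). A rough Python 3 equivalent using the standard timeit module might look like the sketch below; the repetition count of 100 and the generator `rng` are arbitrary choices for illustration, and the timings will depend on the machine and NumPy version.

```python
import timeit
import numpy as np

def replace_with_dict(ar, dic):
    k = np.array(list(dic.keys()))
    v = np.array(list(dic.values()))
    sidx = k.argsort()
    return v[sidx[np.searchsorted(k, ar, sorter=sidx)]]

def replace_with_dict2(ar, dic):
    k = np.array(list(dic.keys()))
    v = np.array(list(dic.values()))
    sidx = k.argsort()
    ks = k[sidx]
    vs = v[sidx]
    return vs[np.searchsorted(ks, ar)]

dic = {334: 0, 4: 22, 8: 31, 12: 16, 16: 17, 24: 27, 28: 18, 32: 21, 36: 1}
rng = np.random.default_rng(0)
a = rng.choice(list(dic.keys()), size=20000)

# Both versions should agree exactly on integer data.
out1 = replace_with_dict(a, dic)
out2 = replace_with_dict2(a, dic)
print(np.array_equal(out1, out2))

# Total time for 100 calls of each version.
t1 = timeit.timeit(lambda: replace_with_dict(a, dic), number=100)
t2 = timeit.timeit(lambda: replace_with_dict2(a, dic), number=100)
print(f"with sorter: {t1:.4f} s, presorted: {t2:.4f} s")
```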

Generic case when not all array elements are in the dictionary

If not every element of the input array is guaranteed to be in the dictionary, we need a bit more work, as listed below -

def replace_with_dict2_generic(ar, dic, assume_all_present=True):
    # Extract out keys and values
    k = np.array(list(dic.keys()))
    v = np.array(list(dic.values()))

    # Get argsort indices
    sidx = k.argsort()

    ks = k[sidx]
    vs = v[sidx]
    idx = np.searchsorted(ks,ar)

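    if not assume_all_present:
        # Elements larger than every key get index len(vs); point them at 0
        # so the lookups below stay in bounds (the mask rejects them anyway)
        idx[idx==len(vs)] = 0
        # True only where the array element really is one of the keys
        mask = ks[idx] == ar
        # Replace matched elements and leave the rest unchanged
        return np.where(mask, vs[idx], ar)
    else:
        return vs[idx]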

Sample run -

In [163]: dic ={334:0, 4:22, 8:31, 12:16, 16:17, 24:27, 28:18, 32:21, 36:1}
     ...:
     ...: np.random.seed(0)
     ...: a = np.random.choice(dic.keys(), (20))
     ...: a[-1] = 400

In [165]: a
Out[165]:
array([ 28,  16,  32,  32, 334,  32,  28,   4,   8, 334,  12,  36,  36,
        24,  12, 334, 334,  36,  24, 400])

In [166]: replace_with_dict2_generic(a, dic, assume_all_present=False)
Out[166]:
array([ 18,  17,  21,  21,   0,  21,  18,  22,  31,   0,  16,   1,   1,
        27,  16,   0,   0,   1,  27, 400])
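To see why the extra clamping and masking are needed, it helps to look at what np.searchsorted returns for values that are not keys: it reports an insertion position, not a "not found" marker. The sketch below walks through that on three values; `raw_idx`, `unmasked` and `result` are illustrative names, and 5 and 400 are simply two values chosen because they are absent from the dictionary.

```python
import numpy as np

dic = {334: 0, 4: 22, 8: 31, 12: 16, 16: 17, 24: 27, 28: 18, 32: 21, 36: 1}
k = np.array(list(dic.keys()))
v = np.array(list(dic.values()))
sidx = k.argsort()
ks = k[sidx]   # [4, 8, 12, 16, 24, 28, 32, 36, 334]
vs = v[sidx]   # [22, 31, 16, 17, 27, 18, 21, 1, 0]

ar = np.array([12, 5, 400])   # 12 is a key, 5 and 400 are not

# Insertion positions: 5 would go before 8, 400 past the end.
raw_idx = np.searchsorted(ks, ar)
print(raw_idx)    # [2 1 9]  (9 == len(vs), so vs[raw_idx] would raise IndexError)

# Clamp the out-of-range index so indexing is safe.
idx = raw_idx.copy()
idx[idx == len(vs)] = 0

# Without the mask, absent values are silently mapped to a neighbour's value.
unmasked = vs[idx]
print(unmasked)   # [16 31 22]  (5 wrongly became 31, 400 wrongly became 22)

# With the mask, only real keys are replaced.
mask = ks[idx] == ar
result = np.where(mask, vs[idx], ar)
print(result)     # [ 16   5 400]
```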

