Is there a way to reshape an array that does not maintain the original size (or a convenient work-around)?

Problem description

As a simplified example, suppose I have a dataset composed of 40 sorted values. The values in this example are all integers, though this is not necessarily the case for the actual dataset.

    import numpy as np

    data = np.linspace(1, 40, 40)

I am trying to find the maximum value inside the dataset for certain window sizes. The formula that computes the window sizes yields a pattern that is best handled with arrays (in my opinion). For simplicity's sake, let's say the indices denoting the window sizes are the list [1,2,3,4,5]; this corresponds to window sizes of [2,4,8,16,32] (the pattern is 2**index).

    ## this code looks long because I've provided docstrings
    ## just in case the explanation was unclear

    def shapeshifter(num_col, my_array=data):
        """
        This function reshapes an array to have 'num_col' columns, where
        'num_col' corresponds to index.
        """
        return my_array.reshape(-1, num_col)

    def looper(num_col, my_array=data):
        """
        This function calls 'shapeshifter' and returns a list of the
        MAXimum values of each row in 'my_array' for 'num_col' columns.
        The length of each row (or the number of columns per row if you
        prefer) denotes the size of each window.
        EX:
            num_col = 2
            ==> window_size = 2
            ==> check max( data[1], data[2] ),
                      max( data[3], data[4] ),
                      max( data[5], data[6] ),
                      .
                      .
                      .
                      max( data[39], data[40] )
                for k rows, where k = len(my_array)//num_col
        """
        # pass the caller's array through rather than the global 'data'
        my_array = shapeshifter(num_col=num_col, my_array=my_array)
        rows = [my_array[index] for index in range(len(my_array))]
        res = []
        for index in range(len(rows)):
            res.append( max(rows[index]) )
        return res

So far, the code is fine.
I checked it with the following:

    check1 = looper(2)
    check2 = looper(4)

    print(check1)
    >> [2.0, 4.0, ..., 38.0, 40.0]
    print(len(check1))
    >> 20

    print(check2)
    >> [4.0, 8.0, ..., 36.0, 40.0]
    print(len(check2))
    >> 10

So far so good. Now here is my problem.

    def metalooper(col_ls, my_array=data):
        """
        This function calls 'looper' - which calls
        'shapeshifter' - for every 'col' in 'col_ls'.
        EX:
            j_list = [1,2,3,4,5]
            ==> col_ls = [2,4,8,16,32]
            ==> looper(2), looper(4),
                looper(8), ..., looper(32)
            ==> shapeshifter(2), shapeshifter(4),
                shapeshifter(8), ..., shapeshifter(32)
                such that looper(2^j) ==> shapeshifter(2^j)
                for j in j_list
        """
        res = []
        for col in col_ls:
            res.append(looper(num_col=col, my_array=my_array))
        return res

    j_list = [2,4,8,16,32]
    check3 = metalooper(j_list)

Running the code above raises this error:

    ValueError: total size of new array must be unchanged

With 40 data points, the array can be reshaped into 2 columns of 20 rows, or 4 columns of 10 rows, or 8 columns of 5 rows, BUT at 16 columns the array cannot be reshaped without clipping data, since 40/16 is not an integer. I believe this is the problem with my code, but I do not know how to fix it.

I am hoping there is a way to cut off the last values in each row that do not fit in each window. If this is not possible, I am hoping I can append zeroes to fill the entries that maintain the size of the original array, so that I can remove the zeroes afterwards. Or maybe even some complicated if - try - break block.
What are some ways around this problem?

Recommended answer

I think this will give you what you want in one step:

    def windowFunc(a, window, f = np.max):
        return np.array([f(i) for i in np.split(a, range(window, a.size, window))])

With the default f, that will give you an array of maximums for your windows.

Generally, using np.split with range lets you split into a (possibly ragged) list of arrays:

    def shapeshifter(num_col, my_array=data):
        return np.split(my_array, range(num_col, my_array.size, num_col))

You need a list of arrays because a 2D array can't be ragged (every row needs the same number of columns).

If you really want to pad with zeros, you can use np.pad (spelled np.lib.pad in older NumPy code):

    def shapeshifter(num_col, my_array=data):
        # number of zeros needed to complete the last row (0 if it already fits)
        pad_width = (-my_array.size) % num_col
        return np.pad(my_array, (0, pad_width), 'constant', constant_values = 0).reshape(-1, num_col)

Warning:

It is also technically possible to use, for example, a.resize(32,2), which will create an ndarray padded with zeros (as you requested). But there are some big caveats:

- You would need to calculate the second axis because -1 tricks don't work with resize.
- If the original array a is referenced by anything else, a.resize will fail with the following error:

      ValueError: cannot resize an array that references or is referenced
      by another array in this way.  Use the resize function

- The resize function (i.e. np.resize(a, new_shape)) is not equivalent to a.resize: instead of padding with zeros, it loops back to the beginning of the data.

Since you seem to want to reference a by a number of windows, a.resize isn't very useful. But it's a rabbit hole that's easy to fall into.
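To make the last caveat concrete, here is a minimal sketch of how the method and the function differ when the target shape is larger than the input. The five-element array and the variable names are illustrative assumptions, not part of the original answer, and refcheck=False is passed only so the in-place resize does not trip the reference check mentioned above.

```python
import numpy as np

a = np.arange(1, 6)  # [1 2 3 4 5]

# np.resize (the function) returns a new array and repeats the
# input from the start until the requested shape is filled.
repeated = np.resize(a, (4, 2))
print(repeated.tolist())     # [[1, 2], [3, 4], [5, 1], [2, 3]]

# ndarray.resize (the method) works in place and fills the extra
# entries with zeros. A copy is resized here so 'a' stays intact.
zero_padded = a.copy()
zero_padded.resize((4, 2), refcheck=False)
print(zero_padded.tolist())  # [[1, 2], [3, 4], [5, 0], [0, 0]]
```

For window maxima the repeated values from np.resize would silently contaminate the last window, which is one more reason to prefer np.split or np.pad here.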
Edit:

Looping through a list is slow. If your input is long and the windows are small, the windowFunc above will bog down in the for loops. This should be more efficient:

    def windowFunc2(a, window, f = np.max):
        tail = - (a.size % window)
        if tail == 0:
            return f(a.reshape(-1, window), axis = -1)
        else:
            body = a[:tail].reshape(-1, window)
            return np.r_[f(body, axis = -1), f(a[tail:])]
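As a quick sanity check, the sketch below runs both approaches on the 40-point dataset from the question for every window size in [2,4,8,16,32]. The function names window_max_split and window_max_reshape are hypothetical stand-ins that fix f to the maximum; they restate the two ideas above under that assumption rather than reproduce the answer's code verbatim.

```python
import numpy as np

def window_max_split(a, window):
    # Split into consecutive chunks; the last chunk may be shorter.
    chunks = np.split(a, range(window, a.size, window))
    return np.array([np.max(c) for c in chunks])

def window_max_reshape(a, window):
    # Reshape the part that divides evenly, then handle the leftover tail.
    tail = a.size % window
    if tail == 0:
        return a.reshape(-1, window).max(axis=-1)
    body = a[:-tail].reshape(-1, window)
    return np.r_[body.max(axis=-1), a[-tail:].max()]

data = np.linspace(1, 40, 40)
results = {w: window_max_reshape(data, w) for w in [2, 4, 8, 16, 32]}

print(results[16].tolist())  # [16.0, 32.0, 40.0]
print(results[32].tolist())  # [32.0, 40.0]
```

Both functions keep the short final window instead of dropping or padding it, so a window size of 16 yields three maxima for 40 points: two full windows and one partial window of 8 values.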