问题描述
我有一个非常大的数组,可以使用Apache Madlib计算得出,我想对该2d数组中的每个单个数组应用一个运算。
I have a really large array that have I computed with Apache Madlib and I would like to apply an operation to each single array in that 2d array.
我有在中找到了可以帮助我取消嵌套的代码。但是,在这个非常大的2d数组(150,000 + 1d float数组)上,代码的速度非常慢。尽管 unnest()
只需要几秒钟即可运行,即使等待几分钟后代码仍未完成。
I have found code that can help me unnest it from this related answer. However, the code is miserably slow on this really large 2d array (150,000+ 1d float arrays). While unnest()
only takes a few seconds to run, even after waiting for several minutes the code has not completed.
当然,必须有一种更快的方法将大型2d阵列嵌套成较小的1d阵列吗?如果该解决方案使用Apache Madlib,则有加分。我确实在文档 deconstruct_2d_array
中发现了一根引线,但是,当我尝试在矩阵上调用该函数时,它失败并出现以下错误:
Surely, there must be a faster way to unnest the large 2d array into smaller 1d arrays? Bonus point if that solution uses Apache Madlib. I did find one lead buried in the documentation called deconstruct_2d_array
, however, when I try to call that function on the matrix, it fails with the following error:
推荐答案
您在我的旧答案中发现,不适用于大型阵列。我从没想过数组的大小,它应该应该是一个集合(一个表)。
The function you found in my old answer does not scale well for big arrays. I never thought of arrays your size, which should probably be a set (a table) instead.
plpgsql函数可能会替换引用的答案。需要Postgres 9.1或更高版本。
Be that as it may, this plpgsql function replaces the one in the referenced answer. Requires Postgres 9.1 or later.
CREATE OR REPLACE FUNCTION unnest_2d_1d(ANYARRAY, OUT a ANYARRAY)
RETURNS SETOF ANYARRAY AS
$func$
BEGIN
FOREACH a SLICE 1 IN ARRAY $1 LOOP
RETURN NEXT;
END LOOP;
END
$func$ LANGUAGE plpgsql IMMUTABLE STRICT;
在Postgres 9.6中的大型2d阵列上进行测试时,速度提高了40倍。
40x faster in my test on a big 2d-array in Postgres 9.6.
STRICT
避免NULL输入出现异常(如):
STRICT
to avoid an exception for NULL input (as commented by IamIC):
这篇关于如何在PostgreSQL中将2d数组快速嵌套到1d数组中?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!