What does it mean to have multiple sortkey columns?

Problem description

Redshift allows designating multiple columns as SORTKEY columns, but most of the best-practices documentation is written as if there were only a single SORTKEY.

If I create a table with SORTKEY (COL1, COL2), does that mean that all columns are stored sorted by COL1, then COL2? Or maybe, since it is a columnar store, each column gets stored in a different order? I.e. COL1 in COL1 order, COL2 in COL2 order, and the other columns unordered?

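My situation is that I have a table with (among others) a type_id and a timestamp column. Data arrives roughly in timestamp order. Most queries are joined against or restricted by both type_id and timestamp. Usually the type_id clauses are more specific, meaning a much larger percentage of rows can be excluded by looking at the type_id clause than by looking at the timestamp clause. type_id is the DISTKEY for this reason. I am trying to understand SORTKEY (type_id), SORTKEY (stamp), SORTKEY (type_id, stamp), ...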

If COL1 is not highly selective like your stamp (which is a bit weird btw; I would have expected it to be more selective than type_id? Anyways..), it means that filtering by stamp won't eliminate that many rows. So it makes more sense to declare it as the second sort key. However, this is less efficient than the other way around, as eliminating rows earlier would be cheaper. If you sometimes filter by stamp but not by type_id, it may make sense to do this though.
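To make the trade-off above more concrete, here is a rough toy model in Python. It is not Redshift itself, only a sketch under assumptions: rows are sorted by a compound key, cut into fixed-size blocks, and a min/max "zone map" is kept per block for each column, so a block can be skipped when a filter cannot match its range. All columns share the same row order in this model. The table contents, `BLOCK_SIZE`, the helper names `build_zone_maps` and `blocks_scanned`, and the query ranges are all made up for illustration.

```python
BLOCK_SIZE = 10  # rows per block (assumption; real Redshift blocks are 1 MB)


def build_zone_maps(rows, sort_cols, block_size=BLOCK_SIZE):
    """Sort rows by the compound key, then record (min, max) per column per block."""
    ordered = sorted(rows, key=lambda r: tuple(r[c] for c in sort_cols))
    zone_maps = []
    for start in range(0, len(ordered), block_size):
        block = ordered[start:start + block_size]
        zone_maps.append({
            col: (min(r[col] for r in block), max(r[col] for r in block))
            for col in ("type_id", "stamp")
        })
    return zone_maps


def blocks_scanned(zone_maps, predicates):
    """Count blocks whose (min, max) overlaps every [lo, hi] range predicate."""
    return sum(
        all(lo <= zone[col][1] and zone[col][0] <= hi
            for col, (lo, hi) in predicates.items())
        for zone in zone_maps
    )


# Hypothetical data: 10,000 rows arriving in stamp order, 100 distinct type_ids.
rows = [{"type_id": t % 100, "stamp": t} for t in range(10_000)]

# Hypothetical filters, each expressed as an inclusive range per column.
queries = {
    "type_id only": {"type_id": (7, 7)},
    "stamp only": {"stamp": (0, 499)},
    "both": {"type_id": (7, 7), "stamp": (0, 499)},
}

results = {}
for sort_cols in (("type_id", "stamp"), ("stamp", "type_id")):
    zone_maps = build_zone_maps(rows, sort_cols)
    results[sort_cols] = {
        name: blocks_scanned(zone_maps, preds) for name, preds in queries.items()
    }
    print(sort_cols, results[sort_cols])
```

In this toy data set of 1,000 blocks, sorting by (type_id, stamp) reads 10 blocks for the type_id filter, 100 for the stamp filter, and 1 when both are applied. Sorting by (stamp, type_id) reads 100, 50, and 5 respectively. Because stamp is unique here, the second key adds nothing in that layout. Real tables will behave differently, but the shape matches the point above: lead with the column that eliminates the most rows, unless you often filter on the other column alone.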
