Question
The Cassandra wiki says there is a limit of 2 billion cells (rows x columns) per partition, but it is unclear to me what a partition is.
Do we have one partition per node per column family, which would mean that the maximum size of a column family is 2 billion cells * the number of nodes in the cluster?
Or will Cassandra create as many partitions as required to store all the data of a column family?
I am starting a new project, so I will use Cassandra 2.0.
Recommended answer
With the advent of CQL3, the terminology has changed slightly from the old Thrift terms.
Basically,
CREATE TABLE foo (a int, b int, c int, d int, PRIMARY KEY ((a, b), c));
will make a CQL3 table. The values of a and b together form the partition key, which determines which node the data will reside on. This is the "partition" referred to in the 2 billion cell limit.
Within that partition, the data is organized by c, known as the clustering key. Together, a, b and c define a unique value of d. In this case the number of cells in a partition is the number of (c, d) pairs it holds, so for any given pair of a and b there can be at most 2 billion combinations of c and d.
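To make the counting concrete, here is a minimal Python sketch (not Cassandra code) that groups rows of the table foo by their partition key (a, b) and counts the cells in each partition. The sample rows and the helper name cells_per_partition are made up for illustration, and the sketch counts one cell per stored d value, ignoring any internal bookkeeping the storage engine may add.

```python
from collections import defaultdict

# Hypothetical sample rows for: PRIMARY KEY ((a, b), c) with one regular column d.
rows = [
    {"a": 1, "b": 1, "c": 10, "d": 100},
    {"a": 1, "b": 1, "c": 11, "d": 101},
    {"a": 1, "b": 1, "c": 12, "d": 102},
    {"a": 2, "b": 1, "c": 10, "d": 200},
]


def cells_per_partition(rows, partition_cols=("a", "b"), clustering_cols=("c",)):
    """Count the non-key column values stored under each partition key."""
    key_cols = set(partition_cols) | set(clustering_cols)
    counts = defaultdict(int)
    for row in rows:
        partition_key = tuple(row[col] for col in partition_cols)
        # Every non-null column outside the primary key contributes one cell.
        counts[partition_key] += sum(
            1
            for col, value in row.items()
            if col not in key_cols and value is not None
        )
    return dict(counts)


counts = cells_per_partition(rows)
print(counts)  # {(1, 1): 3, (2, 1): 1}
```

The 2 billion limit applies to each of these per-partition counts separately, not to the table as a whole.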
So as you model your data, you want to ensure that the partition key (the first part of the primary key) varies enough that your data is randomly distributed across the Cassandra cluster. Then use clustering keys to ensure that your data is available in the order you want it.
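The effect of a varying partition key can be sketched in a few lines of Python. This is only an illustration under stated assumptions: real Cassandra maps the hashed key to a token on a ring and also replicates it, whereas the sketch uses an md5 hash modulo a hypothetical node count (NODES) as a stand-in, and node_for is a made-up helper name.

```python
import hashlib
from collections import Counter

NODES = 4  # hypothetical cluster size


def node_for(partition_key, nodes=NODES):
    """Map a partition key to a node index with a stand-in hash (not Murmur3)."""
    digest = hashlib.md5(repr(partition_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % nodes


# Many distinct (a, b) pairs are spread over all the nodes ...
varied = Counter(node_for((a, b)) for a in range(100) for b in range(10))
# ... whereas a single (a, b) pair always lands on the same node.
constant = Counter(node_for((1, 1)) for _ in range(1000))

print(sorted(varied.items()))
print(sorted(constant.items()))
```

With 1000 distinct keys every node receives a share of the data; with one repeated key, all 1000 writes go to a single node and into a single partition.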
Watch this video for more information on data modeling in Cassandra: The Datamodel is Dead, Long Live the Datamodel
CREATE TABLE foo (a int, b int, c int, d int, e int, f int, PRIMARY KEY ((a, b), c, d));
Partitions will be uniquely identified by a combination of a and b.
Within a partition, c and d will be used to order the cells, so the layout will look a little like:
(a1,b1) --> [c1,d1 : e1], [c1,d1 :f1], [c1,d2 : e2] ....
So in this example you can have 2 billion cells, with each cell containing:
- the value of c
- the value of d
- the value of either e or f
So the 2 billion limit refers to the sum of unique tuples of (c,d,e) and (c,d,f).
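As a rough illustration of that sum, the following Python sketch lists the cells of one partition in clustering order, mirroring the layout shown above. The sample rows and the helper name partition_cells are hypothetical, and the sketch assumes that an unset column (None here) stores no cell.

```python
# Hypothetical rows for: PRIMARY KEY ((a, b), c, d) with regular columns e and f.
rows = [
    {"a": 1, "b": 1, "c": 1, "d": 2, "e": 7, "f": None},
    {"a": 1, "b": 1, "c": 1, "d": 1, "e": 5, "f": 6},
    {"a": 2, "b": 1, "c": 1, "d": 1, "e": 9, "f": 9},
]


def partition_cells(rows, a, b):
    """Return the cells of partition (a, b), ordered by the clustering columns c, d."""
    matching = [row for row in rows if (row["a"], row["b"]) == (a, b)]
    cells = []
    for row in sorted(matching, key=lambda r: (r["c"], r["d"])):
        for col in ("e", "f"):
            if row[col] is not None:
                # One cell per (c, d, column) combination that holds a value.
                cells.append((row["c"], row["d"], col, row[col]))
    return cells


cells = partition_cells(rows, 1, 1)
for cell in cells:
    print(cell)
print(len(cells))  # 3
```

The final count, here 3 for partition (1, 1), is the number that must stay below 2 billion.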