Problem description
I'm learning Cassandra and started with v3.8. My sample keyspace/table looks like this:
CREATE TABLE digital.usage (
provider decimal,
deviceid text,
date text,
hours varint,
app text,
flat text,
usage decimal,
PRIMARY KEY ((provider, deviceid), date, hours)
) WITH CLUSTERING ORDER BY (date ASC, hours ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
I'm using a composite PRIMARY KEY with provider and deviceid as the partition key, so that uniqueness and distribution are handled across the cluster nodes. The clustering keys are date and hours.
I have a few observations:
1) With PRIMARY KEY((provider, deviceid), date, hours), when I insert multiple entries for the same hours value, only the latest one is stored and the previous ones disappear.
2) With PRIMARY KEY((provider, deviceid), date), when I insert multiple entries for the same date value, only the latest one is stored and the previous ones disappear.
Although I'm happy with the behaviour in point 1, I want to know what is happening in the background. Do I need to understand more about clustering keys and clustering order?
Recommended answer
The primary key is unique.
Most RDBMSs throw an error if you insert a duplicate value into a PRIMARY KEY.
Cassandra does not read before it writes. Instead, it writes a new version of the record with the latest timestamp. When you insert data with the same values for the primary key columns, a new version is written with a newer timestamp, and a query (SELECT) returns only the version with the latest timestamp. In other words, an INSERT on an existing primary key behaves as an upsert rather than raising an error.
Example:
PRIMARY KEY((provider, deviceid), date, hours)
INSERT INTO digital.usage (provider, deviceid, date, hours, app, flat) VALUES (1.0, 'a', '2017-07-27', 1, 'test', 'test');
-- This creates a new record with, let's say, timestamp 1
INSERT INTO digital.usage (provider, deviceid, date, hours, app, flat) VALUES (1.0, 'a', '2017-07-27', 1, 'test1', 'test1');
-- Same primary key, so this creates a new version of the record with, let's say, timestamp 2
SELECT app, flat FROM digital.usage WHERE provider = 1.0 AND deviceid = 'a' AND date = '2017-07-27' AND hours = 1;
This returns:
------------
| app | flat |
|-----|------|
|test1|test1 |
------------
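To see the timestamps behind this behaviour, or to get something closer to the RDBMS "reject duplicates" behaviour, the following sketch may help. It assumes the digital.usage table from the question and the two rows inserted above; the values 'test2' are made up for illustration. WRITETIME and IF NOT EXISTS are standard CQL, but note that IF NOT EXISTS is a lightweight transaction and is noticeably more expensive than a plain INSERT.

```cql
-- Show the write timestamp (microseconds since the epoch) of the surviving version.
-- WRITETIME only works on non-primary-key columns such as app or flat.
SELECT app, flat, WRITETIME(app)
FROM digital.usage
WHERE provider = 1.0 AND deviceid = 'a' AND date = '2017-07-27' AND hours = 1;

-- Insert only if no row with this primary key exists yet.
-- If the row already exists, nothing is overwritten and the result
-- contains an [applied] column set to False.
INSERT INTO digital.usage (provider, deviceid, date, hours, app, flat)
VALUES (1.0, 'a', '2017-07-27', 1, 'test2', 'test2')
IF NOT EXISTS;
```

If the default overwrite behaviour is what you want, as in point 1 of the question, a plain INSERT is enough and the lightweight transaction is unnecessary.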