Cassandra CQL SELECT / DELETE issue due to primary key constraints

Problem description

I need to store the latest updates to be pushed to users' newsfeed pages in a Cassandra table for later retrieval. My table's schema is as follows:

CREATE TABLE newsfeed (
    user_name text,
    post_id bigint,
    post_type text,
    favorited boolean,
    shared boolean,
    own boolean,
    date timestamp,
    PRIMARY KEY (user_name, date, post_id, post_type)
);

The first three columns (user_name, post_id, and post_type) together form the actual primary key of the table. However, since I want to ORDER the SELECT queries on this table by the date of each row, I placed the date column in the primary key as the second entry (did I have to do this?).

When I try to delete a row by giving only user_name, post_id, and post_type, as follows:

 DELETE FROM newsfeed WHERE user_name='pooria' and post_id=36 and post_type='p';

I get the following error:

Bad Request: Missing PRIMARY KEY part date since post_id is set

I need the date column to be part of the primary key because I want to use it in my ORDER BY clauses, but I also have to delete some rows without knowing their date values!

So how are such problems tackled in Cassandra? Should I fix my data model and use a different schema for this job?

Recommended answer

DataStax's Chief Evangelist Patrick McFadden posted an article demonstrating a few time series modeling patterns. It definitely makes for a good read and should be of some help to you: Getting Started with Time Series Data Modeling.

I think your table is just fine. However, because of the way composite primary keys work in Cassandra, you cannot skip primary key components in a query: here date comes before post_id in the key, so restricting post_id without date is rejected. So if you do end up needing to query data by user_name, post_id, and/or post_type (without date), you should create a table specifically for that query, one that does not include date in the primary key.
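A minimal sketch of such a second table, assuming a hypothetical name newsfeed_by_post (it does not appear in the original question). It keeps date as a regular column, so the date can be looked up from user_name, post_id, and post_type and then used to delete the row from newsfeed with its full primary key. The timestamp shown is a made-up placeholder.

```cql
-- Hypothetical lookup table: same identifying columns, but date is not part of the key.
CREATE TABLE newsfeed_by_post (
    user_name text,
    post_id bigint,
    post_type text,
    date timestamp,
    PRIMARY KEY (user_name, post_id, post_type)
);

-- Reads on the original table can still be ordered by date,
-- because date is its first clustering column.
SELECT * FROM newsfeed WHERE user_name = 'pooria' ORDER BY date DESC;

-- Step 1: find the date of the post to remove.
SELECT date FROM newsfeed_by_post
 WHERE user_name = 'pooria' AND post_id = 36 AND post_type = 'p';

-- Step 2: delete from both tables, using the date returned by step 1
-- (the timestamp below is a placeholder, not real data).
DELETE FROM newsfeed
 WHERE user_name = 'pooria' AND date = '2014-01-01 00:00:00+0000'
   AND post_id = 36 AND post_type = 'p';
DELETE FROM newsfeed_by_post
 WHERE user_name = 'pooria' AND post_id = 36 AND post_type = 'p';
```

With this layout the application has to write every new post to both tables, so the two must be kept in sync on insert as well as on delete.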

I will say, however, that in general, creating a table that will process regular delete operations is not a good idea. In fact, I'm pretty sure that has been classified as a Cassandra "anti-pattern." Data isn't really deleted from Cassandra right away; it is tombstoned. Tombstones are reconciled at compaction time (assuming the tombstone's grace period, gc_grace_seconds, has passed), and having too many of them has been known to cause performance issues.

If you read the article I linked above, go down to the section named "Time Series Pattern 3." You will notice that the INSERT statements are run with the USING TTL clause. This gives the data a time-to-live in seconds, after which it will "quietly disappear." For instance, if you wanted to keep your data around for 24 hours (86400 seconds), you could do something like this:

INSERT INTO newsfeed (...) VALUES (...) USING TTL 86400

Using the TTL feature is a preferable alternative to regular cleansing by DELETE.
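Two related sketches, assuming a Cassandra version that supports table-level TTLs: a default TTL set on the table, so that individual INSERT statements need no USING TTL clause, and the TTL() function for checking how many seconds a stored value has left. Neither statement comes from the linked article, and the user name is a placeholder.

```cql
-- Apply a 24-hour TTL to everything written to the table from now on.
ALTER TABLE newsfeed WITH default_time_to_live = 86400;

-- Remaining time-to-live, in seconds, of a non-key column.
SELECT TTL(favorited) FROM newsfeed WHERE user_name = 'pooria';
```

The table-level default only affects writes made after the change; rows written earlier keep whatever TTL they were inserted with.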

