Question
I am storing posts from all users in a table. I want to retrieve the posts from all of the users that a given user is following.
CREATE TABLE posts (
    userid int,
    time timestamp,
    id uuid,
    content text,
    PRIMARY KEY (userid, time)
) WITH CLUSTERING ORDER BY (time DESC);
I keep the data about whom each user follows in another table:
CREATE TABLE follow (
    userid int,
    who_follow_me set<int>,
    who_i_follow set<int>,
    PRIMARY KEY ((userid))
);
I am querying like this:
select * from posts where userid in(1,2,3,4....n);
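Two questions:
- Why do I still get data in random order, even though CLUSTERING ORDER BY is specified on posts?
- Is the model correct to satisfy the query optimally (a user can have n followers)?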
I am using Cassandra 2.0.10.
Answer
"Why do I still get data in random order, though CLUSTERING ORDER BY is specified in posts?"
This is because ORDER BY only works for rows within a particular partition key. So in your case, if you wanted to see all of the posts for a specific user like this:
SELECT * FROM posts WHERE userid=1;
That returns your results ordered by time, as all of the rows within the userid=1 partition are clustered by it. A query with IN on userid spans several partitions, so rows are ordered within each userid but not across them.
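To make the per-partition ordering concrete, here is a minimal Python sketch that models each partition as a list already sorted by time descending. It does not talk to Cassandra; the sample rows and the names `partitions`, `concatenated` and `merge_newest_first` are made up for illustration. It shows that reading partition after partition gives no global order, and that one possible workaround is to merge the per-partition results on the client.

```python
import heapq
from datetime import datetime

# Hypothetical sample data: one list per partition key (userid),
# each already sorted by time DESC, as the clustering order guarantees.
partitions = {
    1: [(datetime(2015, 1, 25, 13, 27), "post c"),
        (datetime(2015, 1, 25, 13, 24), "post a")],
    2: [(datetime(2015, 1, 25, 13, 28), "post d"),
        (datetime(2015, 1, 25, 13, 23), "post b")],
}

def concatenated(parts):
    """Rows partition by partition: ordered within each partition only."""
    return [row for userid in parts for row in parts[userid]]

def merge_newest_first(parts):
    """Merge the already-sorted partitions into one newest-first list."""
    return list(heapq.merge(*parts.values(), key=lambda row: row[0], reverse=True))

times = [t for t, _ in concatenated(partitions)]
print(times == sorted(times, reverse=True))    # False: no global order

merged = [t for t, _ in merge_newest_first(partitions)]
print(merged == sorted(merged, reverse=True))  # True: merged on the client
```

Because every input is already sorted, the merge only has to compare the heads of the lists, so it stays cheap even with many followed users.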
"Is model correct to satisfy the query optimally (user can have n number of followers)?"
It will work, as long as you don't care about getting the results ordered by timestamp. To be able to query posts for all users ordered by time, you would need to come up with a different partition key. Without knowing too much about your application, you could use a column like group (for instance) and partition on that.
So let's say that you evenly assign all of your users to eight groups: A, B, C, D, E, F, G and H. Let's say your table design changed like this:
CREATE TABLE posts (
    group text,
    userid int,
    time timestamp,
    id uuid,
    content text,
    PRIMARY KEY (group, time, userid)
) WITH CLUSTERING ORDER BY (time DESC);
You could then query all posts for all users in group B like this:
SELECT * FROM posts WHERE group='B';
That would give you all of the posts for all of the users in group B, ordered by time. So basically, for your query to order the posts appropriately by time, you need to partition your post data on something other than userid.
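The answer does not say how users would be assigned to the eight groups. As a rough sketch, assuming integer userids and a simple modulo rule (both the rule and the name `group_for` are assumptions, not part of the original design), the group could be computed when a post is written:

```python
GROUPS = "ABCDEFGH"  # the eight groups from the example above

def group_for(userid: int) -> str:
    """Hypothetical assignment rule: spread userids evenly by modulo."""
    return GROUPS[userid % len(GROUPS)]

print(group_for(1))   # B
print(group_for(10))  # C
```

Any deterministic rule would do, as long as the writer and the reader of a post derive the same group value.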
Edit:
PRIMARY KEY (userid, follows)) WITH CLUSTERING ORDER BY (created DESC);
That's not going to work. In fact, that should produce the following error:
code=2200 [Invalid query] message="Missing CLUSTERING ORDER for column follows"
And even if you did add follows to your CLUSTERING ORDER clause, you would see this:
code=2200 [Invalid query] message="Only clustering key columns can be defined in CLUSTERING ORDER directive"
The CLUSTERING ORDER clause can only be used on the clustering column(s), which in this case is only the follows column. Alter your PRIMARY KEY definition to cluster on follows (ASC) and created (DESC). I have tested this, inserted some sample data, and can see that this query works:
aploetz@cqlsh:stackoverflow> SELECT * FROM posts WHERE userid=2 AND follows=1;
userid | follows | created | id
--------+---------+--------------------------+--------------------------------------
2 | 1 | 2015-01-25 13:27:00-0600 | 559cda12-8fe7-45d3-9a61-7ddd2119fcda
2 | 1 | 2015-01-25 13:26:00-0600 | 64b390ba-a323-4c71-baa8-e247a8bc9cdf
2 | 1 | 2015-01-25 13:24:00-0600 | 1b325b66-8ae5-4a2e-a33d-ee9b5ad464b4
(3 rows)
Although, if you query by just userid, you can see posts from all of your followers. But in that case, the posts will only be ordered within each follower ID (the follows column), like this:
aploetz@cqlsh:stackoverflow> SELECT * FROM posts WHERE userid=2;
userid | follows | created | id
--------+---------+--------------------------+--------------------------------------
2 | 0 | 2015-01-25 13:28:00-0600 | 94da27d0-e91f-4c1f-88f2-5a4bbc4a0096
2 | 0 | 2015-01-25 13:23:00-0600 | 798053d3-f1c4-4c1d-a79d-d0faff10a5fb
2 | 1 | 2015-01-25 13:27:00-0600 | 559cda12-8fe7-45d3-9a61-7ddd2119fcda
2 | 1 | 2015-01-25 13:26:00-0600 | 64b390ba-a323-4c71-baa8-e247a8bc9cdf
2 | 1 | 2015-01-25 13:24:00-0600 | 1b325b66-8ae5-4a2e-a33d-ee9b5ad464b4
(5 rows)
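The ordering in that output can be reproduced with a small Python sketch. It assumes the altered key is PRIMARY KEY (userid, follows, created) WITH CLUSTERING ORDER BY (follows ASC, created DESC); that definition is inferred from the text and is not shown in the original. The rows are the (follows, created) pairs from the output above, deliberately shuffled, and `clustering_order` is a made-up helper name.

```python
from datetime import datetime

# Assumed table definition implied by the text (not shown in the original):
#   PRIMARY KEY (userid, follows, created)
#   WITH CLUSTERING ORDER BY (follows ASC, created DESC)

# (follows, created) pairs for userid=2, taken from the output above, shuffled
rows = [
    (1, datetime(2015, 1, 25, 13, 24)),
    (0, datetime(2015, 1, 25, 13, 28)),
    (1, datetime(2015, 1, 25, 13, 27)),
    (0, datetime(2015, 1, 25, 13, 23)),
    (1, datetime(2015, 1, 25, 13, 26)),
]

def clustering_order(rows):
    """Sort by follows ASC, then created DESC, like the clustering key."""
    # Stable two-pass sort: secondary key first, then primary key.
    by_created = sorted(rows, key=lambda r: r[1], reverse=True)
    return sorted(by_created, key=lambda r: r[0])

for follows, created in clustering_order(rows):
    print(follows, created.strftime("%H:%M"))
```

The printed sequence follows the same order as the five rows above: time only decides the order among rows that share the same follows value.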