问题描述
我正在设计一个简单的消息传递模式,其中 thread 将所有在用户集合之间发送的消息分组.当我必须找到给定一组用户的现有线程时,我会陷入困境.
I'm designing a simple messaging schema where a thread groups all messages that are sent between a collection of users. I'm getting stuck when I have to find an existing thread given a set of users.
有两种发送消息的场景:
There are 2 scenarios for sending a message:
发送到线程::查看线程时,消息直接发送到该线程,因此线程ID是已知的. (不是问题)
Send To Thread: When viewing a thread, a message is sent directly to that thread, so the threadID is known. (not a problem)
发送给收件人:用户创建新邮件并从头开始指定一组收件人.我只想创建一个新线程,前提是这些用户之间不存在新线程,这就是我遇到的问题.我需要一个查询,该查询将在给定一组用户的情况下找到现有的threadID. ThreadMembers 表将用户映射到线程.这有可能吗?还是我需要修改表?
Send To Recipients: A user creates a new message and specifies a set of recipients from scratch. I only want to create a new thread if one doesn't already exist between these users, which is where I'm stuck. I need a query that will find an existing threadID given a set of users. The ThreadMembers table maps users to threads. Is this even possible? Or do I need to modify my tables?
我的桌子:
线程:
threadID(id)
lastSent(时间戳)
Thread:
threadID (id)
lastSent (timestamp)
ThreadMembers:
threadFK(线程的外键)
userFK(用户外键)
ThreadMembers:
threadFK (foreign key to thread)
userFK (foreign key to user)
消息:
threadFK(线程的外键)
senderFK(用户外键)
msgID(id)
msgDate(时间戳)
msgText(文本)
Messages:
threadFK (foreign key to thread)
senderFK (foreign key to user)
msgID (id)
msgDate (timestamp)
msgText (text)
非常感谢!
推荐答案
在尝试解释查询的过程中,我意识到它可能无法始终正常运行.因此,我回头想出了如何进行测试.我仍然对模式设置感到困惑-也就是说,这意味着无法将新用户添加到现有线程中,并且特定的一组用户只能在一个线程中进行对话-但纠正这一点很好查询.
I realized, in the course of attempting to explain the query, that it wouldn't always work correctly. So, I went back and figured out how to test this. I'm still bugged by the schema setup - namely, it implies that new users can't be added to an existing thread, and that a specific set of users will only be able to talk in one thread - but it was good to correct the query.
WITH Selected_Users(id) as (VALUES (@id1), (@id2), --etc--),
Threads(id) as (SELECT DISTINCT threadFk
FROM ThreadMembers as a
JOIN Selected_Users as b
ON b.id = a.userFk)
SELECT a.id
FROM Threads as a
WHERE NOT EXISTS (SELECT '1'
FROM ThreadMembers as b
LEFT JOIN Selected_Users as c
ON c.id = b.userFk
WHERE c.id IS NULL
AND b.threadFk = a.id)
AND NOT EXISTS (SELECT '1'
FROM Selected_Users as b
LEFT JOIN ThreadMembers as c
ON c.userFk = b.id
AND c.threadFk = a.id
WHERE c.userFk IS NULL)
该语句可能必须是动态的,以构建选定用户的列表,除非SQL Server有一种方法可以将列表作为主机变量提供(我知道DB2至少从iSeries可以做到).我没有完美的数据集可以对此进行测试,但是针对数百万行表(仅具有多对一关系),它几乎立即返回-我正在对此进行仅索引访问(提示)
The statement will likely have to be dynamic, to build the list of selected users, unless SQL Server has a way to provide a list as a host variable (I know DB2 does, at least from the iSeries). I don't have the perfect dataset to test this against, but against a multi-million row table (with only a many-one relationship), it returns almost instantly - I'm getting index-only access for this (hint hint).
说明:
WITH Selected_Users(id) as (VALUES (@id1), (@id2), --etc--),
此CTE正在构建用户列表,以便可以将其作为表引用.尽管可以将它简单地用IN
语句替换(尽管需要多个引用),但这使处理起来最容易.
This CTE is building the list of users so that it can be referenced as a table. This makes it easiest to deal with, although it would be possible to simply replaces it with an IN
statement everywhere (requires multiple references, though).
Threads(id) as (SELECT DISTINCT threadFk
FROM ThreadMembers as a
JOIN Selected_Users as b
ON b.id = a.userFk)
此CTE获取用户所涉及的(不同的)线程的列表.通常,这只是将列表分成对threadFk
的单个引用.
This CTE gets the list of (distinct) threads that the users are involved in. Mostly, this is just to chop the listing down to single references to threadFk
.
SELECT a.id
FROM Threads as a
...获取选定的线程集...
... Get the selected set of threads ...
WHERE NOT EXISTS (SELECT '1'
FROM ThreadMembers as b
LEFT JOIN Selected_Users as c
ON c.id = b.userFk
WHERE c.id IS NULL
AND b.threadFk = a.id)
在所选用户列表中没有任何人失踪"的地方-也就是说,它消除了具有较大用户列表子集的用户列表的线程.它还消除了从选择中列出了一些用户的线程,但也删除了一些没有的线程,这意味着用户的 counts 会匹配,但实际用户不会匹配(这是我的第一个版本失败了.
Where there isn't anybody 'missing' from the selected list of users - that is, it eliminates threads with user-lists that are subsets of a larger one. It also eliminates threads that have some of the users listed from the selection, but also a few that aren't, meaning that the counts of the users would match, but the actual users would not (this is where my first version failed).
我意识到,尽管现有语句解决了提供的用户列表是为给定线程列出的用户子集的情况,但我没有解决选择的用户列表包含一个用户列表的情况.子集,它是给定线程的用户列表.
I realized that, while the existing statement takes care of the situation where the provided list of users is a subset of users listed for a given thread, I didn't take care of the situation where the list of selected users contains a subset that is the list of users for the given thread.
AND NOT EXISTS (SELECT '1'
FROM Selected_Users as b
LEFT JOIN ThreadMembers as c
ON c.userFk = b.id
AND c.threadFk = a.id
WHERE c.userFk IS NULL)
此子句可解决此问题.在排除特定线程的用户之后,请确保选择列表中没有剩余的用户.
This clause fixes that. It makes sure that there aren't any leftover users in the selection list, after excluding users for a particular thread.
现在该声明使我有些烦恼-我可能会有一种更好的方法...
The statement is now bugging me a bit - there may be a slightly better way for me to do this...
Muwahaha,有一个 是一个COUNT(*)
版本,该版本也应该更快:
Muwahaha, there is a COUNT(*)
version, which should also be faster:
WITH Selected_Users(id) as (VALUES (@id1), (@id2), --etc--),
SELECT a.threadFk
FROM ThreadMembers as a
JOIN Selected_Users as b
ON b.id = a.userFk
GROUP BY a.threadFk
HAVING COUNT(*) = (SELECT COUNT(*) FROM Selected_Users)
AND COUNT(*) = (SELECT COUNT(*) from ThreadMembers as c
WHERE c.threadFk = a.threadFk)
说明:
SELECT a.threadFk
FROM ThreadMembers as a
JOIN Selected_Users as b
ON b.id = a.userFk
这是为了使列出的成员所属的所有线程都加入.这是与上面的Threads
CTE等效的内部.实际上,您也可以在上面的查询中删除该CTE.
This is joining to get all threads the listed members are a part of. This is the inside equivalent to the Threads
CTE above. Actually, you could remove that CTE in the above query, too.
GROUP BY a.threadFk
我们毕竟只想要一个给定线程的实例.另外(至少在DB2中),该语句的其余部分除非存在,否则是无效的.
We only want one instance of a given thread after all. Also (in DB2 at least), the rest of the statement isn't valid unless it's present.
HAVING COUNT(*) = (SELECT COUNT(*) FROM Selected_Users)
验证对于给定线程,是否存在所有选定用户.或者,所有选定用户都必须存在于给定线程中.
Verify that, for the given thread, all of the selected users are present. Or, all of the selected users must be present in the given thread.
AND COUNT(*) = (SELECT COUNT(*) from ThreadMembers as c
WHERE c.threadFk = a.threadFk)
验证对于给定线程,没有未选择的用户.否则,不得有任何用户被拒之门外"
Verify that, for the given thread, there are no non-selected users. Or, there must not be any users 'left out'
您应该为此获得仅索引访问权限(我似乎是).结果行的COUNT(*)
(对于GROUP BY
)仅应执行一次,然后重新使用. HAVING
子句在之后进行评估(如果我没记错的话),因此从原始表中进行计数的子选择应该只在一次进行threadFk中的em>.
You should get index-only access for this (I seem to be). The COUNT(*)
of the result rows (for the GROUP BY
) should only be performed once, and reused. The HAVING
clause is evaluated after the GROUP BY
takes place (if I recall correctly), so the sub-select for the count from the original table should only take place once per threadFk
.
这篇关于SQL-消息架构-需要在给定一组用户的情况下查找现有的消息线程的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!