Problem description
Suppose I have an entity entry with a ref-to-many attribute :entry/groups. How should I build a query to find the entities whose :entry/groups attribute contains all of my input foreign ids?
The following pseudocode illustrates my question better:
[2 3] ; having this as input foreign ids
;; and having these entry entities in db
[{:entry/id "A" :entry/groups [2 3 4]}
{:entry/id "B" :entry/groups [2]}
{:entry/id "C" :entry/groups [2 3]}
{:entry/id "D" :entry/groups [1 2 3]}
{:entry/id "E" :entry/groups [2 4]}]
;; only A, C, D should be pulled
Being new to Datomic/Datalog, I have exhausted all my options, so any help is appreciated. Thanks!
Recommended answer
TL;DR
You're tackling the general problem of 'dynamic conjunction' in Datomic's Datalog.
There are 3 strategies here:
- Write a dynamic Datalog query which uses 2 negations and 1 disjunction, or a recursive rule (see below).
- Generate the query code (equivalent to Alan Thompson's answer): the drawbacks are the usual drawbacks of generating Datalog clauses dynamically, i.e. you don't benefit from query plan caching.
- Use the indexes directly (EAVT or AVET).
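Dynamic Datalog query
Datalog has no direct way of expressing dynamic conjunction (logical AND / 'for all ...' / set intersection). However, you can achieve it in pure Datalog by combining one disjunction (logical OR / 'exists ...' / set union) and two negations, i.e. (For all ?g in ?Gs, p(?e, ?g)) <=> NOT(Exists ?g in ?Gs such that NOT(p(?e, ?g))).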
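The double-negation identity can be checked outside Datomic. The following is a minimal plain-Clojure sketch, not Datalog: it uses the sample entries from the question (with groups written as sets) and two hypothetical helper predicates, has-all-direct? and has-all-double-negation?, to show that both formulations select the same entries.

```clojure
(def entries
  "The sample entries from the question, with group ids as sets."
  [{:entry/id "A" :entry/groups #{2 3 4}}
   {:entry/id "B" :entry/groups #{2}}
   {:entry/id "C" :entry/groups #{2 3}}
   {:entry/id "D" :entry/groups #{1 2 3}}
   {:entry/id "E" :entry/groups #{2 4}}])

(defn has-all-direct?
  "'For all g in gs, entry has g', stated directly."
  [entry gs]
  (every? #(contains? (:entry/groups entry) %) gs))

(defn has-all-double-negation?
  "The same predicate written as NOT(exists g in gs such that NOT(entry has g))."
  [entry gs]
  (not (some #(not (contains? (:entry/groups entry) %)) gs)))

(defn ids-matching
  "Ids of the sample entries for which (pred entry gs) holds, in input order."
  [pred gs]
  (mapv :entry/id (filter #(pred % gs) entries)))

(ids-matching has-all-direct? [2 3])          ; => ["A" "C" "D"]
(ids-matching has-all-double-negation? [2 3]) ; => ["A" "C" "D"]
```

In Datalog the direct form is not available for a collection of unknown size, which is why the query has to use the second form.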
In your case, this could be expressed as:
[:find [?entry ...] :in $ ?groups :where
;; these 2 clauses are for restricting the set of considered datoms, which is more efficient (and necessary in Datomic's Datalog, which will refuse to scan the whole db)
;; NOTE: this imposes ?groups cannot be empty!
[(first ?groups) ?group0]
[?entry :entry/groups ?group0]
;; here comes the double negation
(not-join [?entry ?groups]
[(identity ?groups) [?group ...]]
(not-join [?entry ?group]
[?entry :entry/groups ?group]))]
Good news: this can be expressed as a very general Datalog rule (which I may end up adding to Datofu):
[(matches-all ?e ?a ?vs)
[(first ?vs) ?v0]
[?e ?a ?v0]
(not-join [?e ?a ?vs]
[(seq ?vs) [?v ...]]
(not-join [?e ?a ?v]
[?e ?a ?v]))]
... which means your query can now be expressed as:
[:find [?entry ...] :in % $ ?groups :where
(matches-all ?entry :entry/groups ?groups)]
NOTE: there's an alternate implementation using a recursive rule:
[[(matches-all ?e ?a ?vs)
[(seq ?vs)]
[(first ?vs) ?v]
[?e ?a ?v]
[(rest ?vs) ?vs2]
(matches-all ?e ?a ?vs2)]
[(matches-all ?e ?a ?vs)
[(empty? ?vs)]]]
This one has the advantage of accepting an empty ?vs collection (so long as ?e and ?a have been bound in some other way in the query).
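To make the recursion easier to follow, here is a plain-Clojure analogue of the recursive rule. It is only a sketch: has? is a hypothetical predicate standing in for the [?e ?a ?v] data pattern, and nothing here touches Datomic.

```clojure
(defn matches-all?
  "Plain-Clojure analogue of the recursive matches-all rule.
  `has?` is a hypothetical predicate standing in for the [?e ?a ?v] pattern."
  [has? vs]
  (if (empty? vs)
    true                                  ; second rule body: [(empty? ?vs)]
    (and (boolean (has? (first vs)))      ; [(first ?vs) ?v] and [?e ?a ?v]
         (matches-all? has? (rest vs))))) ; [(rest ?vs) ?vs2] and the recursive call

;; A Clojure set can be called as a membership predicate.
(matches-all? #{1 2 3} [2 3]) ; => true
(matches-all? #{2} [2 3])     ; => false
(matches-all? #{2} [])        ; => true, the empty collection is accepted
```

The last call shows the point made above: with an empty collection only the base case applies, so the entity and attribute must already be constrained elsewhere.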
Generating the query code
The advantage of generating the query code is that it's relatively simple in this case, and it can probably make query execution more efficient than the more dynamic alternative. The drawback of generating Datalog queries in Datomic is that you may lose the benefits of query plan caching; therefore, even if you're going to generate queries, you still want to make them as generic as possible (i.e. depending only on the number of v values):
(defn q-find-having-all-vs
[n-vs]
(let [v-syms (for [i (range n-vs)]
(symbol (str "?v" i)))]
{:find '[[?e ...]]
:in (into '[$ ?a] v-syms)
:where
(for [?v v-syms]
['?e '?a ?v])}))
;; examples
(q-find-having-all-vs 1)
=> {:find [[?e ...]],
:in [$ ?a ?v0],
:where
([?e ?a ?v0])}
(q-find-having-all-vs 2)
=> {:find [[?e ...]],
:in [$ ?a ?v0 ?v1],
:where
([?e ?a ?v0]
[?e ?a ?v1])}
(q-find-having-all-vs 3)
=> {:find [[?e ...]],
:in [$ ?a ?v0 ?v1 ?v2],
:where
([?e ?a ?v0]
[?e ?a ?v1]
[?e ?a ?v2])}
;; executing the query: note that we're passing the attribute and values!
(apply d/q (q-find-having-all-vs (count groups))
       db :entry/groups groups)
Use the indexes directly
I'm not sure at all how efficient the above approaches are in the current implementation of Datomic Datalog. If your benchmarking shows this is slow, you can always fall back to direct index access.
Here's an example in Clojure using the AVET index:
;; assumes the Datomic peer library, aliased as d like in the snippets above
(require '[clojure.set] '[datomic.api :as d])

(defn find-having-all-vs
  "Given a database value `db`, an attribute identifier `a` and a non-empty seq of entity identifiers `vs`,
  returns a set of entity identifiers for entities which have all the values in `vs` via `a`"
  [db a vs]
  ;; DISCLAIMER: a LOT can be done to improve the efficiency of this code!
  (apply clojure.set/intersection
         (for [v vs]
           (into #{}
                 (map :e)
                 (d/datoms db :avet a v)))))
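One possible direction for the efficiency improvements mentioned in the disclaimer is to scan the index for the first value only, then keep each candidate that also has the remaining values, instead of building one set per value. The sketch below shows that shape in plain Clojure. Everything in it is an assumption for illustration: candidates-for and has-value? are hypothetical stand-ins for an index scan and a point lookup (with Datomic they would presumably be backed by d/datoms on :avet and :eavt), and groups-by-entry is the sample data from the question.

```clojure
(defn find-having-all
  "Returns the set of entities that have every value in the non-empty seq `vs`.
  `candidates-for` : v -> seq of entities having v   (hypothetical index scan)
  `has-value?`     : e, v -> boolean                 (hypothetical point lookup)"
  [candidates-for has-value? vs]
  (let [[v0 & more] vs]
    (into #{}
          (filter (fn [e] (every? #(has-value? e %) more)))
          (candidates-for v0))))

;; In-memory stand-ins built from the question's sample data.
(def groups-by-entry
  {"A" #{2 3 4} "B" #{2} "C" #{2 3} "D" #{1 2 3} "E" #{2 4}})

(defn candidates-for [v]
  (for [[e gs] groups-by-entry :when (contains? gs v)] e))

(defn has-value? [e v]
  (contains? (get groups-by-entry e) v))

(find-having-all candidates-for has-value? [2 3]) ; => #{"A" "C" "D"}
```

Whether this beats the set-intersection version depends on the data, for example on how selective the first value is, so it is worth benchmarking both.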