Suppose I have an entity entry with a ref-to-many attribute :entry/groups. How should I build a query to find entities whose :entry/groups attribute contains all of my input foreign ids?
The following pseudocode illustrates my question better:
[2 3] ; having this as input foreign ids
;; and having these entry entities in db
[{:entry/id "A" :entry/groups [2 3 4]}
 {:entry/id "B" :entry/groups [2]}
 {:entry/id "C" :entry/groups [2 3]}
 {:entry/id "D" :entry/groups [1 2 3]}
 {:entry/id "E" :entry/groups [2 4]}]
;; only A, C, D should be pulled
Being new in Datomic/Datalog, I exhausted all options, so any help is appreciated. Thanks!
You're tackling the general problem of 'dynamic conjunction' in Datomic's Datalog. There are three strategies here:
- Write a dynamic Datalog query which uses 2 negations and 1 disjunction or a recursive rule (see below)
- Generate the query code (equivalent to Alan Thompson's answer): the drawbacks are the usual drawbacks of generating Datalog clauses dynamically, i.e. you don't benefit from query plan caching.
- Use the indexes directly (EAVT or AVET).
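Datalog has no direct way of expressing dynamic conjunction (logical AND / 'for all ...' / set intersection). However, you can achieve it in pure Datalog by combining one disjunction (logical OR / 'exists ...' / set union) and two negations, i.e. (For all ?g in ?Gs, p(?e, ?g)) <=> NOT(Exists ?g in ?Gs such that NOT(p(?e, ?g))).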
In your case, this could be expressed as:
[:find [?entry ...] :in $ ?groups :where
 ;; these 2 clauses restrict the set of considered datoms, which is more efficient
 ;; (and necessary in Datomic's Datalog, which will refuse to scan the whole db)
 ;; NOTE: this requires ?groups to be non-empty!
 [(first ?groups) ?group0]
 [?entry :entry/groups ?group0]
 ;; here comes the double negation
 (not-join [?entry ?groups]
   [(identity ?groups) [?group ...]]
   (not-join [?entry ?group]
     [?entry :entry/groups ?group]))]
Good news: this can be expressed as a very general Datalog rule (which I may end up adding to Datofu):
[(matches-all ?e ?a ?vs)
 [(first ?vs) ?v0]
 [?e ?a ?v0]
 (not-join [?e ?a ?vs]
   [(seq ?vs) [?v ...]]
   (not-join [?e ?a ?v]
     [?e ?a ?v]))]
... which means your query can now be expressed as:
[:find [?entry ...] :in % $ ?groups :where
(matches-all ?entry :entry/groups ?groups)]
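To see why two negations amount to a 'for all', here is a minimal sketch in plain Clojure over the sample data from the question. It involves no Datomic at all: `entries`, `has-all?`, `has-all-double-neg?` and `matching-ids` are hypothetical names introduced only for illustration, and the in-memory map is a stand-in for the database.

```clojure
;; Sample data from the question: entry id -> set of group ids.
;; (Hypothetical in-memory stand-in, not a Datomic database.)
(def entries
  {"A" #{2 3 4}
   "B" #{2}
   "C" #{2 3}
   "D" #{1 2 3}
   "E" #{2 4}})

(defn has-all?
  "Direct 'for all': every wanted group is among the entry's groups."
  [entry-groups wanted]
  (every? #(contains? entry-groups %) wanted))

(defn has-all-double-neg?
  "Same predicate via two negations:
   there is NO wanted group that is NOT among the entry's groups."
  [entry-groups wanted]
  (not (some #(not (contains? entry-groups %)) wanted)))

(defn matching-ids
  "Ids of the entries whose groups satisfy `pred` for the `wanted` groups."
  [pred wanted]
  (set (for [[id groups] entries
             :when (pred groups wanted)]
         id)))

(matching-ids has-all? [2 3])            ;; => #{"A" "C" "D"}
(matching-ids has-all-double-neg? [2 3]) ;; => #{"A" "C" "D"}
```

Unlike the Datalog versions, these plain predicates also accept an empty `wanted` collection (every entry then matches), because nothing here needs a first value to restrict the scan.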
NOTE: there's an alternate implementation using a recursive rule:
[[(matches-all ?e ?a ?vs)
  [(seq ?vs)]
  [(first ?vs) ?v]
  [?e ?a ?v]
  [(rest ?vs) ?vs2]
  (matches-all ?e ?a ?vs2)]
 [(matches-all ?e ?a ?vs)
  [(empty? ?vs)]]]
This one has the advantage of accepting an empty ?vs collection (so long as ?e and ?a have been bound in some other way in the query).
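The shape of the recursive rule can be mirrored in plain Clojure, which may make the two rule bodies easier to read. This is only an analogy under stated assumptions: `matches-all?` is a hypothetical function, and `present` is a plain set standing in for the values an entity has for the attribute.

```clojure
(defn matches-all?
  "Recursive check mirroring the two rule bodies:
   an empty `vs` succeeds (base case); otherwise the first value must be
   in `present`, and the check recurs on the rest."
  [present vs]
  (cond
    (empty? vs) true
    (contains? present (first vs)) (recur present (rest vs))
    :else false))

(matches-all? #{2 3 4} [2 3]) ;; => true
(matches-all? #{2} [2 3])     ;; => false
(matches-all? #{2} [])        ;; => true, the empty case is accepted
```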
The advantage of generating the query code is that it's relatively simple in this case, and it can probably make the query execution more efficient than the more dynamic alternative. The drawback of generating Datalog queries in Datomic is that you may lose the benefits of query plan caching; therefore, even if you're going to generate queries, you still want to make them as generic as possible (i.e. depending only on the number of v values).
(defn q-find-having-all-vs
  "Generates a query map that depends only on the number of values `n-vs`."
  [n-vs]
  (let [v-syms (for [i (range n-vs)]
                 (symbol (str "?v" i)))]
    {:find '[[?e ...]]
     :in (into '[$ ?a] v-syms)
     :where
     (for [?v v-syms]
       ['?e '?a ?v])}))
;; examples
(q-find-having-all-vs 1)
=> {:find [[?e ...]],
    :in [$ ?a ?v0],
    :where ([?e ?a ?v0])}
(q-find-having-all-vs 2)
=> {:find [[?e ...]],
    :in [$ ?a ?v0 ?v1],
    :where ([?e ?a ?v0]
            [?e ?a ?v1])}
(q-find-having-all-vs 3)
=> {:find [[?e ...]],
    :in [$ ?a ?v0 ?v1 ?v2],
    :where ([?e ?a ?v0]
            [?e ?a ?v1]
            [?e ?a ?v2])}
;; executing the query: note that we're passing the attribute and values!
(apply d/q (q-find-having-all-vs (count groups))
       db :entry/groups groups)
Use the indexes directly
I'm not sure at all how efficient the above approaches are in the current implementation of Datomic Datalog. If your benchmarking shows this is slow, you can always fall back to direct index access.
Here's an example in Clojure using the AVET index:
;; assumes the Datomic peer library, aliased as `d` as in the snippets above
(require '[clojure.set]
         '[datomic.api :as d])

(defn find-having-all-vs
  "Given a database value `db`, an attribute identifier `a` and a non-empty seq of entity identifiers `vs`,
  returns a set of entity identifiers for entities which have all the values in `vs` via `a`"
  [db a vs]
  ;; DISCLAIMER: a LOT can be done to improve the efficiency of this code!
  (apply clojure.set/intersection
         (for [v vs]
           (into #{}
                 (map :e)
                 (d/datoms db :avet a v)))))
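The intersection logic can be tried without a database. The sketch below is an assumption-laden stand-in, not Datomic code: `group->entries` is a hypothetical in-memory map playing the role of the AVET lookup (value to set of entity ids) for the sample data in the question, and `find-having-all-vs-in-memory` is a hypothetical name.

```clojure
(require '[clojure.set :as set])

;; Hypothetical stand-in for an AVET lookup on :entry/groups:
;; group id -> set of entry ids that have that group.
(def group->entries
  {1 #{"D"}
   2 #{"A" "B" "C" "D" "E"}
   3 #{"A" "C" "D"}
   4 #{"A" "E"}})

(defn find-having-all-vs-in-memory
  "Intersects the per-value entity sets, as the index-based function does.
   Returns an empty set for an empty `vs` (a design choice made for this
   sketch; the original function requires a non-empty `vs`)."
  [index vs]
  (if (empty? vs)
    #{}
    (apply set/intersection
           (map #(get index % #{}) vs))))

(find-having-all-vs-in-memory group->entries [2 3]) ;; => #{"A" "C" "D"}
(find-having-all-vs-in-memory group->entries [2 4]) ;; => #{"A" "E"}
(find-having-all-vs-in-memory group->entries [2 5]) ;; => #{}
```

Each lookup yields one set per input value, so the cost grows with how many entities hold each value. That is one place where the efficiency of the index-based version could be tuned.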