问题描述
任务:更新许多边属性的最快方法.出于性能原因,我忽略了图形方法并直接使用集合进行过滤.
Task: Fastest way to update many edges attributes. For performance reasons, I am ignore graph methods and work with collection directly for filtering.
ArangoDB 2.8b3
ArangoDB 2.8b3
查询【Offer-边缘集合】:
Query [Offer - edge collection]:
FOR O In Offer
FILTER O._from == @from and O._to == @to and O.expired > DATE_TIMESTAMP(@newoffertime)
UPDATE O WITH { expired: @newoffertime } IN Offer
RETURN { _key: OLD._key, prices_hash: OLD.prices_hash }
我在 _to、_from 上有系统索引,在过期时有范围索引
I have system index on _to, _from and range index on expired
查询说明展示
7 edge Offer false false 49.51 % [ `_from`, `_to` ] O.`_to` == "Product/1023058135528"
系统索引仅用于过滤部分记录(_to),不适用于两者(_from, _to),也未使用过期"索引.请解释这种行为的原因,如果我在规划数据模型时确定知道,是否可以指定用于最短路径的索引提示?
System index used for filtering only part of records (_to), not for both (_from, _to), 'expired' index also not used. Please explain me the reasons for this behavior, and there is a possibility to specify hint of indices to be used for the shortest path, if I know for sure when planning data model?
推荐答案
对于在查询中结合逻辑 AND 的过滤条件,ArangoDB 的查询优化器将选择单个索引.这就是它没有同时选择边缘索引和跳过列表索引的原因.
For filter conditions combined with logical ANDs as in your query, ArangoDB's query optimizer will pick a single index. This is the reason why it hasn't picked the edge index and the skiplist index at the same time.
它将在 expired
上的skiplist 索引和 [ "_from", "_to" ]
上的边索引之间进行选择,并从中选择一个它决定了较低的成本,这是通过指数选择性估计来衡量的.如解释输出所示,它似乎选择了 _to
上的边缘索引.
It will do a selection between the skiplist index on expired
and the edge index on [ "_from", "_to" ]
, and will pick the one for which it determines the lower cost, which is measured by index selectivity estimates. As the explain output shows, it seems to have picked the edge index on _to
.
边缘索引内部由两个单独的哈希索引组成,一个位于 _from
属性上,另一个位于 _to
属性上,因此它允许通过 >_from
和 _to
属性.但是,它不是[ "_from", "_to" ]
上的组合索引,因此它不支持要求 _from
和_to
同时.它必须选择一个内部哈希索引,并且似乎在该查询中选择了 _to
上的一个.该决定再次基于平均索引选择性.
The edge index internally consists of two separate hash indexes, one on the _from
attribute and one on the _to
attribute, so it allows quick access via both the _from
and the _to
attributes. However, it's not a combined index on [ "_from", "_to" ]
, so it does not support queries that ask for _from
and _to
at the same time. It has to pick one of the internal hash indexes, and seems to have picked the one on _to
in that query. The decision is based on average index selectivity again.
没有办法向优化器提供任何索引使用提示——除此之外,对于这个特定查询,它不能同时使用两个索引.
There is no way to provide any index usage hint to the optimizer - apart from that, it wouldn't be able to use two indexes at the same time for this particular query.
查看解释输出中的选择性估计,似乎边缘索引不是很有选择性,这意味着会有很多具有相同 _to
值的边缘.由于优化器还应该考虑到 _from
上的索引,我会假设索引的选择性更低,并且这些索引中的每一个都只会帮助跳过最多 50% 的边缘,这不是很多.如果确实如此,那么查询仍将检索(和后过滤)大量文档,这解释了潜在的缓慢.
Looking at the selectivity estimate in the explain output, it seems that the edge index is not very selective, meaning there'll be lots of edges with the same _to
values. As the optimizer should have also taken into account the index on _from
, I would assume that index is even less selective, and that each of these indexes will only help to skip at most 50 % of the edges, which is not very much. If that's actually the case, then the query will still retrieve (and post-filter) a lot of documents, explaining potential slowness.
目前,属性 _from
和 _to
会自动在边集合的始终存在的边索引中编入索引,并且它们不能用于其他用户定义的索引中.这是我们希望在未来版本中添加的一项功能,因为用户定义的索引可以访问 _from
和 _to
,因此可以创建一个组合(排序)[ "_from", "_to", "expired" ]
上的索引,这可能比孤立的三个单属性索引中的任何一个更具选择性.
At the moment the attributes _from
and _to
are automatically indexed in an edge collection's always-present edge index, and they cannot be used in additional, user-defined indexes.This is a feature that we would like to add in a future release, because with _from
and _to
being accessible for user-defined indexes, one could create a combined (sorted) index on [ "_from", "_to", "expired" ]
which would be potentially much more selective than any of the three single-attribute indexes in isolation.
这篇关于ArangoDB 索引与边缘集合的使用的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!