本文介绍了nosql 数据库中的树结构的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在为 Google App Engine 开发一个应用程序,它使用 BigTable 作为其数据存储区.

这是一个关于协作编写故事的应用程序.这是一个非常简单的爱好项目,我只是为了好玩而工作.它是开源的,您可以在此处查看:

我怎样才能建立一个结构既能提高写作效率,又能提高阅读效率?

解决方案

有许多众所周知的方法来表示数据库中的树;他们每个人都有自己的优点和缺点.以下是最常见的:

  • 邻接列表,每个节点存储其父节点的 ID.
  • 物化路径,这是 Keyur 描述的策略.这也是 App Engine 中的实体组(例如父实体)使用的方法.这也或多或少是您在更新中描述的内容.
  • 嵌套集,其中每个节点都有左"和右"ID,这样所有子节点都包含在该范围内.
  • 使用根 ID 配置的邻接列表.

这些都有自己的优点和缺点.邻接表很简单,更新成本低,但需要多次查询来检索子树(每个父节点一个).增强的邻接表可以通过在每条记录中存储根节点的 ID 来检索整个树.

物化路径易于实现且更新成本低廉,并允许查询任意子树,但会增加深度树的开销.

嵌套集更难实现,每次插入时平均需要更新一半的节点.它们允许您查询任意子树,而不会增加具体化路径的密钥长度问题.

不过,在您的具体情况下,您似乎根本不需要树形结构:每个故事,尽管可能是原始故事,但都是独立的.我建议有一个故事"模型,其中包含其段落的键列表(例如,在 Python 中为 db.ListProperty(db.Key)).要渲染一个故事,您需要获取故事,然后对所有段落进行批量获取.要分支故事,只需复制故事条目 - 保持对段落的引用不变.

I'm developing an application for Google App Engine which uses BigTable for its datastore.

It's an application about writing a story collaboratively. It's a very simple hobby project that I'm working on just for fun. It's open source and you can see it here: http://story.multifarce.com/

The idea is that anyone can write a paragraph, which then needs to be validated by two other people. A story can also be branched at any paragraph, so that another version of the story can continue in another direction.

Imagine the following tree structure:

Every number would be a paragraph. I want to be able to select all the paragraphs in every unique story line. Basically, those unique story lines are (2, 7, 2); (2, 7, 6, 5); (2, 7, 6, 11) and (2, 5, 9, 4). Ignore that the node "2" appears twice, I just took a tree structure diagram from Wikipedia.

I also made a diagram of a proposed solution: https://docs.google.com/drawings/edit?id=1fdUISIjGVBvIKMSCjtE4xFNZxiE08AoqvJSLQbxN6pc&hl=en

How can I set up a structure is performance efficient both for writing, but most importantly for reading?

解决方案

There are a number of well known ways to represent trees in databases; each of them have their pros and cons. Here are the most common:

  • Adjacency list, where each node stores the ID of its parent.
  • Materialized path, which is the strategy Keyur describes. This is also the approach used by entity groups (eg, parent entities) in App Engine. It's also more or less what you're describing in your update.
  • Nested sets, where each node has 'left' and 'right' IDs, such that all child nodes are contained in that range.
  • Adjacency lists agumented with a root ID.

Each of these has its own advantages and disadvantages. Adjacency lists are simple, and cheap to update, but require multiple queries to retrieve a subtree (one for each parent node). Augmented adjacency lists make it possible to retrieve an entire tree by storing the ID of the root node in every record.

Materialized paths are easy to implement and cheap to update, and permit querying arbitrary subtrees, but impose increasing overhead for deep trees.

Nested sets are tougher to implement, and require updating, on average, half the nodes each time you make an insertion. They allow you to query arbitrary subtrees, without the increasing key length issue materialized path has.

In your specific case, though, it seems like you don't actually need a tree structure at all: each story, branched off an original though it may be, stands alone. What I would suggest is having a 'Story' model, which contains a list of keys of its paragraphs (Eg, in Python a db.ListProperty(db.Key)). To render a story, you fetch the Story, then do a batch fetch for all the Paragraphs. To branch a story, simply duplicate the story entry - leaving the references to Paragraphs unchanged.

这篇关于nosql 数据库中的树结构的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-02 01:28