MongoDB/NoSQL: Keeping Document Change History

Question

A fairly common requirement in database applications is to track changes to one or more specific entities in a database. I've heard this called row versioning, a log table or a history table (I'm sure there are other names for it). There are a number of ways to approach it in an RDBMS: you can write all changes from all source tables to a single table (more of a log) or have a separate history table for each source table. You also have the option to either manage the logging in application code or via database triggers.

I'm trying to think through what a solution to the same problem would look like in a NoSQL/document database (specifically MongoDB), and how it would be solved in a uniform way. Would it be as simple as creating version numbers for documents, and never overwriting them? Creating separate collections for "real" vs. "logged" documents? How would this affect querying and performance?

Anyway, is this a common scenario with NoSQL databases, and if so, is there a common solution?

Recommended answer

Good question. I've been looking into this myself as well.

I came across the Versioning module of Mongoid, the Ruby ODM for MongoDB. I haven't used it myself, but from what I could find, it adds a version number to each document. Older versions are embedded in the document itself. The major drawback is that the entire document is duplicated on each change, which will result in a lot of duplicate content being stored when you're dealing with large documents. This approach is fine, though, when you're dealing with small documents and/or don't update documents very often.
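
To make the full-copy idea concrete, here is a minimal sketch in Python that models a document as a plain dict. It is not Mongoid's actual implementation; the `save_with_version` helper and the `versions` and `version` field names are assumptions made up for illustration.

```python
import copy

def save_with_version(doc, changes):
    """Apply `changes` to `doc`, first archiving a full copy of the current state."""
    # Snapshot everything except the embedded history itself,
    # so old versions do not nest inside each other.
    snapshot = {k: copy.deepcopy(v) for k, v in doc.items() if k != "versions"}
    doc.setdefault("versions", []).append(snapshot)
    doc.update(changes)
    doc["version"] = snapshot.get("version", 1) + 1
    return doc

post = {"_id": "4c6b9456f61f000000007ba6", "version": 1,
        "title": "Hello world", "body": "Is this thing on?"}
save_with_version(post, {"body": "What should I write?"})
save_with_version(post, {"title": "Foo"})
```

After the two saves, `post["versions"]` holds two complete copies of the document, which is exactly the duplication described above: every change stores every field again, whether it changed or not.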

Another approach would be to store only the changed fields in a new version. Then you can 'flatten' your history to reconstruct any version of the document. This is rather complex though, as you need to track changes in your model and store updates and deletes in a way that your application can reconstruct the up-to-date document. This might be tricky, as you're dealing with structured documents rather than flat SQL tables.
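
A rough sketch of the changed-fields approach, again with plain Python dicts standing in for documents. The `diff` and `flatten` helpers and the `set`/`unset` delta layout are hypothetical, and the diff only looks at top-level fields; handling nested structures is where the complexity mentioned above comes in.

```python
_MISSING = object()

def diff(old, new):
    """Return the top-level fields that were set or removed between two dicts."""
    changed = {k: v for k, v in new.items() if old.get(k, _MISSING) != v}
    removed = [k for k in old if k not in new]
    return {"set": changed, "unset": removed}

def flatten(deltas, version):
    """Replay deltas 1..version in order to rebuild that version of the document."""
    doc = {}
    for delta in deltas[:version]:
        doc.update(delta["set"])
        for key in delta["unset"]:
            doc.pop(key, None)
    return doc

# Three successive states of one document (illustrative data).
states = [
    {"title": "Hello world", "body": "Is this thing on?"},
    {"title": "Hello world", "body": "What should I write?", "draft": True},
    {"title": "Foo", "body": "What should I write?"},
]

# Store only what changed from one state to the next.
deltas = []
previous = {}
for state in states:
    deltas.append(diff(previous, state))
    previous = state
```

Here `flatten(deltas, 2)` rebuilds the second state by replaying the first two deltas. Note that deletes have to be recorded explicitly (the `unset` list), otherwise a removed field would reappear when the history is replayed.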

Each field can also have an individual history. Reconstructing documents to a given version is much easier this way. In your application you don't have to explicitly track changes, but just create a new version of the property when you change its value. A document could look something like this:

{
  _id: "4c6b9456f61f000000007ba6",
  title: [
    { version: 1, value: "Hello world" },
    { version: 6, value: "Foo" }
  ],
  body: [
    { version: 1, value: "Is this thing on?" },
    { version: 2, value: "What should I write?" },
    { version: 6, value: "This is the new body" }
  ],
  tags: [
    { version: 1, value: [ "test", "trivial" ] },
    { version: 6, value: [ "foo", "test" ] }
  ],
  comments: [
    {
      author: "joe", // Unversioned field
      body: [
        { version: 3, value: "Something cool" }
      ]
    },
    {
      author: "xxx",
      body: [
        { version: 4, value: "Spam" },
        { version: 5, deleted: true }
      ]
    },
    {
      author: "jim",
      body: [
        { version: 7, value: "Not bad" },
        { version: 8, value: "Not bad at all" }
      ]
    }
  ]
}
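
Reconstructing a given version from a document shaped like the one above could look roughly like this in Python. The `value_at` and `reconstruct` helpers are hypothetical names, and the sketch assumes each field's history is sorted by version.

```python
doc = {
    "_id": "4c6b9456f61f000000007ba6",
    "title": [
        {"version": 1, "value": "Hello world"},
        {"version": 6, "value": "Foo"},
    ],
    "body": [
        {"version": 1, "value": "Is this thing on?"},
        {"version": 2, "value": "What should I write?"},
        {"version": 6, "value": "This is the new body"},
    ],
    "tags": [
        {"version": 1, "value": ["test", "trivial"]},
        {"version": 6, "value": ["foo", "test"]},
    ],
    "comments": [
        {"author": "joe", "body": [{"version": 3, "value": "Something cool"}]},
        {"author": "xxx", "body": [{"version": 4, "value": "Spam"},
                                   {"version": 5, "deleted": True}]},
        {"author": "jim", "body": [{"version": 7, "value": "Not bad"},
                                   {"version": 8, "value": "Not bad at all"}]},
    ],
}

def value_at(history, version):
    """Return (found, value) for the newest entry at or before `version`."""
    entry = None
    for candidate in history:  # assumes entries are sorted by version
        if candidate["version"] <= version:
            entry = candidate
    if entry is None or entry.get("deleted"):
        return False, None
    return True, entry["value"]

def reconstruct(doc, version):
    """Flatten the per-field histories into the document as it was at `version`."""
    result = {"_id": doc["_id"], "comments": []}
    for field in ("title", "body", "tags"):
        found, value = value_at(doc[field], version)
        if found:
            result[field] = value
    for comment in doc["comments"]:
        found, body = value_at(comment["body"], version)
        if found:  # skip comments that do not exist yet or were deleted
            result["comments"].append({"author": comment["author"], "body": body})
    return result

v4 = reconstruct(doc, 4)
v6 = reconstruct(doc, 6)
```

At version 4 the spam comment is still visible; at version 6 it is gone and the title, body and tags have their new values. No diffing is needed on write: changing a property just appends a new entry to that property's history.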

Marking part of the document as deleted in a version is still somewhat awkward though. You could introduce a state field for parts that can be deleted/restored from your application:

{
  author: "xxx",
  body: [
    { version: 4, value: "Spam" }
  ],
  state: [
    { version: 4, deleted: false },
    { version: 5, deleted: true }
  ]
}
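
With a separate state history, deleting and restoring become ordinary versioned changes. A small Python sketch of how that might be read back; the `entry_at`, `part_at` and `restore` helpers are hypothetical, and version 9 is an invented later version used only to show a restore.

```python
comment = {
    "author": "xxx",
    "body": [{"version": 4, "value": "Spam"}],
    "state": [
        {"version": 4, "deleted": False},
        {"version": 5, "deleted": True},
    ],
}

def entry_at(history, version):
    """Return the newest entry at or before `version`, or None."""
    entry = None
    for candidate in history:  # assumes entries are sorted by version
        if candidate["version"] <= version:
            entry = candidate
    return entry

def part_at(part, version):
    """Return the visible form of `part` at `version`, or None if absent or deleted."""
    state = entry_at(part["state"], version)
    if state is None or state["deleted"]:
        return None
    # Assumes the first state entry and the first body entry share a version.
    return {"author": part["author"], "body": entry_at(part["body"], version)["value"]}

def restore(part, version):
    """Undelete by appending a new state entry; the body history is untouched."""
    part["state"].append({"version": version, "deleted": False})

at_v4 = part_at(comment, 4)
at_v5 = part_at(comment, 5)
restore(comment, 9)  # hypothetical later version
at_v9 = part_at(comment, 9)
```

Because the body history never records the deletion, restoring the comment brings back its last value without any special casing.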

With each of these approaches you can store an up-to-date and flattened version in one collection and the history data in a separate collection. This should improve query times if you're only interested in the latest version of a document. But when you need both the latest version and historical data, you'll need to perform two queries, rather than one. So the choice of using a single collection vs. two separate collections should depend on how often your application needs the historical versions.
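
The two-collection split can be sketched with in-memory stand-ins, here combined with full copies for simplicity. The `current` dict and `history` list below are assumptions that play the role of two MongoDB collections, and `save`, `latest` and `all_versions` are hypothetical helpers, not driver calls.

```python
import copy

current = {}  # stand-in for the collection of latest documents, keyed by _id
history = []  # stand-in for the separate history collection

def save(doc_id, fields):
    """Archive the existing document (if any), then store the new latest state."""
    previous = current.get(doc_id)
    if previous is not None:
        history.append({"doc_id": doc_id,
                        "version": previous["version"],
                        "doc": copy.deepcopy(previous)})
    version = previous["version"] + 1 if previous else 1
    current[doc_id] = {**(previous or {}), **fields, "version": version}

def latest(doc_id):
    """One lookup: only the collection of current documents is touched."""
    return current[doc_id]

def all_versions(doc_id):
    """Two lookups: the archived versions plus the current document."""
    older = [h["doc"] for h in history if h["doc_id"] == doc_id]
    return older + [current[doc_id]]

save("post-1", {"title": "Hello world"})
save("post-1", {"title": "Foo"})
```

`latest` never reads the history, which is the query-time benefit; `all_versions` has to consult both stores, which is the cost when the application needs old versions often.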

Most of this answer is just a brain dump of my thoughts; I haven't actually tried any of this yet. Looking back on it, the first option (full copies) is probably the easiest and best solution, unless the overhead of duplicate data is very significant for your application. The second option (storing only changed fields) is quite complex and probably isn't worth the effort. The third option (per-field history) is basically an optimization of option two and should be easier to implement, but probably isn't worth the implementation effort unless you really can't go with option one.

Looking forward to feedback on this, and other people's solutions to the problem :)
