本文介绍了什么NoSQL DB用于稀疏时间序列数据?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在计划一个侧面项目,我将处理时间序列类似的数据,并希望给一个闪亮的新的NoSQL DBs一个尝试,并寻找一个建议。

I'm planning a side project where I will be dealing with Time Series like data and would like to give one of those shiny new NoSQL DBs a try and am looking for a recommendation.

对于一个的符号我会有一个列表( time code> value )元组(随时间增加)。
不是所有的符号都会更新;某些符号可能会更新,而其他符号可能不会更新,并且可能会添加全新的符号

For a (growing) set of symbols I will have a list of (time,value) tuples (increasing over time).Not all symbols will be updated; some symbols may be updated while others may not, and completely new symbols may be added.

因此,数据库应允许:


  • 使用初始单元素(元组)列表添加符号。例如。 A:[(2012-04-14 10:23,50)]

  • 使用新的元组更新符号。 (将该元组附加到该符号的列表)。

  • 读取给定符号的数据。 (理想情况下,甚至让我指定应该返回数据的时间范围)

创建和更新操作应该是原子的。如果一次读多个符号是可能的,这将是有趣的。

The create and update operations should possibly be atomic. If reading multiple symbols at once is possible, that would be interesting.

性能并不重要。更新/创建大约每几个小时发生一次。

Performance is not critical. Updates/Creates will happen roughly once every few hours.

推荐答案

我相信所有的主要NoSQL数据库都会支持这个要求,特别是如果你实际上没有大量数据(这就是问题,为什么NoSQL?)。

I believe literally all the major NoSQL databases will support that requirement, especially if you don't actually have a large volume of data (which begs the question, why NoSQL?).

这就是说,我不得不最近设计和使用NoSQL数据库的时间序列数据,

That said, I've had to recently design and work with a NoSQL database for time series data so can give some input on that design, which can then be extrapolated for all others.

我们选择的数据库是 Cassandra ,并且我们的设计如下:

Our chosen database was Cassandra, and our design was as follows:


  • 所有符号的单个键空格


  • 每个条目都是该相关行的新列

  • 每个值(可以超过单个值)部分时间输入

  • A single keyspace for all 'symbols'
  • Each symbol was a new row
  • Each time entry was a new column for that relevant row
  • Each value (can be more than a single value) was the value part of the time entry

这样可以实现您所要求的一切,最明显的是读取单个符号的数据,必要的范围(列范围调用)。虽然你说性能不是关键,这是对我们和这是相当高性能 - 任何单个符号的所有数据是通过定义排序(列名称排序),并始终存储在同一个节点(没有交叉节点通信的简单查询)。最后,此设计很好地转换为其他具有动态列的NoSQL数据库。

This lets you achieve everything you asked for, most notably to read the data for a single symbol, and using a range if necessary (column range calls). Although you said performance wasn't critical, it was for us and this was quite performant also - all data for any single symbol is by definition sorted (column name sort) and always stored on the same node (no cross node communication for simple queries). Finally, this design translates well to other NoSQL databases that have have dynamic columns.

进一步,这里有一些关于使用MongoDB时间系列商店:

Further to this, here's some information on using MongoDB (and capped collections if necessary) for a time series store: MongoDB as a Time Series Database

最后,这里讨论SQL与NoSQL的时间序列:

Finally, here's a discussion of SQL vs NoSQL for time series: http://dba.stackexchange.com/questions/7634/timeseries-sql-or-nosql

我可以在讨论中添加以下内容:

I can add to that discussion the following:


  • NoSQL的学习曲线将更高,您不能在软成本方面免费获得更多的灵活性和功能。谁将在操作上支持此数据库?

  • 如果您期望此功能在未来有所增长(作为要添加到每个时间条目的更多字段,或者更大的容量符号或符号的时间序列大小),那么肯定会和NoSQL一起去。灵活性的好处是巨大的,在每符号和符号数的基础上获得的可扩展性(与上述设计)几乎是无限的(我说几乎无界 - 每行的最大列数在十亿,最大我相信每个键空间的行数是无界的)。

这篇关于什么NoSQL DB用于稀疏时间序列数据?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

10-10 00:14