Problem description
I have recently been researching NoSQL options. My scenario is as follows:
We collect and store data from custom hardware at remote locations around the world. We record data from every site every 15 minutes, and we would eventually like to move to every 1 minute. Each record has between 20 and 200 measurements. Once set up, the hardware records and reports the same measurements every time.
The biggest issue we are facing is that we get a different set of measurements from every project. We measure about 50-100 different measurement types, but any project can have any number of each type of measurement. There is no preset set of columns that can accommodate the data. Because of this, we create and build each project's data table with the exact columns it needs as we set up and configure the project on the system.
We provide tools to help analyze the data. This typically includes more calculations and data aggregation, some of which we also store.
We are currently using a MySQL database with a table for each client. There are no relations between tables.
NoSQL seems promising because we could store a project_id and a timestamp, and the rest would not be preset. This means one table and more relationships in the data, while still handling the variety of measurements.
Is a 'NoSQL' solution right for this job? If so, which ones?
I have been investigating MongoDB and it seems promising...
Illustrative example:
Project 1 records 5 data points; its MySQL table columns are: timestamp, temp, wind speed, precipitation, irradiance, wind direction
Project 2 records 3 data points; its MySQL table columns are: timestamp, temp, irradiance, temp2
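To make the "project_id, timestamp, and the rest not preset" idea concrete, here is a rough sketch of how the two example projects might look as schemaless records. It uses plain Python dicts rather than any particular database; the field names (`measurements`, `wind_speed`, and so on) and all values are made up for illustration.

```python
from datetime import datetime, timezone

# Hypothetical records for the two example projects. Only project_id and
# timestamp are fixed; the measurement names differ per project.
# All values are invented for illustration.
project1_record = {
    "project_id": 1,
    "timestamp": datetime(2011, 1, 1, 0, 0, tzinfo=timezone.utc),
    "measurements": {
        "temp": 21.4,
        "wind_speed": 3.2,
        "precipitation": 0.0,
        "irradiance": 812.0,
        "wind_direction": 270.0,
    },
}

project2_record = {
    "project_id": 2,
    "timestamp": datetime(2011, 1, 1, 0, 0, tzinfo=timezone.utc),
    "measurements": {
        "temp": 18.9,
        "irradiance": 640.0,
        "temp2": 19.8,
    },
}

records = [project1_record, project2_record]


def measurement_names(record):
    """Return the sorted measurement names carried by one record."""
    return sorted(record["measurements"])


for record in records:
    print(record["project_id"], measurement_names(record))
```

Both records can sit in the same collection even though they carry different measurements, which is the property the question is after.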
Recommended answer
The simple answer is that there is no simple answer to this sort of problem; the only way to find out what works for your scenario is to invest R&D time in it.
The question is hard to answer because the OP doesn't spell out the performance requirements. It appears to be about 75M records/year across a number of customers, with a write rate of roughly num_customers records per minute (which is low), but I don't have figures for the required read/query performance.
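For a feel of the volumes involved, here is a back-of-the-envelope sketch. The two reporting intervals come from the question and the 75M/year total is the figure quoted above; how many sites that total corresponds to is not stated anywhere, so the site counts below are only what those numbers would imply, not known facts.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def records_per_site_per_year(interval_minutes):
    """Records one site produces per year at a given reporting interval."""
    return MINUTES_PER_YEAR // interval_minutes


def implied_sites(total_records_per_year, interval_minutes):
    """Whole number of sites that would produce roughly the given yearly total."""
    return total_records_per_year // records_per_site_per_year(interval_minutes)


print(records_per_site_per_year(15))  # 35040
print(records_per_site_per_year(1))   # 525600

# Assumption: the 75M/year estimate is taken at face value.
print(implied_sites(75_000_000, 15))
print(implied_sites(75_000_000, 1))
```

Either way the write rate stays small, so the interesting numbers to measure are on the read and query side.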
Effectively you already have a sharded database using horizontal partitioning (en.wikipedia.org/wiki/Partition_(database)), because you're storing each customer in a separate table. This is good and should help performance. However, you haven't yet established that you have a performance problem, so this needs to be measured and the size of the problem assessed before you can fix it.
A NoSQL database is indeed a good way of fixing performance problems with a traditional RDBMS, but it will not provide automatic scalability and is not a general solution. You need to identify your performance problem and then design the (NoSQL) data model to solve it.
Depending on what you're trying to achieve, I'd look at MongoDB, Apache Cassandra, Apache HBase or Hibari.
Remember that NoSQL is a vague term typically encompassing:
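- Applications that are either read- or write-performance intensive, often gaining one at the expense of the other.
- Distribution and scalability
- Different methods of persistence (RAM/disk)
- A more structured/defined access pattern, which makes ad-hoc queries harder.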
So, in the first instance, I'd see whether a traditional RDBMS can achieve the required performance using all available techniques. Get a copy of High Performance MySQL and read the MySQL Performance Blog.
In light of your comments, I think it is fair to say that you could achieve what you want with one of the above NoSQL engines.
My primary recommendation would be to get your data model designed and implemented; what you're using at the moment isn't really right.
So look at the Entity-attribute-value model, as I think it is exactly right for what you need.
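As a rough illustration of what an Entity-attribute-value layout could look like for this data, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table name `reading`, its column names, and the sample values are all assumptions made up for the example, not something taken from the question.

```python
import sqlite3

# In-memory database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")

# One narrow table for every project: each row holds a single measurement
# (attribute/value) for one project (the entity) at one timestamp.
conn.executescript("""
CREATE TABLE reading (
    project_id  INTEGER NOT NULL,
    recorded_at TEXT    NOT NULL,
    attribute   TEXT    NOT NULL,
    value       REAL    NOT NULL,
    PRIMARY KEY (project_id, recorded_at, attribute)
);
""")

# Hypothetical sample data for the two example projects.
rows = [
    (1, "2011-01-01T00:00:00Z", "temp", 21.4),
    (1, "2011-01-01T00:00:00Z", "wind_speed", 3.2),
    (1, "2011-01-01T00:00:00Z", "precipitation", 0.0),
    (1, "2011-01-01T00:00:00Z", "irradiance", 812.0),
    (1, "2011-01-01T00:00:00Z", "wind_direction", 270.0),
    (2, "2011-01-01T00:00:00Z", "temp", 18.9),
    (2, "2011-01-01T00:00:00Z", "irradiance", 640.0),
    (2, "2011-01-01T00:00:00Z", "temp2", 19.8),
]
conn.executemany("INSERT INTO reading VALUES (?, ?, ?, ?)", rows)


def attributes_for(project_id):
    """List the measurement names a project has reported, alphabetically."""
    cur = conn.execute(
        "SELECT DISTINCT attribute FROM reading "
        "WHERE project_id = ? ORDER BY attribute",
        (project_id,),
    )
    return [name for (name,) in cur]


def record_at(project_id, recorded_at):
    """Reassemble one wide record from its attribute/value rows."""
    cur = conn.execute(
        "SELECT attribute, value FROM reading "
        "WHERE project_id = ? AND recorded_at = ?",
        (project_id, recorded_at),
    )
    return dict(cur.fetchall())


print(attributes_for(2))
print(record_at(2, "2011-01-01T00:00:00Z"))
```

Adding a new measurement type to a project is then just a matter of inserting rows with a new attribute name, with no schema change. The trade-off is many more rows and a pivot step when you want a wide record back.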
You need to get your data model right before you can consider which technology to use; to be honest, modifying schemas dynamically isn't a data model.
I'd use a traditional SQL database to validate and test the new data model, as the management tools are better and it's generally easier to work with the schemas as you refine the data model.