The question
There are several types of databases for different purposes, but normally MySQL is used for everything, because it is the best-known database. To give an example: a big-data application at my company started out on a MySQL database, which is hard to believe and will have serious consequences for the company. Why MySQL? Just because no one knows how (and when) another DBMS should be used.
So my question is not about vendors, but about types of databases. Can you give me a practical example of a specific situation (or application) for each type of database where it is highly recommended?
Examples:
• A social network should use type X because of Y.
• MongoDB and CouchDB don't support transactions, so a document DB is not a good fit for a banking or auction application.
And so on…
Relational: MySQL, PostgreSQL, SQLite, Firebird, MariaDB, Oracle Database, SQL Server, IBM DB2, IBM Informix, Teradata
Object: ZODB, DB4O, Eloquera, Versant, Objectivity DB, VelocityDB
Graph databases: AllegroGraph, Neo4j, OrientDB, InfiniteGraph, GraphBase, sparkledb, FlockDB, BrightstarDB
Key-value stores: Amazon DynamoDB, Redis, Riak, Voldemort, FoundationDB, LevelDB, BangDB, KAI, hamsterdb, Tarantool, Maxtable, HyperDex, Genomu, Memcachedb
Column family: Bigtable, HBase, Hypertable, Cassandra, Apache Accumulo
RDF stores: Apache Jena, Sesame
Multi-model databases: ArangoDB, Datomic, OrientDB, FatDB, AlchemyDB
Hierarchical: InterSystems Caché, GT.M (thanks to @Laurent Parenteau)
I found two impressive articles on this subject. All credit goes to highscalability.com; the information in this answer is transcribed from those articles:
If your application needs…
• complex transactions because you can't afford to lose data, or if you would like a simple transaction programming model, then look at a relational or grid database.
• Example: an inventory app that might want full ACID. I was very unhappy when I bought a product and they said later they were out of stock. I did not want a compensated transaction. I wanted my item!
• to scale, then NoSQL or SQL can work. Look for systems that support scale-out, partitioning, live addition and removal of machines, load balancing, automatic sharding and rebalancing, and fault tolerance.
• to always be able to write to a database because you need high availability, then look at Bigtable clones, which feature eventual consistency.
• to handle lots of small continuous reads and writes that may be volatile, then look at document or key-value databases offering fast in-memory access. Also consider SSDs.
• to implement social network operations, then you first may want a graph database or, second, a database like Riak that supports relationships. An in-memory relational database with simple SQL joins might suffice for small data sets. Redis' set and list operations could work too.
• to operate over a wide variety of access patterns and data types, then look at a document database; they are generally flexible and perform well.
• powerful offline reporting with large datasets, then look at Hadoop first, and second at products that support MapReduce. Supporting MapReduce isn't the same as being good at it.
• to span multiple data centers, then look at Bigtable clones and other products that offer a distributed option that can handle long latencies and is partition tolerant.
• to build CRUD apps, then look at a document database; they make it easy to access complex data without joins.
• built-in search, then look at Riak.
• to operate on data structures like lists, sets, queues, and publish-subscribe, then look at Redis. It is useful for distributed locking, capped logs, and a lot more.
• programmer friendliness in the form of programmer-friendly data types like JSON, HTTP, REST, and Javascript, then first look at document databases and then at key-value databases.
• transactions combined with materialized views for real-time data feeds, then look at VoltDB. It is great for data rollups and time windowing.
• enterprise-level support and SLAs, then look for a product that makes a point of catering to that market. Membase is an example.
• to log continuous streams of data that may have no consistency guarantees at all, then look at Bigtable clones, because they generally work on distributed file systems that can handle a lot of writes.
• to be as simple as possible to operate, then look for a hosted or PaaS solution, because they will do all the work for you.
• to be sold to enterprise customers, then consider a relational database, because they are used to relational technology.
• to dynamically build relationships between objects that have dynamic properties, then consider a graph database, because often they will not require a schema and models can be built incrementally through programming.
• to support large media, then look at storage services like S3. NoSQL systems tend not to handle large BLOBs, though MongoDB has a file service.
• to bulk upload a lot of data quickly and efficiently, then look for a product that supports that scenario. Most won't, because they don't support bulk operations.
• an easier upgrade path, then use a fluid-schema system like a document or key-value database, because it supports optional fields, adding fields, and deleting fields without having to build an entire schema-migration framework.
• to implement integrity constraints, then pick a database that supports SQL DDL, implement them in stored procedures, or implement them in application code.
• a very deep join depth, then use a graph database, because they support blisteringly fast navigation between entities.
• to move behavior close to the data so the data doesn't have to be moved over the network, then look at stored procedures of one kind or another. These can be found in relational, grid, document, and even key-value databases.
• to cache or store BLOB data, then look at a key-value store. Caching can be for bits of web pages, or for saving complex objects that were expensive to join in a relational database, for reducing latency, and so on.
• a proven track record, like not corrupting data and just generally working, then pick an established product, and when you hit scaling (or other) walls, use one of the common workarounds (scale-up, tuning, memcached, sharding, denormalization, etc.).
• fluid data types, because your data isn't tabular in nature, or requires a flexible number of columns, or has a complex structure, or varies by user (or whatever), then look at document, key-value, and Bigtable clone databases. Each has a lot of flexibility in its data types.
• other business units to run quick relational queries so you don't have to reimplement everything, then use a database that supports SQL.
• to operate in the cloud and automatically take full advantage of cloud features, then we may not be there yet.
• support for secondary indexes so you can look up data by different keys, then look at relational databases and Cassandra's new secondary index support.
• to create an ever-growing set of data (really BigData) that rarely gets accessed, then look at Bigtable clones, which will spread the data over a distributed file system.
• to integrate with other services, then check if the database provides some sort of write-behind syncing feature, so you can capture database changes and feed them into other systems to ensure consistency.
• fault tolerance, then check how durable writes are in the face of power failures, partitions, and other failure scenarios.
• to push the technological envelope in a direction it seems no one else is going, then build it yourself, because that's what it takes to be great sometimes.
• to work on a mobile platform, then look at CouchDB / Mobile Couchbase.
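The "full ACID" and "integrity constraints in SQL DDL" points above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a recommendation of SQLite for the scenarios discussed; the table and helper names are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE inventory (
        item  TEXT PRIMARY KEY,
        stock INTEGER NOT NULL CHECK (stock >= 0)  -- integrity constraint in DDL
    )
""")
conn.execute("INSERT INTO inventory VALUES ('widget', 1)")
conn.commit()

def buy(item):
    """Decrement stock atomically; the whole transaction rolls back on failure."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute(
                "UPDATE inventory SET stock = stock - 1 WHERE item = ?", (item,))
    except sqlite3.IntegrityError:
        return False  # out of stock: the buyer is refused *before* paying
    return True

print(buy("widget"))  # True  - stock goes 1 -> 0
print(buy("widget"))  # False - CHECK (stock >= 0) blocks overselling
```

The `CHECK` constraint plus the transaction is the "I want my item, not a compensated transaction" scenario in miniature: an oversold purchase is rejected atomically instead of being accepted and undone later.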
General use cases (NoSQL)
• Bigness. NoSQL is seen as a key part of a new data stack supporting: big data, big numbers of users, big numbers of computers, big supply chains, big science, and so on. When something becomes so big that it must become massively distributed, NoSQL is there, though not all NoSQL systems are targeting big. Bigness can be across many different dimensions, not just using a lot of disk space.
• Massive write performance. This is probably the canonical usage based on Google's influence. High volume. Facebook needs to store 135 billion messages a month. Twitter, for example, has the problem of storing 7 TB of data per day, with the prospect of this requirement doubling multiple times per year. This is the "data too big to fit on one node" problem. At 80 MB/s it takes a day to store 7 TB, so writes need to be distributed over a cluster, which implies key-value access, MapReduce, replication, fault tolerance, consistency issues, and all the rest. For faster writes, in-memory systems can be used.
• Fast key-value access. This is probably the second most cited virtue of NoSQL in the general mindset. When latency is important, it's hard to beat hashing on a key and reading the value directly from memory, or in as little as one disk seek. Not every NoSQL product is about fast access; some are more about reliability, for example. But what people have wanted for a long time was a better memcached, and many NoSQL systems offer that.
• Flexible schemas and flexible datatypes. NoSQL products support a whole range of new data types, and this is a major area of innovation in NoSQL. We have: column-oriented, graph, advanced data structures, document-oriented, and key-value. Complex objects can be easily stored without a lot of mapping. Developers love avoiding complex schemas and ORM frameworks. Lack of structure allows for much more flexibility. We also have program- and programmer-friendly compatible datatypes like JSON.
• Schema migration. Schemalessness makes it easier to deal with schema migrations without so much worrying. Schemas are in a sense dynamic, because they are imposed by the application at run time, so different parts of an application can have a different view of the schema.
• Write availability. Do your writes need to succeed no matter what? Then we can get into partitioning, CAP, eventual consistency, and all that jazz.
• Easier maintainability, administration, and operations. This is very product specific, but many NoSQL vendors are trying to gain adoption by making it easy for developers to adopt them. They are spending a lot of effort on ease of use, minimal administration, and automated operations. This can lead to lower operations costs, as special code doesn't have to be written to scale a system that was never intended to be used that way.
• No single point of failure. Not every product delivers on this, but we are seeing a definite convergence on relatively easy-to-configure-and-manage high availability with automatic load balancing and cluster sizing. A perfect cloud partner.
• Generally available parallel computing. We are seeing MapReduce baked into products, which makes parallel computing something that will be a normal part of development in the future.
• Programmer ease of use. Accessing your data should be easy. While the relational model is intuitive for end users, like accountants, it's not very intuitive for developers. Programmers grok keys, values, JSON, Javascript stored procedures, HTTP, and so on. NoSQL is for programmers. This is a developer-led coup. The response to a database problem can't always be to hire a really knowledgeable DBA, get your schema right, denormalize a little, etc.; programmers would prefer a system that they can make work for themselves. It shouldn't be so hard to make a product perform. Money is part of the issue: if it costs a lot to scale a product, then won't you go with the cheaper product that you control, that's easier to use, and that's easier to scale?
• Use the right data model for the right problem. Different data models are used to solve different problems. Much effort has been put into, for example, wedging graph operations into a relational model, but it doesn't work. Isn't it better to solve a graph problem in a graph database? We are now seeing a general strategy of trying to find the best fit between a problem and a solution.
• Avoiding hitting the wall. Many projects hit some type of wall in their project. They've exhausted all options to make their system scale or perform properly and are wondering: what next? It's comforting to select a product and an approach that can jump over the wall by linearly scaling using incrementally added resources. At one time this wasn't possible; it took custom-built everything, but that's changed. We are now seeing usable out-of-the-box products that a project can readily adopt.
• Distributed systems support. Not everyone is worried about scale or performance over and above that which can be achieved by non-NoSQL systems. What they need is a distributed system that can span data centers while handling failure scenarios without a hiccup. NoSQL systems, because they have focused on scale, tend to exploit partitions, tend not to use heavy strict consistency protocols, and so are well positioned to operate in distributed scenarios.
• Tunable CAP tradeoffs. NoSQL systems are generally the only products with a "slider" for choosing where they want to land on the CAP spectrum. Relational databases pick strong consistency, which means they can't tolerate a partition failure. In the end, this is a business decision and should be decided on a case-by-case basis. Does your app even care about consistency? Are a few dropped writes OK? Does your app need strong or weak consistency? Is availability more important, or consistency? Will being down be more costly than being wrong? It's nice to have products that give you a choice.
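The "solve a graph problem in a graph database" point comes down to traversal: a friends-of-friends query at depth N is one walk over adjacency data, rather than N self-joins on a relational table. A minimal sketch of that traversal in Python (the people and edges are made up for illustration):

```python
from collections import deque

# Toy social graph as adjacency lists; a graph database stores and
# navigates this shape natively instead of reconstructing it via joins.
edges = {
    "alice": ["bob", "carol"],
    "bob":   ["dave"],
    "carol": ["dave", "erin"],
    "dave":  ["frank"],
    "erin":  [],
    "frank": [],
}

def within_depth(start, max_depth):
    """Everyone reachable from `start` in at most `max_depth` hops (BFS)."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    seen.discard(start)
    return sorted(seen)

print(within_depth("alice", 2))  # ['bob', 'carol', 'dave', 'erin']
```

Each extra hop here is one more ring of the breadth-first search; in SQL the same question at depth 4 or 5 would mean joining the friendship table to itself that many times, which is the "very deep join depth" case above.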
More specific use cases
• Managing large streams of non-transactional data: Apache logs, application logs, MySQL logs, clickstreams, etc.
• Syncing online and offline data. This is a niche CouchDB has targeted.
• Fast response times under all loads.
• Avoiding heavy joins when the query load for complex joins becomes too large for an RDBMS.
• Soft real-time systems where low latency is critical. Games are one example.
• Applications where a wide variety of different write, read, query, and consistency patterns need to be supported. There are systems optimized for 50% reads / 50% writes, 95% writes, or 95% reads. Read-only applications needing extreme speed and resiliency, simple queries, and that can tolerate slightly stale data. Applications requiring moderate performance, read/write access, simple queries, and completely authoritative data. Read-only applications with complex query requirements.
• Load balancing to accommodate data and usage concentrations and to help keep microprocessors busy.
• Real-time inserts, updates, and queries.
• Hierarchical data like threaded discussions and parts explosions.
• Dynamic table creation.
• Two-tier applications where low-latency data is made available through a fast NoSQL interface, but the data itself can be calculated and updated by high-latency Hadoop apps or other low-priority apps.
• Sequential data reading. The right underlying data storage model needs to be selected. A B-tree may not be the best model for sequential reads.
• Slicing off part of a service that may need better performance/scalability onto its own system. For example, user logins may need to be high performance, and this feature could use a dedicated service to meet those goals.
• Caching. A high-performance caching tier for websites and other applications. An example is a cache for the Data Aggregation System used by the Large Hadron Collider.
• Voting.
• Real-time page view counters.
• User registration, profile, and session data.
• Document, catalog management, and content management systems. These are facilitated by the ability to store complex documents as a whole rather than organized as relational tables. Similar logic applies to inventory, shopping carts, and other structured data types.
• Archiving. Storing a large continual stream of data that is still accessible online. Document-oriented databases with a flexible schema can handle schema changes over time.
• Analytics. Use MapReduce, Hive, or Pig to perform analytical queries on scale-out systems that support high write loads.
• Working with heterogeneous types of data, for example, different media types at a generic level.
• Embedded systems. They don't want the overhead of SQL and servers, so they use something simpler for storage.
• A "market" game, where you own buildings in a town. You want someone's building list to pop up quickly, so you partition on the owner column of the building table so that the select is single-partition. But when someone buys a building from someone else, you update the owner column along with the price.
• JPL is using SimpleDB to store rover plan attributes. References are kept to a full plan blob in S3.
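Several items above (flexible schemas, archiving with schema changes over time, heterogeneous data at a generic level) rest on the same idea: documents in one collection need not share columns, and structure is imposed by the application at run time. A toy sketch, with an in-memory list standing in for a document store and made-up record shapes:

```python
import json

collection = []

def insert(doc):
    # Round-trip through JSON to mimic a document store's wire format.
    collection.append(json.loads(json.dumps(doc)))

insert({"type": "user", "name": "alice", "email": "a@example.com"})
insert({"type": "user", "name": "bob"})                       # no email field
insert({"type": "event", "name": "login", "ts": 1700000000})  # different shape

# "Schema migration" is just the application tolerating both shapes:
emails = [d.get("email", "<none>") for d in collection if d["type"] == "user"]
print(emails)  # ['a@example.com', '<none>']
```

Adding a field later means new documents simply carry it; old documents are read with a default, so no table-wide ALTER or migration framework is needed.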
• Federal law enforcement agencies tracking Americans in real-time using credit cards, loyalty cards and travel reservations.
• Fraud detection by comparing transactions to known patterns in real-time.
• Helping diagnose the typology of tumors by integrating the history of every patient.
• In-memory database for high update situations, like a website that displays everyone's "last active" time (for chat, maybe). If users are performing some activity once every 30 sec, then you will pretty much be at your limit with about 5000 simultaneous users.
• Handling lower-frequency multi-partition queries using materialized views while continuing to process high-frequency streaming data.
• Priority queues.
• Running calculations on cached data, using a program friendly interface, without having to go through an ORM.
• Uniq a large dataset using simple key-value columns.
• To keep querying fast, values can be rolled-up into different time slices.
• Computing the intersection of two massive sets, where a join would be too slow.
• A timeline ala Twitter.
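The "roll up values into different time slices" point above can be sketched as bucketed counters: raw events are folded into fixed-width time slots as they arrive, so a query reads one counter instead of scanning raw events. The bucket width and key names here are illustrative:

```python
from collections import defaultdict

BUCKET = 60  # seconds per slice (one-minute roll-up, chosen for illustration)
counters = defaultdict(int)

def record_view(page, ts):
    """Fold a raw page-view event into its (page, time-slice) counter."""
    counters[(page, ts // BUCKET)] += 1

for ts in (0, 10, 59, 61, 125):
    record_view("/home", ts)

# Query a slice instead of the raw event stream:
print(counters[("/home", 0)])  # 3 views in minute 0
print(counters[("/home", 1)])  # 1 view in minute 1
```

The same shape works for the real-time page-view counters mentioned earlier; coarser slices (hour, day) can be derived from finer ones to keep long-range queries fast.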
Redis use cases, VoltDB use cases, and more can be found here.