Problem description
I am trying to duplicate data in a Cassandra NoSQL database for a school project using DataStax OpsCenter. From what I have read, there are three keywords: cluster, node, and datacenter. From what I understand, the data in a node can be duplicated in another node that exists in another cluster, and all the nodes that contain the same (duplicated) data compose a datacenter. Is that right?
If not, what is the difference?
Recommended answer
A node is a single machine that runs Cassandra. A collection of nodes holding similar data is grouped in what is known as a "ring" or cluster.
Sometimes, if you have a lot of data or you are serving data in different geographical areas, it makes sense to group the nodes of your cluster into different data centers. A good use case is an e-commerce website that has many frequent customers on both the east coast and the west coast. Your east coast customers connect to your east coast DC (for faster performance), but ultimately have access to the same dataset as the west coast customers, because both DCs are in the same cluster.
More information on this can be found here: About Apache Cassandra - How does Cassandra work?
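As a rough illustration of the two-DC setup described above, a keyspace can be replicated to both data centers in one definition. This is only a sketch: the data center names EastCoastDC and WestCoastDC and the replication factor of 3 per DC are assumptions made for the example, and the names would have to match whatever your cluster's snitch actually reports.

```cql
-- Hypothetical data center names; replace them with the names your cluster reports.
-- Each DC keeps its own 3 copies of every row in this keyspace.
CREATE KEYSPACE products
WITH replication = {'class': 'NetworkTopologyStrategy', 'EastCoastDC': '3', 'WestCoastDC': '3'};
```

With a definition along these lines, clients on either coast can read from their local DC while both DCs hold the same dataset.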
As for whether all the nodes that contain the same (duplicated) data compose a datacenter: close, but not necessarily. The level of data duplication you have is determined by your replication factor, which is set on a per-keyspace basis. For instance, let's say that I have 3 nodes in my single DC, all storing 600GB of product data. My products keyspace definition might look like this:
CREATE KEYSPACE products
WITH replication = {'class': 'NetworkTopologyStrategy', 'MyDC': '3'};
This ensures that my product data is replicated to all 3 nodes. The total size of my dataset is 600GB, and a full copy is stored on each of the 3 nodes.
But let's say that we're rolling out a new, fairly large product line, and I estimate that we're going to have another 300GB of data coming, which may start pushing the maximum capacity of our hard drives. If we can't afford to upgrade all of our hard drives right now, I can alter the replication factor like this:
ALTER KEYSPACE products
WITH replication = {'class': 'NetworkTopologyStrategy', 'MyDC': '2'};
This keeps 2 copies of all of our data, stored across our current cluster of 3 nodes. The size of our dataset is now 900GB, but since there are only two copies of it (each node is essentially responsible for 2/3 of the data), our size on disk is still 600GB per node (900GB × 2 copies ÷ 3 nodes). The drawback here is that (assuming I read and write at a consistency level of ONE) I can only afford to lose 1 node, whereas with 3 nodes and an RF of 3 (again reading and writing at consistency ONE), I could lose 2 nodes and still serve requests.
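After changing the replication factor, it can be worth checking what the cluster has actually recorded for the keyspace. The query below is a sketch that assumes Cassandra 3.0 or later, where schema metadata is exposed in the system_schema keyspace; older versions store it elsewhere.

```cql
-- Assumes Cassandra 3.0+; shows the replication strategy and per-DC factors for the keyspace.
SELECT keyspace_name, replication
FROM system_schema.keyspaces
WHERE keyspace_name = 'products';
```

Note that lowering the replication factor does not free disk space by itself: nodes generally keep the data they no longer own until nodetool cleanup is run on them. When testing from cqlsh, the session's consistency level can be set with the CONSISTENCY ONE command.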