Problem description
I have 2 Cassandra clusters in different datacenters (note that these are 2 separate clusters, NOT a single multi-DC cluster), and both clusters have the same keyspace and column family models. I want to copy the data of column family C from cluster A to cluster B in the most efficient way. I was able to copy some other column families with get and put operations, since they were time series with sequential keys. But I couldn't copy this other column family, C, that way. I'm using Thrift and pycassa. I've tried the CQL COPY command, but unfortunately the CF is too large and I get an rpc_timeout. How can I accomplish this?
Recommended answer
If you just want to do this as a one-time thing, take a snapshot on cluster A and use sstableloader to load it into cluster B. If you want to keep loading new data over time, turn on incremental_backups, take a snapshot to load the initial data, and then periodically grab the new sstables from the incremental backups and load them with sstableloader to keep cluster B up to date.
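The steps above can be sketched as a small Python helper that only builds the commands and paths involved, without running anything. This is a rough illustration, not a tested migration script: the keyspace name, snapshot tag, host addresses, data directory and staging directory are all placeholders, and the exact nodetool flag spelling and on-disk table directory name vary between Cassandra versions, so check them against your installation. It assumes that sstableloader infers the keyspace and table from the last two components of the directory it is given, which is why the snapshot (or incremental backup) files are first copied into a `<keyspace>/<table>` staging directory.

```python
from pathlib import PurePosixPath


def snapshot_command(keyspace, table, tag):
    """nodetool call that snapshots one column family on the local node.

    Assumption: the -cf flag selects the column family; confirm with
    `nodetool help snapshot` on your version.
    """
    return ["nodetool", "snapshot", "-t", tag, "-cf", table, keyspace]


def snapshot_dir(data_dir, keyspace, table_dir, tag):
    """Directory where the snapshot's sstables end up on the source node.

    table_dir is the on-disk directory of the table; on newer versions it
    carries an id suffix, so it is passed in rather than guessed.
    """
    return str(PurePosixPath(data_dir) / keyspace / table_dir / "snapshots" / tag)


def backups_dir(data_dir, keyspace, table_dir):
    """Directory that collects new sstables once incremental_backups is on."""
    return str(PurePosixPath(data_dir) / keyspace / table_dir / "backups")


def staging_dir(root, keyspace, table):
    """Staging directory whose path ends in <keyspace>/<table> for sstableloader."""
    return str(PurePosixPath(root) / keyspace / table)


def loader_command(target_hosts, staged):
    """sstableloader call that streams the staged sstables into cluster B."""
    return ["sstableloader", "-d", ",".join(target_hosts), staged]


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    ks, cf, tag = "my_keyspace", "C", "copy_to_b"
    data = "/var/lib/cassandra/data"
    staged = staging_dir("/tmp/stage", ks, cf)

    print(" ".join(snapshot_command(ks, cf, tag)))       # run on each node of cluster A
    print("copy from:", snapshot_dir(data, ks, cf, tag))  # initial load source
    print("copy from:", backups_dir(data, ks, cf))        # later incremental loads
    print("copy to:  ", staged)
    print(" ".join(loader_command(["10.0.0.1", "10.0.0.2"], staged)))
```

The snapshot has to be taken on every node of cluster A, since each node only holds its own share of the data. sstableloader streams the files to the right replicas in cluster B, so the two clusters do not need matching token assignments.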