How to store checkpoints in a remote RocksDB in Apache Flink

Question

I know that Apache Flink has three kinds of state backends: MemoryStateBackend, FsStateBackend, and RocksDBStateBackend.

MemoryStateBackend stores checkpoints in local RAM, FsStateBackend stores checkpoints in the local file system, and RocksDBStateBackend stores checkpoints in RocksDB. I have some questions about RocksDBStateBackend.

As I understand it, RocksDBStateBackend is embedded in Apache Flink, and RocksDB is a key-value database. If that is right, Flink stores all checkpoints in the embedded RocksDB, which uses the local disk.

If so, the disk could be exhausted in some cases by the checkpoints stored in RocksDB. Is it possible to configure a remote RocksDB to store these checkpoints? If it is, should we worry about the remote RocksDB crashing? If the remote RocksDB crashed, the Flink jobs could not continue working, right?

Recommended answer

There is no option to use an external or remote RocksDB with Apache Flink. RocksDB is an embedded key-value store, with a local instance in each task manager.

A few points:

Flink makes a strong distinction between working state, which is always local (for good performance), and state snapshots (checkpoints and savepoints), which are not local (for reliability, they should be stored in a distributed file system).

RocksDBStateBackend keeps its working state on the local disk. The other two state backends keep their working state on the Java heap.

The checkpoint coordinator arranges for the slices of data scattered across all of the task managers to be collected into complete checkpoints that are stored elsewhere. With MemoryStateBackend, those checkpoints are stored on the JobManager heap; with the other two, they are stored in a distributed file system.
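The split between local working state and snapshots stored elsewhere can be illustrated with a toy model. This is only a sketch in plain Java: the class names `CheckpointSketch`, `TaskManager`, and `CheckpointStore` are hypothetical and are not Flink APIs, and an in-memory map stands in for the distributed file system.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these classes are part of Flink.
final class CheckpointSketch {

    static final class TaskManager {
        // Working state: lives only on this task manager.
        final Map<String, Long> workingState = new HashMap<>();

        void add(String key, long delta) {
            workingState.merge(key, delta, Long::sum);
        }

        // Simulates losing the machine: the local working state is gone.
        void crash() {
            workingState.clear();
        }
    }

    static final class CheckpointStore {
        // Stands in for durable storage outside the task managers.
        private final Map<Long, List<Map<String, Long>>> snapshots = new HashMap<>();

        // Copies each task manager's slice, so later updates or crashes
        // cannot change the stored checkpoint.
        void checkpoint(long id, List<TaskManager> taskManagers) {
            List<Map<String, Long>> slices = new ArrayList<>();
            for (TaskManager tm : taskManagers) {
                slices.add(new HashMap<>(tm.workingState));
            }
            snapshots.put(id, slices);
        }

        // Rolls every task manager back to the state captured in the checkpoint.
        void restore(long id, List<TaskManager> taskManagers) {
            List<Map<String, Long>> slices = snapshots.get(id);
            if (slices == null) {
                throw new IllegalArgumentException("unknown checkpoint " + id);
            }
            for (int i = 0; i < taskManagers.size(); i++) {
                TaskManager tm = taskManagers.get(i);
                tm.workingState.clear();
                tm.workingState.putAll(slices.get(i));
            }
        }
    }

    public static void main(String[] args) {
        TaskManager tm1 = new TaskManager();
        TaskManager tm2 = new TaskManager();
        List<TaskManager> taskManagers = List.of(tm1, tm2);
        CheckpointStore store = new CheckpointStore();

        tm1.add("a", 1);
        tm2.add("b", 2);
        store.checkpoint(1L, taskManagers);

        tm1.add("a", 5); // progress made after the checkpoint
        tm1.crash();     // local working state is lost
        store.restore(1L, taskManagers);

        System.out.println(tm1.workingState + " " + tm2.workingState); // prints {a=1} {b=2}
    }
}
```

In this model, losing a task manager's local state costs only the progress made since the last checkpoint, because the checkpoint lives outside the task managers.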

Configure RocksDB to use the fastest available local file system. Try to use locally attached SSDs, and avoid network-attached storage (such as EBS). Do not try to use a distributed file system such as S3 as RocksDB's local storage.

state.backend.rocksdb.localdir controls where each local RocksDB instance stores its working state.

The parameter to the RocksDBStateBackend constructor controls where the checkpoints are stored. For example, using S3, as recommended by @ezequiel, is the obvious choice on AWS.
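The placement advice above can be written as a small sanity check over the two locations. This is a rough sketch in plain Java, not a Flink utility: the class `StateLocationCheck` and its methods are hypothetical, the paths and bucket name are made up, and `state.checkpoints.dir` is assumed here to be the configuration-file counterpart of the constructor parameter. Treating any location without a URI scheme, or with the `file` scheme, as local is also a simplification.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of Flink.
final class StateLocationCheck {

    // A location with no URI scheme (a plain path) or with "file" counts as local.
    static boolean isLocal(String location) {
        String scheme = URI.create(location).getScheme();
        return scheme == null || scheme.equalsIgnoreCase("file");
    }

    // Returns a list of human-readable problems; empty means the layout follows the advice.
    static List<String> check(Map<String, String> conf) {
        List<String> problems = new ArrayList<>();

        String localDir = conf.get("state.backend.rocksdb.localdir");
        if (localDir != null && !isLocal(localDir)) {
            problems.add("RocksDB working state should be on a local file system: " + localDir);
        }

        String checkpointDir = conf.get("state.checkpoints.dir");
        if (checkpointDir == null || isLocal(checkpointDir)) {
            problems.add("checkpoints should go to a distributed file system: " + checkpointDir);
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        // Hypothetical locations: a locally attached SSD and an S3 bucket.
        conf.put("state.backend.rocksdb.localdir", "/mnt/local-ssd/flink/rocksdb");
        conf.put("state.checkpoints.dir", "s3://my-bucket/flink/checkpoints");

        System.out.println(check(conf)); // prints []
    }
}
```

The check can only look at the scheme, so it cannot tell whether a plain path sits on a locally attached SSD or on network-attached storage; that still has to be verified on the machine itself.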


07-18 22:16