Question
We are using Elasticsearch 6.8.4 and Flink 1.0.18.
We have an index with 1 shard and 1 replica in Elasticsearch, and I want to create a custom InputFormat that reads and writes data in Elasticsearch using the Apache Flink DataSet API with more than one input split, in order to achieve better performance. Is there any way I can achieve this?
Note: each document is large (almost 8 MB), and because of that size constraint I can read only 10 documents per request; we want to retrieve 500k records in total.
As I understand it, the parallelism should equal the number of shards/partitions of the data source. However, since we store only a small amount of data, we have kept the number of shards at 1, and the data is mostly static, growing only slightly each month.
Any help or example source code will be much appreciated.
Answer
You need to be able to generate queries to Elasticsearch that partition your source data into relatively equal chunks. You can then run your input source with a parallelism > 1 and have each sub-task read only its own part of the index data.
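One way to produce such equal chunks, even with a single shard, is Elasticsearch's sliced scroll feature (available in 6.x): each parallel sub-task runs the same scroll query with a different slice id, and Elasticsearch partitions the results across the slices. The sketch below only builds the per-split request bodies; the class and method names are illustrative, and wiring them into a Flink `InputFormat` (one slice per input split, executed in `open()`/`nextRecord()`) plus the actual HTTP scroll calls are left out.

```java
// Sketch: one sliced-scroll request body per Flink input split.
// Each sub-task i of N would send its own body and then follow the
// returned scroll id; Elasticsearch routes a disjoint subset of the
// index's documents to each slice.
public class SlicedScrollSketch {

    // Illustrative helper (not a real client API): builds the JSON body
    // for slice `sliceId` out of `maxSlices`, fetching `pageSize` docs
    // per scroll request (10 here, matching the size constraint above).
    static String buildSliceQuery(int sliceId, int maxSlices, int pageSize) {
        return String.format(
            "{\"slice\":{\"id\":%d,\"max\":%d},\"size\":%d,"
            + "\"query\":{\"match_all\":{}}}",
            sliceId, maxSlices, pageSize);
    }

    public static void main(String[] args) {
        int numSplits = 4;  // desired Flink source parallelism
        int pageSize = 10;  // docs per request (size constraint in the question)
        for (int i = 0; i < numSplits; i++) {
            // Each line is the body one sub-task would POST to
            // /index/_search?scroll=1m before iterating its scroll.
            System.out.println(buildSliceQuery(i, numSplits, pageSize));
        }
    }
}
```

Each split then pages through its slice independently, so the 10-documents-per-request limit applies per sub-task rather than to the whole job.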
This concludes the article on creating an Elasticsearch input format with Flink's RichInputFormat; we hope the answer above is helpful.