一、问题现象,使用flink on yarn 模式,写入数据到clickhouse,但是在yarn 集群充足的情况下一直报:Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster,表面现象是 yarn 集群资源可能不够,实际yarn 集群资源是够用的。
查看flink jobmanager的日志,发现日志中一直在出现如下报错:
Could not resolve ResourceManager address akka.tcp://[email protected]:38121/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://xxxxxxx.cn:38121/user/rpc/resourcemanager_*.
从这个日志来看,也就基本可以确定不是yarn集群资源的问题,是yarn 集群通信出现了问题。
1)、交叉验证,发现提交别的flink streamling 任务都不会存在该问题,只有写clickhouse的时候才会出现该问题,初步排除可能是代码问题或者该任务的jar包引起的。
2)、查看pom依赖:
<dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-connector-jdbc_2.11</artifactId> <version>${flink.version}</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-streaming-java_2.11</artifactId> <version>${flink.version}</version> <scope>provided</scope> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-connector-kafka_2.11</artifactId> <version>${flink.version}</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-java</artifactId> <version>${flink.version}</version> <scope>provided</scope> </dependency> <dependency> <groupId>ru.yandex.clickhouse</groupId> <artifactId>clickhouse-jdbc</artifactId> <version>${clickhouse-jdbc.version}</version> </dependency>
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>${mysql-connector-java.version}</version>
</dependency>
08-14 18:48