Connection timeout when reading Netezza from AWS Glue

Problem description

I am trying to use AWS Glue to pull data from my on-premises Netezza database into S3. This is the code I have written so far (it is not complete):

df = glueContext.read.format("jdbc")\
    .option("driver", "org.netezza.Driver")\
    .option("url", "jdbc:netezza://NetezzaHost01:5480/Netezza_DB")\
    .option("dbtable", "ADMIN.table1")\
    .option("user", "myUser")\
    .option("password", "myPassword")\
    .load()

print(df.count())

I am using a custom JDBC driver jar, since AWS Glue does not support Netezza natively (the driver is provided by IBM), and I specify it as a dependency when triggering the job.

This code keeps failing with a timeout error:

py4j.protocol.Py4JJavaError: An error occurred while calling o68.load.
: org.netezza.error.NzSQLException: Connection timed out (Connection timed out)

A few things I have tried which did not work:
- Use Spark instead of Glue to read
- Use a very small table (<100 rows) as source

I should add that the Netezza database is behind a corporate firewall, but I do not see any options to specify security groups (as you can do with Glue native connections) when using custom drivers.

Any ideas?

Recommended answer

1) If you are trying to access a Netezza host that is on-premises, first verify that you can reach Netezza from the VPC you have chosen for your Glue job.

2) This poses a problem, since the VPC is chosen on the basis of the connection you add to Glue, and the connection options apparently do not list Netezza as supported. However, you can still enter the Netezza URL and set the connection up. The connection test might not work, but at least you will be able to choose a subnet and security group. Your security group should open up the Netezza port.

3) I'm guessing your VPC has a Direct Connect/VPN setup to your office network. As long as your firewall accepts connections from the CIDR range of the subnet you have added to your Glue job, it should work. You might need to ask the team that manages the firewall for Netezza to open up connections from your VPC/subnet IP range.
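
The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a verified fix: the helper names (can_reach, build_connection_input, register_connection) are hypothetical, and any connection name, subnet ID, security group ID and availability zone you pass in are your own values. The dictionary keys follow the ConnectionInput structure accepted by the boto3 Glue client's create_connection call. Whether Glue accepts a jdbc:netezza:// URL for a JDBC-type connection is taken from the answer above and has not been checked here.

```python
import socket


def can_reach(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Run from inside the Glue job (or from any machine in the same subnet),
    this separates a network/firewall problem from a JDBC driver problem.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and DNS lookup failures.
        return False


def build_connection_input(name, jdbc_url, user, password,
                           subnet_id, security_group_ids, availability_zone):
    """Build the ConnectionInput dict for a Glue JDBC connection.

    The connection exists mainly to pin the job to a subnet and security
    group (step 2); all argument values are supplied by the caller.
    """
    return {
        "Name": name,
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            "JDBC_CONNECTION_URL": jdbc_url,
            "USERNAME": user,
            "PASSWORD": password,
        },
        "PhysicalConnectionRequirements": {
            "SubnetId": subnet_id,
            "SecurityGroupIdList": list(security_group_ids),
            "AvailabilityZone": availability_zone,
        },
    }


def register_connection(connection_input):
    """Create the connection in Glue (needs boto3 and AWS credentials)."""
    # Imported here so the helpers above can be used without boto3 installed.
    import boto3

    boto3.client("glue").create_connection(ConnectionInput=connection_input)
```

For example, calling can_reach("NetezzaHost01", 5480) at the start of the job and printing the result shows whether the subnet can open the Netezza port at all. If it returns False, the timeout is most likely a routing or firewall issue (steps 1 and 3) rather than anything in the JDBC read options.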
