mysql - Overwrite MySQL tables with AWS Glue

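I have a Lambda process that occasionally polls an API for recent data. The data has a unique key, and I want to use Glue to update a table in MySQL. Is there an option to overwrite the data using this key (similar to Spark's mode=overwrite)? If not, can I truncate the table in Glue before inserting all the new data?

Thanks

Best answer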

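I found a simpler way when working with JDBC connections in Glue. When you write data to a Redshift cluster, the way the Glue team recommends truncating the table is with the following sample code: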

datasink5 = glueContext.write_dynamic_frame.from_jdbc_conf(
    frame = resolvechoice4,
    catalog_connection = "<connection-name>",
    connection_options = {
        "dbtable": "<target-table>",
        "database": "testdb",
        "preactions": "TRUNCATE TABLE <table-name>"
    },
    redshift_tmp_dir = args["TempDir"],
    transformation_ctx = "datasink5"
)

where

connection-name    the name of your Glue connection to the Redshift cluster
target-table       the table you're loading the data into
testdb             the name of the database
table-name         the name of the table to truncate (ideally the table you're loading into)
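
To avoid mistyping the table name in two places, the call above can be wrapped in a small helper. This is only a sketch: the function names build_truncate_options and truncate_and_load, and the identifier check, are hypothetical additions and not part of the Glue API. Only from_jdbc_conf and the options shown in the answer are taken from it. The answer describes this for Redshift; whether "preactions" is honored for a MySQL connection is an assumption you would need to verify.

```python
import re

# Hypothetical guard: allow only plain identifiers, optionally schema-qualified,
# because the table name is interpolated into the TRUNCATE statement.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def build_truncate_options(table, database):
    """Build connection_options that truncate `table` before loading into it."""
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"unsafe table name: {table!r}")
    return {
        "dbtable": table,
        "database": database,
        # Runs before the load, as in the answer's sample code.
        "preactions": f"TRUNCATE TABLE {table}",
    }


def truncate_and_load(glue_context, frame, connection_name, table, database, tmp_dir):
    """Write `frame` through a Glue catalog connection, truncating the target first.

    `glue_context` is expected to be an awsglue GlueContext and `frame` a
    DynamicFrame; both are passed in so this module imports nothing from Glue.
    """
    return glue_context.write_dynamic_frame.from_jdbc_conf(
        frame=frame,
        catalog_connection=connection_name,
        connection_options=build_truncate_options(table, database),
        redshift_tmp_dir=tmp_dir,
        transformation_ctx="truncate_and_load",
    )
```

Inside a Glue job this would be called as truncate_and_load(glueContext, resolvechoice4, "<connection-name>", "<target-table>", "testdb", args["TempDir"]). Using the same name for "dbtable" and the TRUNCATE statement follows the answer's advice that the truncated table should ideally be the one being loaded.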

Regarding "mysql - Overwrite MySQL tables with AWS Glue", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47556678/
