Problem Description

In AWS Glue, although I have read the documentation, one thing is still not clear to me. Below is what I understand.

Regarding crawlers: a crawler creates a metadata table for either an S3 location or a DynamoDB table. What I don't understand is how a Scala/Python script is able to retrieve data from the actual source (say DynamoDB or S3) using the tables created in the metadata.

    val input = glueContext
      .getCatalogSource(database = "my_data_base", tableName = "my_table")
      .getDynamicFrame()

Does the above line retrieve data from the actual source via the metadata tables?

I would be glad if someone could explain what happens behind the scenes when a Glue script retrieves data via metadata tables.

Recommended Answer

When you run a Glue crawler, it fetches metadata from S3 or JDBC (depending on your requirement) and creates tables in the AWS Glue Data Catalog.

Now, if you want to connect to this data (these tables) from a Glue ETL job, you can do it in multiple ways depending on your requirement:

[from_options][1]: if you want to load directly from S3/JDBC without connecting to the Glue catalog.

[from_catalog][1]: if you want to load data from the Glue catalog, you need to link the job to the catalog using the getCatalogSource method (the Scala counterpart of from_catalog), as shown in your code.
As the name implies, it uses the Glue Data Catalog as the source and loads the particular table that you pass to the method. Once it has read your table definition, which points to a location, it connects to that location and loads the data present in the source.

So yes, you need to use getCatalogSource if you want to load tables from the Glue catalog.

Does the catalog look into the crawler, refer to the actual source and load the data? If I delete the crawler before running getCatalogSource, will I still be able to load the data? And what if my source has a huge number of records: will this load all of them, or how is loading handled in that case?
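To make the answer's two-step description concrete, here is a minimal Scala sketch. It is a hypothetical in-memory model, not the AWS Glue API: every name in it (GlueCatalogModel, TableDefinition, readFromOptions, readFromCatalog, and the s3://example-bucket location) is invented for illustration. It assumes the Data Catalog is simply a map from (database, table) to a definition holding a location and format, and that "the source" is a map from location to records. Under those assumptions, reading via the catalog means looking up the definition and then loading whatever lives at the location it points to, while the from_options analogue takes the location directly.

```scala
// Hypothetical, in-memory model of how a job resolves a catalog table.
// None of these names are the real AWS Glue API; they only mirror the idea
// that the Data Catalog stores metadata while the data stays in the source.
object GlueCatalogModel {

  // A table definition as a crawler might record it: metadata only, no rows.
  final case class TableDefinition(location: String, format: String, columns: Seq[String])

  // Stand-in for the actual source (e.g. objects under an S3 prefix):
  // maps a location to the raw records stored there.
  val storage: Map[String, Seq[Map[String, String]]] = Map(
    "s3://example-bucket/my_table/" -> Seq(
      Map("id" -> "1", "name" -> "alice"),
      Map("id" -> "2", "name" -> "bob")
    )
  )

  // Stand-in for the Data Catalog: (database, table) -> definition.
  // In this model a crawler would only write entries here.
  val catalog: Map[(String, String), TableDefinition] = Map(
    ("my_data_base", "my_table") ->
      TableDefinition("s3://example-bucket/my_table/", "json", Seq("id", "name"))
  )

  // Analogue of from_options: the caller supplies the location directly.
  def readFromOptions(location: String): Seq[Map[String, String]] =
    storage.getOrElse(location, throw new NoSuchElementException(s"no data at $location"))

  // Analogue of from_catalog / getCatalogSource: look up the definition first,
  // then connect to the location it points to and load the data found there.
  def readFromCatalog(database: String, tableName: String): Seq[Map[String, String]] = {
    val definition = catalog.getOrElse(
      (database, tableName),
      throw new NoSuchElementException(s"table $database.$tableName not in catalog")
    )
    readFromOptions(definition.location)
  }
}
```

In this sketch, GlueCatalogModel.readFromCatalog("my_data_base", "my_table") and GlueCatalogModel.readFromOptions("s3://example-bucket/my_table/") return the same two records: the catalog contributes only the location (and format and schema), while the rows themselves come from the source. The read path consults only the stored table definition, matching the answer's description.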
10-11 07:20