Problem description
I'm running a Glue crawler over a folder containing several files with different schemas. I expect to get one table per file.
What happens is that the Glue Data Catalog does show a table for each file, each with its own schema. But when I try to query one of them via Redshift Spectrum (after creating the external schema etc.), I get this exception:
[XX000][500310] [Amazon](500310) Invalid operation: Parsed manifest is not a valid JSON object.
How can I fix this?
Recommended answer
As stated in this forum thread: https://forums.aws.amazon.com/thread.jspa?threadID=266510
Each file should be in its own folder/sub-bucket.
So for me, putting each file in its own folder and setting the Glue Crawler to run over the top-level folder resolved the exception.
I'm now able to query it without any problem.
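For anyone who has many files to rearrange, here is a minimal sketch of the "one file per folder" move. It is not taken from the forum thread: the function names plan_moves and apply_moves, the folder naming scheme (file name with dots replaced by underscores), and the bucket and prefix values are all assumptions made for illustration. apply_moves expects a boto3 S3 client to be passed in and uses only its copy_object and delete_object calls.

```python
def plan_moves(keys, prefix):
    """Map every file sitting directly under `prefix` to a key in its own subfolder.

    The subfolder name is the file name with dots replaced by underscores
    (an arbitrary, hypothetical naming choice), so `orders.csv` and
    `orders.json` do not end up sharing a folder.
    """
    prefix = prefix.rstrip("/") + "/"
    moves = {}
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        # Skip the folder placeholder itself and files already inside a subfolder.
        if not rest or "/" in rest:
            continue
        folder = rest.replace(".", "_")
        moves[key] = f"{prefix}{folder}/{rest}"
    return moves


def apply_moves(s3, bucket, moves):
    """Copy each object to its new key, then delete the original.

    `s3` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    for old_key, new_key in moves.items():
        s3.copy_object(
            Bucket=bucket,
            Key=new_key,
            CopySource={"Bucket": bucket, "Key": old_key},
        )
        s3.delete_object(Bucket=bucket, Key=old_key)


if __name__ == "__main__":
    # Dry run on hypothetical keys: only prints the planned moves.
    example_keys = ["data/orders.csv", "data/users.json"]
    for old, new in plan_moves(example_keys, "data").items():
        print(old, "->", new)
```

After moving the files, point the crawler at the top-level folder (here the hypothetical data/ prefix) and re-run it so the tables are recreated against the new locations.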