Problem description
I have this CSV file:
reference,address
V7T452F4H9,"12410 W 62TH ST, AA D"
The following options are used in the table definition:
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'quoteChar'='\"',
'separatorChar'=',')
but it still won't recognize the double quotes in the data, and the comma inside the double-quoted field is messing up the data. When I run the Athena query, the result looks like this:
reference address
V7T452F4H9 "12410 W 62TH ST
How can I fix this?
Recommended answer
It looks like you also need to add escapeChar. The AWS Athena docs show this example:
CREATE EXTERNAL TABLE myopencsvtable (
col1 string,
col2 string,
col3 string,
col4 string
)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'separatorChar' = ',',
'quoteChar' = '\"',
'escapeChar' = '\\'
)
STORED AS TEXTFILE
LOCATION 's3://location/of/csv/';
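As a rough local sanity check of what a correct parse of the sample row should look like, here is a minimal Python sketch. It uses Python's standard csv module, which is not OpenCSVSerde and may differ from it in edge cases; the helper name parse_rows and its parameter names are hypothetical and only mirror the three SerDe properties above. It also shows that a plain split on the comma reproduces the broken result from the question.

```python
import csv
import io

# The sample file from the question, inlined as a string.
SAMPLE = 'reference,address\nV7T452F4H9,"12410 W 62TH ST, AA D"\n'


def parse_rows(text, separator=",", quote='"', escape="\\"):
    """Parse CSV text with explicit separator, quote, and escape characters.

    The three parameters loosely correspond to separatorChar, quoteChar,
    and escapeChar in the table definition (an analogy, not the same code).
    """
    reader = csv.reader(
        io.StringIO(text),
        delimiter=separator,
        quotechar=quote,
        escapechar=escape,
    )
    return list(reader)


# Quote-aware parsing keeps the comma inside the quoted address field.
rows = parse_rows(SAMPLE)

# A naive split ignores quoting and cuts the address at the embedded comma,
# which matches the truncated value seen in the query result.
naive = SAMPLE.splitlines()[1].split(",")

for row in rows:
    print(row)
print(naive)
```

If the table is read correctly, the address column should come back as one value, 12410 W 62TH ST, AA D, without the surrounding quotes, as in the second row printed by the quote-aware parse.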