Suppose I have a schema like the following:
{
"name": "phoneNumber",
"type": {
"type": "record",
"name": "internalNumber",
"namespace": "com.wiki",
"fields": [{
"name": "areacode",
"type": "string"
}, {
"name": "phone",
"type": ["null", "string"],
"doc": "Actual full number",
"default": null
}]
}
}
I have a CSV file in which this data is spread across multiple columns, for example:
areaCode phoneNumber
+1 1234512345
How can I use a Pig script to produce an Avro file like this:
"phoneNumber" : {
"areacode" : "+1",
"phone" : "1234512345"
}
given that the record is nested?
Best answer
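-- piggybank.jar must be registered; adjust the jar location for your install.
REGISTER piggybank.jar;
A = LOAD 'input_path' USING org.apache.pig.piggybank.storage.CSVLoader() AS (areaCode: chararray, phoneNumber: chararray);
-- Build a tuple and name its fields to match the schema; AvroStorage stores a tuple as a nested record.
B = FOREACH A GENERATE TOTUPLE(areaCode, phoneNumber) AS phoneNumber: tuple(areacode: chararray, phone: chararray);
STORE B INTO 'output_path' USING org.apache.pig.piggybank.storage.avro.AvroStorage();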
You will need CSVLoader and AvroStorage from piggybank.
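To sanity-check the target shape before running anything on a cluster, the flat-row-to-nested-record mapping can be sketched outside Pig. The following is a minimal Python sketch using only the standard library, not part of the answer's Pig workflow. The helper names `to_nested` and `load_records` are hypothetical, the tab delimiter is an assumption based on how the sample rows appear in the question, and the sketch prints JSON rather than writing an Avro file.

```python
import csv
import io
import json

# Sample input copied from the question; the tab delimiter is an assumption.
SAMPLE = "areaCode\tphoneNumber\n+1\t1234512345\n"


def to_nested(row):
    """Map one flat CSV row onto the nested phoneNumber record (hypothetical helper)."""
    return {
        "phoneNumber": {
            "areacode": row["areaCode"],
            # "phone" is a ["null", "string"] union, so an empty cell becomes None.
            "phone": row["phoneNumber"] or None,
        }
    }


def load_records(text, delimiter="\t"):
    """Parse delimited text with a header row and nest every data row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [to_nested(row) for row in reader]


records = load_records(SAMPLE)
print(json.dumps(records[0], indent=2))
```

Comparing this output with what the Pig job writes is a quick way to confirm that the tuple field names line up with the schema's `areacode` and `phone`.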
Regarding hadoop - how to map data from CSV to a nested Avro schema, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41134574/