Problem description
I have a Pig script that reads data from a directory on HDFS. The data is stored as Avro files, and the directory structure looks like this:
DIR--
--Subdir1
--Subdir2
--Subdir3
--Subdir4
In the Pig script I simply load, filter, and store the data. It looks like this:
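-- $path and $outputDirectory are supplied via parameter substitution;
-- the filter condition is a placeholder for the real one.
items = LOAD '$path' USING AvroStorage();
items = FILTER items BY some_property == 'some_value';
STORE items INTO '$outputDirectory' USING AvroStorage();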
The problem is that Pig writes many empty files to the output directory. Is there a way to get rid of those files? Thanks!
Recommended answer
For Pig 0.13 and later, you can set pig.output.lazy=true to avoid creating empty output files (see https://issues.apache.org/jira/browse/PIG-3299).
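A minimal sketch of the question's script with lazy output enabled through Pig's SET command, assuming Pig 0.13 or later. With lazy output, a task creates its output file only when it writes its first record, so tasks whose records are all filtered out should leave no empty file behind. The parameter names $INPUT_DIR and $OUTPUT_DIR and the filter condition are hypothetical placeholders, not taken from the original script.

```pig
-- Pig 0.13+ (PIG-3299): create an output file only when a task
-- actually writes a record to it.
SET pig.output.lazy 'true';

-- Hypothetical placeholders: pass INPUT_DIR and OUTPUT_DIR with -param,
-- and replace the filter condition with the real one.
items = LOAD '$INPUT_DIR' USING AvroStorage();
items = FILTER items BY some_property == 'some_value';
STORE items INTO '$OUTPUT_DIR' USING AvroStorage();
```

It could be run as, for example, `pig -param INPUT_DIR=/path/to/DIR -param OUTPUT_DIR=/path/to/output script.pig`. The same property can also be set once in pig.properties instead of in each script.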