Problem description
I have a Pig script that reads data from a directory on HDFS. The data is stored as Avro files. The directory structure looks like this:
DIR--
--Subdir1
--Subdir2
--Subdir3
--Subdir4
In the Pig script I am simply doing a load, a filter, and a store. It looks like this:
items = LOAD '$path' USING AvroStorage();
items = FILTER items BY some property;  -- filter condition elided in the original
STORE items INTO '$outputDirectory' USING AvroStorage();
The problem is that Pig writes many empty files to the output directory. Is there a way to remove those files, or to avoid creating them? Thanks!
Recommended answer
For Pig 0.13 and later, you can set pig.output.lazy=true to avoid creating empty files. (https://issues.apache.org/jira/browse/PIG-3299)
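As a rough sketch of how that could look in the script from the question: the property can be set with Pig's SET command at the top of the script. The filter condition (a field called status) is purely hypothetical, since the original question does not show it, and $path and $outputDirectory are assumed to be passed in with -param. Whether the setting takes effect with a particular store function is worth checking against your Pig version.

```pig
-- Sketch only; assumes Pig 0.13 or later (see PIG-3299).
-- Ask Pig to create output files lazily, so tasks that emit
-- no records do not leave empty part files behind.
SET pig.output.lazy true;

-- $path and $outputDirectory are assumed to be supplied via -param.
items = LOAD '$path' USING AvroStorage();

-- Hypothetical filter: "status" is not a field named in the question.
items = FILTER items BY status == 'active';

STORE items INTO '$outputDirectory' USING AvroStorage();
```

The same property could presumably also be passed on the command line as a Java system property, e.g. pig -Dpig.output.lazy=true, instead of editing the script.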