How to prevent Apache Pig from outputting empty files?

Problem description

I have a Pig script that reads data from a directory on HDFS. The data is stored as Avro files. The directory structure looks like:

DIR--
   --Subdir1
   --Subdir2
   --Subdir3
   --Subdir4

In the Pig script I am simply doing a load, a filter and a store. It looks like:

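-- '$path' and '$outputDirectory' stand in for the real locations (passed as Pig parameters)
items = LOAD '$path' USING AvroStorage();
-- "some property" is the asker's placeholder for the actual filter condition
items = FILTER items BY some property;
STORE items INTO '$outputDirectory' USING AvroStorage();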

The problem right now is that Pig writes many empty files to the output directory. Is there a way to remove those files, or to avoid creating them? Thanks!

Recommended answer

For Pig 0.13 and later, you can set pig.output.lazy=true to avoid creating empty files. See PIG-3299 (https://issues.apache.org/jira/browse/PIG-3299).
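
A minimal sketch of where the property could go, assuming Pig 0.13 or later and that the setting is applied with a SET statement at the top of the script. The $path and $outputDirectory parameters, the field name some_field and the filter condition are placeholders for illustration, not values from the original question.

```pig
-- Assumption: Pig 0.13+. With lazy output enabled, a task should only
-- create its output file once it actually writes a record.
SET pig.output.lazy true;

-- '$path' and '$outputDirectory' are hypothetical parameters
-- (e.g. supplied with -param on the command line).
items = LOAD '$path' USING AvroStorage();

-- Hypothetical filter; replace with the real condition.
items = FILTER items BY some_field IS NOT NULL;

STORE items INTO '$outputDirectory' USING AvroStorage();
```

It should also be possible to pass the property on the command line as a Java property instead of editing the script, for example pig -Dpig.output.lazy=true followed by the script name. Either way, it is worth checking the output directory after a test run to confirm that the empty part files are gone for your storage function.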


08-24 05:11