How to use CompressionCodec in Hadoop


Problem description

I am compressing the output files written from my reducer as follows:

OutputStream out = ipFs.create( new Path( opDir + "/" + fileName ) );
CompressionCodec codec = new GzipCodec();
OutputStream cs = codec.createOutputStream( out );
BufferedWriter cout = new BufferedWriter( new OutputStreamWriter( cs ) );
cout.write( ... )

But I get a NullPointerException on line 3:

java.lang.NullPointerException
    at org.apache.hadoop.io.compress.zlib.ZlibFactory.isNativeZlibLoaded(ZlibFactory.java:63)
    at org.apache.hadoop.io.compress.GzipCodec.createOutputStream(GzipCodec.java:92)
    at myFile$myReduce.reduce(myFile.java:354)

I also found a JIRA issue reporting the same problem.

Can you please suggest if I am doing something wrong?

Solution

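You should use the CompressionCodecFactory if you want to use compression outside of the standard OutputFormat handling (as detailed in @linker's answer). A GzipCodec created with new has no Configuration set, which is most likely why ZlibFactory.isNativeZlibLoaded throws the NullPointerException; a codec obtained from the factory is configured for you: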

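// conf is the job's Configuration; the factory passes it to each codec it creates
CompressionCodecFactory ccf = new CompressionCodecFactory(conf);
CompressionCodec codec = ccf.getCodecByClassName(GzipCodec.class.getName());
// outputStream is the raw stream, e.g. the one returned by FileSystem.create(...)
OutputStream compressedOutputStream = codec.createOutputStream(outputStream);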
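For completeness, here is a minimal, self-contained sketch of the same idea: write a gzip-compressed text file through a factory-created codec, then read it back. The class name GzipSideFile and the helpers writeGzipped and readFirstLine are made up for illustration and are not part of Hadoop. The sketch assumes Hadoop 2.x or later on the classpath and a file name ending in .gz, so that getCodec(Path) can pick the codec from the extension.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.io.compress.GzipCodec;

// Hypothetical helper class, for illustration only.
public class GzipSideFile {

    // Writes text to file through a GzipCodec obtained from the factory,
    // so the codec already carries the Configuration it needs.
    public static void writeGzipped(Configuration conf, Path file, String text)
            throws IOException {
        FileSystem fs = file.getFileSystem(conf);
        CompressionCodecFactory ccf = new CompressionCodecFactory(conf);
        CompressionCodec codec = ccf.getCodecByClassName(GzipCodec.class.getName());
        if (codec == null) {
            throw new IOException("GzipCodec is not available in this configuration");
        }
        // Closing the writer also finishes the gzip stream and closes the file.
        try (BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                codec.createOutputStream(fs.create(file, true)),
                StandardCharsets.UTF_8))) {
            out.write(text);
        }
    }

    // Reads the first line back; the codec is chosen from the .gz extension.
    public static String readFirstLine(Configuration conf, Path file)
            throws IOException {
        FileSystem fs = file.getFileSystem(conf);
        CompressionCodec codec = new CompressionCodecFactory(conf).getCodec(file);
        if (codec == null) {
            throw new IOException("No codec matches the extension of " + file);
        }
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                codec.createInputStream(fs.open(file)), StandardCharsets.UTF_8))) {
            return in.readLine();
        }
    }
}
```

Inside a reducer written against the new API, the Configuration would normally come from context.getConfiguration(). If you prefer to instantiate the codec yourself, ReflectionUtils.newInstance(GzipCodec.class, conf) should give the same result, since it also sets the Configuration on the new codec.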


08-24 03:11