This article describes how to write output to different folders in Hadoop. It should be a useful reference for anyone facing the same problem, so read on to learn how.

Problem Description

  • I want to write two different types of output from the same reducer, into two different directories. I am able to use the MultipleOutputs feature in Hadoop to write to different files, but they both go to the same output folder.

    I want to write each file from the same reduce to a different folder.

    Is there a way to do this?

    If I try putting, for example, "hello/testfile" as the second argument, it reports an invalid argument, so I am not able to write to different folders. (A minimal sketch of the setup in question follows below.)

    1. If the above case is not possible, is it then possible for the mapper to read only specific files from an input folder?

    Please help me.

    Thanks in advance!
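    For context, here is a minimal sketch of the driver-side setup being described, assuming the old org.apache.hadoop.mapred API that the later snippets use; the class name, named outputs, and input/output paths are illustrative, not taken from the original post:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextOutputFormat;
    import org.apache.hadoop.mapred.lib.MultipleOutputs;

    public class MultiOutputDriver {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MultiOutputDriver.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // The second argument is the named-output name the question refers
        // to. MultipleOutputs only accepts alphanumeric names here, so a
        // value such as "hello/testfile" is rejected, and every named
        // output ends up as <name>-r-NNNNN inside the single job output
        // directory.
        MultipleOutputs.addNamedOutput(conf, "data",
            TextOutputFormat.class, Text.class, Text.class);
        MultipleOutputs.addNamedOutput(conf, "errors",
            TextOutputFormat.class, Text.class, Text.class);

        JobClient.runJob(conf);
      }
    }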

    Thanks for the reply. I am able to read a file successfully using the above method, but in distributed mode I am not able to do so. In the reducer, I have set:

    mos.getCollector("data", reporter).collect(new Text(str_key), new Text(str_val));

    (using MultipleOutputs), and in the JobConf I tried using

    FileInputFormat.setInputPaths(conf2, "/home/users/mlakshm/opchk285/data-r-00000*");

    as well as

    FileInputFormat.setInputPaths(conf2, "/home/users/mlakshm/opchk285/data*");

    But it gives the following error:

    cause: org.apache.hadoop.mapred.InvalidInputException: Input Pattern hdfs://mentat.cluster:54310/home/users/mlakshm/opchk295/data-r-00000* matches 0 files
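    Note, incidentally, that the glob in the error message points at opchk295 while the code sets opchk285; a mismatch like that would by itself make the pattern match 0 files. For reference, here is a minimal sketch, again assuming the old mapred API, of how the collector call quoted above fits into a complete reducer (the class name is illustrative):

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.lib.MultipleOutputs;

    public class DataReducer extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {

      private MultipleOutputs mos;

      @Override
      public void configure(JobConf conf) {
        // One MultipleOutputs instance per task, built from the task's JobConf.
        mos = new MultipleOutputs(conf);
      }

      @Override
      public void reduce(Text key, Iterator<Text> values,
          OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        while (values.hasNext()) {
          // Writes to the "data" named output declared on the JobConf; the
          // part files appear as data-r-NNNNN in the job's output directory.
          mos.getCollector("data", reporter).collect(key, values.next());
        }
      }

      @Override
      public void close() throws IOException {
        // Flushes and closes all named-output writers.
        mos.close();
      }
    }

    A second job chained onto this one then needs its input glob to point at the first job's actual output directory, e.g. FileInputFormat.setInputPaths(conf2, "/home/users/mlakshm/opchk285/data-r-*"), assuming opchk285 is where the first job wrote its output.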


    Solution

    Copy the MultipleOutputs code into your own code base and loosen the restriction on allowable characters. I can't see any valid reason for the restrictions anyway. (A sketch of what that looks like follows below.)
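    Concretely, the old mapred MultipleOutputs rejects any named-output name that is not purely alphanumeric. The following is a sketch of the idea, paraphrasing the validation from memory rather than copying the Hadoop source exactly, together with a relaxed variant that also accepts '/':

    public final class RelaxedNameCheck {

      // Roughly the shape of the check in MultipleOutputs (paraphrased, not
      // an exact copy of the Hadoop source): only A-Z, a-z, and 0-9 pass,
      // which is why a name like "hello/testfile" is rejected.
      static void checkTokenName(String namedOutput) {
        if (namedOutput == null || namedOutput.length() == 0) {
          throw new IllegalArgumentException("Name cannot be NULL or empty");
        }
        for (char ch : namedOutput.toCharArray()) {
          if ((ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z')
              || (ch >= '0' && ch <= '9')) {
            continue;
          }
          throw new IllegalArgumentException(
              "Name cannot have a '" + ch + "' char");
        }
      }

      // In a copied MultipleOutputs, the same loop can additionally accept
      // '/', so "hello/testfile" becomes a path under the output directory.
      static void checkTokenNameRelaxed(String namedOutput) {
        if (namedOutput == null || namedOutput.length() == 0) {
          throw new IllegalArgumentException("Name cannot be NULL or empty");
        }
        for (char ch : namedOutput.toCharArray()) {
          if ((ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z')
              || (ch >= '0' && ch <= '9') || ch == '/') {
            continue;
          }
          throw new IllegalArgumentException(
              "Name cannot have a '" + ch + "' char");
        }
      }
    }

    Later Hadoop releases make this workaround unnecessary: the newer org.apache.hadoop.mapreduce.lib.output.MultipleOutputs accepts a baseOutputPath argument in its write(...) method that may contain '/', writing directly into subdirectories of the output path.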



This concludes the article on writing output to different folders in Hadoop. We hope the answer above is helpful, and thank you for your support!
