This article looks at Hadoop's "No FileSystem for scheme: file" error and how to fix it. Hopefully it is a useful reference for anyone hitting the same problem.

Problem description

I am trying to run a simple NaiveBayesClassifier using Hadoop, but I get this error:

    Exception in thread "main" java.io.IOException: No FileSystem for scheme: file
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:180)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
    at org.apache.mahout.classifier.naivebayes.NaiveBayesModel.materialize(NaiveBayesModel.java:100)

Code:

    Configuration configuration = new Configuration();
    NaiveBayesModel model = NaiveBayesModel.materialize(new Path(modelPath), configuration); // error on this line

modelPath points to a NaiveBayes.bin file, and the configuration object prints: Configuration: core-default.xml, core-site.xml

I think it's because of the jars. Any ideas?

Recommended answer

This is a typical case of the maven-assembly plugin breaking things.

Different JARs (hadoop-common for LocalFileSystem, hadoop-hdfs for DistributedFileSystem) each contain a different file called org.apache.hadoop.fs.FileSystem in their META-INF/services directory. That file lists the canonical class names of the filesystem implementations the JAR wants to declare (this is a Service Provider Interface implemented via java.util.ServiceLoader; see org.apache.hadoop.fs.FileSystem#loadFileSystems).
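For illustration, the copy of that service file shipped inside hadoop-hdfs looks roughly like this (the exact entries vary by Hadoop version, so treat this as a sketch; newer versions also declare things like the WebHDFS filesystem here):

    # META-INF/services/org.apache.hadoop.fs.FileSystem (from hadoop-hdfs)
    org.apache.hadoop.hdfs.DistributedFileSystem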

When we use maven-assembly-plugin, it merges all our JARs into one, and all of the META-INF/services/org.apache.hadoop.fs.FileSystem files overwrite each other. Only one of these files remains (the last one that was added). In this case, the FileSystem list from hadoop-common overwrote the list from hadoop-hdfs, so DistributedFileSystem was no longer declared.
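A quick way to confirm this from inside the assembled JAR is to ask java.util.ServiceLoader directly which implementations survived the merge. A minimal diagnostic sketch (the class name ListFileSystems is ours, not part of Hadoop):

    import java.util.ServiceLoader;

    import org.apache.hadoop.fs.FileSystem;

    public class ListFileSystems {
        public static void main(String[] args) {
            // Iterates over every implementation declared in the
            // META-INF/services/org.apache.hadoop.fs.FileSystem file(s)
            // visible on the classpath. If DistributedFileSystem does
            // not show up, the hdfs service entries were clobbered.
            for (FileSystem fs : ServiceLoader.load(FileSystem.class)) {
                System.out.println(fs.getClass().getName());
            }
        }
    }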

After loading the Hadoop configuration, but just before doing anything FileSystem-related, we call this:

    hadoopConfig.set("fs.hdfs.impl",
        org.apache.hadoop.hdfs.DistributedFileSystem.class.getName()
    );
    hadoopConfig.set("fs.file.impl",
        org.apache.hadoop.fs.LocalFileSystem.class.getName()
    );
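If you would rather not hard-code this in Java, the same overrides can be set in configuration files instead; a sketch of the equivalent core-site.xml entries (assuming your job actually picks up that file):

    <property>
      <name>fs.hdfs.impl</name>
      <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
    </property>
    <property>
      <name>fs.file.impl</name>
      <value>org.apache.hadoop.fs.LocalFileSystem</value>
    </property>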

Update: the correct fix

It has been brought to my attention by krookedking that there is a configuration-based way to make the maven-assembly plugin use a merged version of all the FileSystem service declarations; check out his answer below.
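For reference, the commonly cited way to get such a merged version is the maven-shade-plugin's ServicesResourceTransformer, which concatenates META-INF/services files entry by entry instead of letting the last JAR win. A sketch of the relevant pom.xml fragment (details may differ from krookedking's actual answer):

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <transformers>
              <!-- Merge META-INF/services files from all JARs rather
                   than keeping only the last copy encountered -->
              <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
            </transformers>
          </configuration>
        </execution>
      </executions>
    </plugin>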

That concludes this article on Hadoop's "No FileSystem for scheme: file" error. We hope the recommended answer helps, and thanks for your support!
