Problem description
I am trying to run a simple NaiveBayesClassifer using Hadoop and am getting this error:
Exception in thread "main" java.io.IOException: No FileSystem for scheme: file
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:180)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
at org.apache.mahout.classifier.naivebayes.NaiveBayesModel.materialize(NaiveBayesModel.java:100)
Code:
Configuration configuration = new Configuration();
NaiveBayesModel model = NaiveBayesModel.materialize(new Path(modelPath), configuration); // error on this line
modelPath is pointing to the NaiveBayes.bin file, and the configuration object is printing: Configuration: core-default.xml, core-site.xml
I think it's because of the jars. Any ideas?
Recommended answer
This is a typical case of the maven-assembly plugin breaking things.
Different JARs (hadoop-commons for LocalFileSystem, hadoop-hdfs for DistributedFileSystem) each contain a different file called org.apache.hadoop.fs.FileSystem in their META-INF/services directory. This file lists the canonical class names of the filesystem implementations they want to declare. (This is called a Service Provider Interface, implemented via java.util.ServiceLoader; see org.apache.hadoop.fs.FileSystem#loadFileSystems.)
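The same ServiceLoader mechanism can be observed with a service the JDK itself registers, no Hadoop required. A minimal sketch (the class name ServiceLoaderDemo is mine) that lists every java.nio FileSystemProvider discovered through META-INF/services entries or module declarations, analogous to what FileSystem#loadFileSystems does for Hadoop filesystems:

```java
import java.nio.file.spi.FileSystemProvider;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class ServiceLoaderDemo {
    // Collect the URI schemes of every FileSystemProvider registered via the
    // META-INF/services mechanism (or module "provides" declarations) - the
    // same discovery Hadoop performs for its FileSystem implementations.
    static List<String> discoveredSchemes() {
        List<String> schemes = new ArrayList<>();
        for (FileSystemProvider p : ServiceLoader.load(FileSystemProvider.class)) {
            schemes.add(p.getScheme());
        }
        return schemes;
    }

    public static void main(String[] args) {
        System.out.println("Providers found: " + discoveredSchemes());
    }
}
```

On a standard JDK this should at least report the zip/jar filesystem provider that the JDK registers through this mechanism.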
When we use maven-assembly-plugin, it merges all our JARs into one, and all the META-INF/services/org.apache.hadoop.fs.FileSystem files overwrite each other. Only one of these files remains (the last one that was added). In this case, the FileSystem list from hadoop-commons overwrites the list from hadoop-hdfs, so DistributedFileSystem is no longer declared.
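For illustration, each copy of that service file is just a newline-separated list of implementation class names ('#' starts a comment). A sketch of the two conflicting copies; the exact contents vary by Hadoop version, so treat these lists as assumptions:

```text
# hadoop-commons: META-INF/services/org.apache.hadoop.fs.FileSystem
org.apache.hadoop.fs.LocalFileSystem

# hadoop-hdfs: META-INF/services/org.apache.hadoop.fs.FileSystem
org.apache.hadoop.hdfs.DistributedFileSystem
```

After the naive merge, only one of the two lists survives in the assembled JAR, so any scheme declared only in the other list becomes unresolvable.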
After loading the Hadoop configuration, but just before doing anything FileSystem-related, we call this:
hadoopConfig.set("fs.hdfs.impl",
    org.apache.hadoop.hdfs.DistributedFileSystem.class.getName()
);
hadoopConfig.set("fs.file.impl",
    org.apache.hadoop.fs.LocalFileSystem.class.getName()
);
Update: the correct fix
It has been brought to my attention by krookedking that there is a configuration-based way to make maven-assembly use a merged version of all the FileSystem service declarations; check out his answer below.
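The referenced answer isn't reproduced here, but one well-known configuration-based fix for this class of problem is to build the fat JAR with maven-shade-plugin and its ServicesResourceTransformer, which concatenates META-INF/services files instead of letting them overwrite each other. A sketch (the plugin version is an assumption; whether this matches the referenced answer is also an assumption):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.4.1</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <transformers>
          <!-- Merge all META-INF/services files instead of overwriting them,
               so both LocalFileSystem and DistributedFileSystem stay declared -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With the service files merged correctly, the runtime workaround of setting fs.hdfs.impl and fs.file.impl by hand becomes unnecessary.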