Problem description
I am getting a stack overflow error while accessing a Hadoop file using the following Java code.
import java.io.InputStream;
import java.net.URL;
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.io.IOUtils;

public class URLCat
{
    static
    {
        // Register Hadoop's URLStreamHandlerFactory so java.net.URL understands hdfs:// URLs
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }

    public static void main(String[] args) throws Exception
    {
        InputStream in = null;
        try
        {
            // Open the URL given on the command line and copy its contents to stdout
            in = new URL(args[0]).openStream();
            IOUtils.copyBytes(in, System.out, 4096, false);
        }
        finally
        {
            IOUtils.closeStream(in);
        }
    }
}
I used Eclipse to debug this code, and that is how I found out that the line
in = new URL(args[0]).openStream();
is what produces the error.
I am running this code by passing a Hadoop file path, i.e.
hdfs://localhost/user/jay/abc.txt
Exception (pulled from comments):
Exception in thread "main" java.lang.StackOverflowError
at java.nio.Buffer.<init>(Buffer.java:174)
at java.nio.ByteBuffer.<init>(ByteBuffer.java:259)
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:52)
at java.nio.ByteBuffer.wrap(ByteBuffer.java:350)
at java.nio.ByteBuffer.wrap(ByteBuffer.java:373)
at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:237)
at java.lang.StringCoding.encode(StringCoding.java:272)
at java.lang.String.getBytes(String.java:946)
at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
.. stack trace truncated ..
1) This is because of a bug in the FsUrlStreamHandlerFactory class provided by Hadoop. Please note that the bug is fixed in the latest jar which contains this class.
2) This class is located in hadoop-common-2.0.0-cdh4.2.1.jar. To understand the problem completely, we have to understand how the java.net.URL class works.
How the URL object works
When we create a new URL using any of its constructors without passing a URLStreamHandler (either by passing null for its value or by calling a constructor that does not take a URLStreamHandler object as a parameter), it internally calls a method called getURLStreamHandler(). This method returns the URLStreamHandler object and sets a member variable in the URL class.
This object knows how to open a connection for a particular scheme such as "http", "file", and so on. The URLStreamHandler itself is constructed by a factory called URLStreamHandlerFactory.
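A minimal sketch of that lookup order is given below. It is not the real JDK source; the class name, the cache, and the builtInHandlerFor() helper are purely illustrative, but the order (installed factory first, built-in handlers as fallback) is the behaviour the rest of this answer relies on.

import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical sketch of the handler lookup done by java.net.URL.
// Names and details are illustrative only; they do not mirror the actual JDK code.
class UrlHandlerLookupSketch
{
    private static URLStreamHandlerFactory factory;                       // set via setURLStreamHandlerFactory()
    private static final Map<String, URLStreamHandler> CACHE = new HashMap<>();

    static void setURLStreamHandlerFactory(URLStreamHandlerFactory f)
    {
        factory = f;                                                       // the real JDK allows this only once
    }

    static URLStreamHandler getURLStreamHandler(String protocol)
    {
        URLStreamHandler handler = CACHE.get(protocol);
        if (handler == null && factory != null)
        {
            // The installed factory (e.g. FsUrlStreamHandlerFactory) gets the first chance.
            handler = factory.createURLStreamHandler(protocol);
        }
        if (handler == null)
        {
            // Otherwise fall back to built-in handlers such as sun.net.www.protocol.http.Handler
            // or sun.net.www.protocol.jar.Handler.
            handler = builtInHandlerFor(protocol);
        }
        CACHE.put(protocol, handler);
        return handler;
    }

    private static URLStreamHandler builtInHandlerFor(String protocol)
    {
        return null;  // placeholder; the real lookup loads sun.net.www.protocol.<protocol>.Handler
    }
}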
3) In the problem example given above, the URLStreamHandlerFactory was set to FsUrlStreamHandlerFactory by calling the following static method.
URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
So when we create a new URL, this FsUrlStreamHandlerFactory is used to create the URLStreamHandler object for the new URL, by calling its createURLStreamHandler(protocol) method.
That method in turn calls the loadFileSystems() method of the FileSystem class. The loadFileSystems() method invokes ServiceLoader.load(FileSystem.class), which tries to read the binary names of the FileSystem implementation classes by searching the META-INF/services/*.FileSystem files of all jar files on the classpath and reading their entries.
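As a rough illustration of that discovery step (this is only the standard java.util.ServiceLoader mechanism; Hadoop's actual loadFileSystems() differs in its details, and the class name below is mine), the following prints every FileSystem implementation advertised on the classpath:

import java.util.ServiceLoader;
import org.apache.hadoop.fs.FileSystem;

// Rough illustration of the service discovery performed by FileSystem.loadFileSystems().
public class FileSystemDiscoverySketch
{
    public static void main(String[] args)
    {
        // ServiceLoader scans every META-INF/services/org.apache.hadoop.fs.FileSystem file
        // on the classpath; each line in those files names a FileSystem implementation class.
        ServiceLoader<FileSystem> loader = ServiceLoader.load(FileSystem.class);
        for (FileSystem fs : loader)
        {
            System.out.println(fs.getClass().getName());
        }
    }
}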
4) Remember that each jar is handled as a URL object, meaning that for each jar a URL object is created internally by the ClassLoader. The class loader supplies the URLStreamHandler object itself when constructing the URLs for these jars, so those URLs are not affected by the FsUrlStreamHandlerFactory we set, because each of them already has its URLStreamHandler. Since we are dealing with jar files, the class loader sets a URLStreamHandler of type sun.net.www.protocol.jar.Handler.
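You can see those jar-backed URLs directly. The small sketch below (a hypothetical helper class, not part of the original code) just asks the class loader for the service files and prints the URLs it hands back, which are jar: URLs already bound to the jar protocol handler:

import java.net.URL;
import java.util.Enumeration;

// Illustration of point 4): the service files live inside jars, so the class loader
// hands them back as jar: URLs handled by sun.net.www.protocol.jar.Handler.
public class ServiceFileUrls
{
    public static void main(String[] args) throws Exception
    {
        Enumeration<URL> urls = ServiceFileUrls.class.getClassLoader()
                .getResources("META-INF/services/org.apache.hadoop.fs.FileSystem");
        while (urls.hasMoreElements())
        {
            URL u = urls.nextElement();
            // Typically prints something like jar:file:/.../hadoop-common-....jar!/META-INF/services/...
            System.out.println(u + "  (protocol: " + u.getProtocol() + ")");
        }
    }
}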
5) Now, in order to read the entries inside the jar files for the FileSystem implementation classes, sun.net.www.protocol.jar.Handler needs to construct a URL object for each entry by calling a URL constructor without a URLStreamHandler object. Since we have already defined the URLStreamHandlerFactory as FsUrlStreamHandlerFactory, its createURLStreamHandler(protocol) method is called again, which recurses indefinitely and leads to the StackOverflowError.
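Putting steps 3) to 5) together, the cycle looks roughly like this (a hand-written call chain for illustration, not an actual stack dump):

new URL("hdfs://...")                                              (our code)
  -> FsUrlStreamHandlerFactory.createURLStreamHandler("hdfs")
    -> FileSystem.loadFileSystems()
      -> ServiceLoader.load(FileSystem.class)                      (reads META-INF/services entries)
        -> sun.net.www.protocol.jar.Handler builds a URL for a jar entry
          -> new URL(...) without an explicit URLStreamHandler
            -> FsUrlStreamHandlerFactory.createURLStreamHandler(...)  and the cycle repeats,
               until the stack overflows with a StackOverflowError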
This bug was filed as HADOOP-9041 by the Hadoop committers. The link is https://issues.apache.org/jira/browse/HADOOP-9041.
I know this is somewhat complicated.
So, in short, the solution to this problem is given below.
1) Use the latest jar hadoop-common-2.0.0-cdh4.2.1.jar which has the fix for this bug
or
2) Put the following statement in the static block before setting the URLStreamHandlerFactory.
static {
    try {
        FileSystem.getFileSystemClass("file", new Configuration());  // runs FileSystem service discovery first
    } catch (IOException e) {
        throw new ExceptionInInitializerError(e);
    }
    URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
}
Note that the first statement inside the static block no longer depends on FsUrlStreamHandlerFactory: it uses the default handler for file:// to read the entries in the META-INF/services/*.FileSystem files, so by the time FsUrlStreamHandlerFactory is installed the FileSystem classes are already loaded and the recursion never starts.
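For completeness, here is a sketch of the full URLCat class with the second workaround applied. The try/catch around the checked IOException thrown by getFileSystemClass() is my addition (a static initializer cannot declare checked exceptions); everything else follows the snippets above.

import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.io.IOUtils;

public class URLCat
{
    static
    {
        try
        {
            // Run FileSystem service discovery with the default file:// handler first ...
            FileSystem.getFileSystemClass("file", new Configuration());
        }
        catch (IOException e)
        {
            throw new ExceptionInInitializerError(e);
        }
        // ... and only then install the Hadoop factory that makes hdfs:// URLs work.
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }

    public static void main(String[] args) throws Exception
    {
        InputStream in = null;
        try
        {
            in = new URL(args[0]).openStream();
            IOUtils.copyBytes(in, System.out, 4096, false);
        }
        finally
        {
            IOUtils.closeStream(in);
        }
    }
}

Run it exactly as before, passing hdfs://localhost/user/jay/abc.txt as the argument; with the FileSystem classes preloaded, the file contents should be printed instead of the StackOverflowError.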