I'm hitting a strange error with Spark on Alluxio. When I read 20,000 files from Alluxio with Spark, it works fine; when I read 40,000 files, it fails. I am using Alluxio 1.2 with Spark 1.6.0, and I read the data through the file API: FileSystem fs = FileSystem.Factory.get(); AlluxioURI path = new AlluxioURI("/partition0"); ...
16/08/19 16:08:40 INFO logger.type: Client registered with FileSystemMasterClient master @ master/127.0.0.1:19998
16/08/19 16:08:41 ERROR logger.type: Frame size (17277505) larger than max length (16777216)!
org.apache.thrift.transport.TTransportException: Frame size (17277505) larger than max length (16777216)!
at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:137)
at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.protocol.TProtocolDecorator.readMessageBegin(TProtocolDecorator.java:135)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at alluxio.thrift.FileSystemMasterClientService$Client.recv_listStatus(FileSystemMasterClientService.java:503)
at alluxio.thrift.FileSystemMasterClientService$Client.listStatus(FileSystemMasterClientService.java:489)
at alluxio.client.file.FileSystemMasterClient$8.call(FileSystemMasterClient.java:220)
at alluxio.client.file.FileSystemMasterClient$8.call(FileSystemMasterClient.java:216)
at alluxio.AbstractClient.retryRPC(AbstractClient.java:324)
at alluxio.client.file.FileSystemMasterClient.listStatus(FileSystemMasterClient.java:216)
at alluxio.client.file.BaseFileSystem.listStatus(BaseFileSystem.java:195)
at alluxio.client.file.BaseFileSystem.listStatus(BaseFileSystem.java:186)
at Main.main(Main.java:119)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Exception in thread "main" java.io.IOException: Failed after 32 retries.
at alluxio.AbstractClient.retryRPC(AbstractClient.java:334)
at alluxio.client.file.FileSystemMasterClient.listStatus(FileSystemMasterClient.java:216)
at alluxio.client.file.BaseFileSystem.listStatus(BaseFileSystem.java:195)
at alluxio.client.file.BaseFileSystem.listStatus(BaseFileSystem.java:186)
at Main.main(Main.java:119)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
This is not an alluxio.security.authentication.type problem, because I run Alluxio locally and the Alluxio master address is correct. I don't understand why it fails with 40,000 files when it works with 20,000. I also changed alluxio.network.thrift.frame.size.bytes.max, but that did not change the result.

Best answer
This problem can have several different causes:
Double-check that the port in the Alluxio master address is correct. The Alluxio master listens on port 19998 by default, and a common mistake that produces this error message is using the wrong port in the master address (e.g. port 19999, which is the default web UI port of the Alluxio master).
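As a sanity check, the client-side configuration should point at the RPC port, roughly like the following (the host name here is a placeholder):

```properties
# alluxio-site.properties on the client
# 19998 is the default Alluxio master RPC port; 19999 is only the web UI.
alluxio.master.hostname=master
alluxio.master.port=19998
```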
Make sure the security settings of the Alluxio client and the master are consistent. Alluxio provides different ways to authenticate users, configured through alluxio.security.authentication.type. This error occurs when that property is set to different values on the server and the client (e.g. one side uses the default NOSASL while the other is customized to SIMPLE). Read the Configuration Settings documentation to learn how to customize Alluxio clusters and applications.
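Concretely, the same authentication type must appear in the alluxio-site.properties used by both sides; the values below are only illustrative:

```properties
# conf/alluxio-site.properties on the master
alluxio.security.authentication.type=SIMPLE

# conf/alluxio-site.properties on the client: must match the master,
# otherwise the RPC frames cannot be decoded and a
# "Frame size larger than max length" error can surface instead.
alluxio.security.authentication.type=SIMPLE
```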
A configuration mismatch between Apache Spark and Alluxio. You must make Spark's JVM pick up the alluxio.network.thrift.frame.size.bytes.max value set in alluxio/conf/alluxio-site.properties. To do that, either add export SPARK_CLASSPATH=${ALLUXIO_HOME}/conf:${SPARK_CLASSPATH} to spark-env.sh, or pass --driver-class-path pathAlluxio/conf to the spark-submit command.
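Spelled out, the two alternatives look roughly like this (the frame-size value and paths are assumptions about a typical layout; pick a limit larger than the failing frame, here 17277505 bytes):

```sh
# alluxio/conf/alluxio-site.properties:
#   alluxio.network.thrift.frame.size.bytes.max=64MB

# Option 1: in spark-env.sh, prepend Alluxio's conf dir to the classpath
# so the driver JVM reads alluxio-site.properties
export SPARK_CLASSPATH=${ALLUXIO_HOME}/conf:${SPARK_CLASSPATH}

# Option 2: pass the conf dir per job on spark-submit
spark-submit --driver-class-path ${ALLUXIO_HOME}/conf ...
```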
For me, it was the third solution that fixed it.
Regarding "java - Alluxio frame size () larger than max () on Spark", there is a similar question on Stack Overflow: https://stackoverflow.com/questions/39041687/