My Hadoop job runs fine on the Amazon Elastic MapReduce AMI 3.7.0. However, when I upgrade to AMI version 3.8.0, the toString method of the java.net.URL class starts throwing a NullPointerException:

java.lang.NullPointerException
  at java.net.URL.toExternalForm(URL.java:925)
  at java.net.URL.toString(URL.java:911)
  at com.snowplowanalytics.iglu.client.repositories.HttpRepositoryRef.lookupSchema(HttpRepositoryRef.scala:602)
  at com.snowplowanalytics.iglu.client.Resolver.recurse$1(Resolver.scala:236)
  at com.snowplowanalytics.iglu.client.Resolver.lookupSchema(Resolver.scala:247)
  at com.snowplowanalytics.iglu.client.validation.ValidatableJsonMethods$$anonfun$verifySchemaAndValidate$2$$anonfun$apply$6$$anonfun$apply$7.apply(validatableJson.scala:171)
  at com.snowplowanalytics.iglu.client.validation.ValidatableJsonMethods$$anonfun$verifySchemaAndValidate$2$$anonfun$apply$6$$anonfun$apply$7.apply(validatableJson.scala:170)
  at scalaz.Validation$class.flatMap(Validation.scala:141)
  at scalaz.Success.flatMap(Validation.scala:347)
  at com.snowplowanalytics.iglu.client.validation.ValidatableJsonMethods$$anonfun$verifySchemaAndValidate$2$$anonfun$apply$6.apply(validatableJson.scala:170)
  at com.snowplowanalytics.iglu.client.validation.ValidatableJsonMethods$$anonfun$verifySchemaAndValidate$2$$anonfun$apply$6.apply(validatableJson.scala:169)
  at scalaz.Validation$class.flatMap(Validation.scala:141)
  at scalaz.Success.flatMap(Validation.scala:347)
  at com.snowplowanalytics.iglu.client.validation.ValidatableJsonMethods$$anonfun$verifySchemaAndValidate$2.apply(validatableJson.scala:169)
  at com.snowplowanalytics.iglu.client.validation.ValidatableJsonMethods$$anonfun$verifySchemaAndValidate$2.apply(validatableJson.scala:166)
  at scalaz.Validation$class.flatMap(Validation.scala:141)
  at scalaz.Success.flatMap(Validation.scala:347)
  at com.snowplowanalytics.iglu.client.validation.ValidatableJsonMethods$.verifySchemaAndValidate(validatableJson.scala:166)
  at com.snowplowanalytics.iglu.client.validation.ValidatableJsonNode.verifySchemaAndValidate(validatableJson.scala:244)
  at com.snowplowanalytics.snowplow.enrich.common.utils.shredder.Shredder$$anonfun$extractAndValidateJson$1$$anonfun$apply$8.apply(Shredder.scala:267)
  at com.snowplowanalytics.snowplow.enrich.common.utils.shredder.Shredder$$anonfun$extractAndValidateJson$1$$anonfun$apply$8.apply(Shredder.scala:266)
  at scalaz.Validation$class.flatMap(Validation.scala:141)
  at scalaz.Success.flatMap(Validation.scala:347)
  at com.snowplowanalytics.snowplow.enrich.common.utils.shredder.Shredder$$anonfun$extractAndValidateJson$1.apply(Shredder.scala:266)
  at com.snowplowanalytics.snowplow.enrich.common.utils.shredder.Shredder$$anonfun$extractAndValidateJson$1.apply(Shredder.scala:264)
  at scala.Option.map(Option.scala:145)
  at com.snowplowanalytics.snowplow.enrich.common.utils.shredder.Shredder$.extractAndValidateJson(Shredder.scala:264)
  at com.snowplowanalytics.snowplow.enrich.common.utils.shredder.Shredder$.extractContexts$1(Shredder.scala:101)
  at com.snowplowanalytics.snowplow.enrich.common.utils.shredder.Shredder$.shred(Shredder.scala:108)
  at com.snowplowanalytics.snowplow.enrich.hadoop.ShredJob$$anonfun$loadAndShred$1.apply(ShredJob.scala:83)
  at com.snowplowanalytics.snowplow.enrich.hadoop.ShredJob$$anonfun$loadAndShred$1.apply(ShredJob.scala:80)
  at scalaz.Validation$class.flatMap(Validation.scala:141)
  at scalaz.Success.flatMap(Validation.scala:347)
  at com.snowplowanalytics.snowplow.enrich.hadoop.ShredJob$.loadAndShred(ShredJob.scala:80)
  at com.snowplowanalytics.snowplow.enrich.hadoop.ShredJob$$anonfun$5.apply(ShredJob.scala:170)
  at com.snowplowanalytics.snowplow.enrich.hadoop.ShredJob$$anonfun$5.apply(ShredJob.scala:169)
  at com.twitter.scalding.MapFunction.operate(Operations.scala:58)
  at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:99)
  at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:39)
  at cascading.flow.stream.SourceStage.map(SourceStage.java:102)
  at cascading.flow.stream.SourceStage.run(SourceStage.java:58)
  at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:130)
  at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:452)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:344)
  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:171)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:166)

The URL the method is being called on is not null. The exception is thrown from inside the class's own toExternalForm method.

Why is this happening?

Here is the output of java -version on a cluster running AMI 3.8.0 (on both the master and core nodes):
[hadoop@ip-xxx-xx-xx-xx ~]$ java -version
java version "1.7.0_76"
Java(TM) SE Runtime Environment (build 1.7.0_76-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.76-b04, mixed mode)

And for AMI 3.7.0 (on both the master and core nodes):
[hadoop@ip-xxx-xx-xx-xx ~]$ java -version
java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.71-b01, mixed mode)

Could the difference in JRE versions be the culprit?

Best Answer

As much as I hate to suggest it, this looks like a JVM bug. In the OpenJDK source for java.net.URL, the toExternalForm() method is, in its entirety, a delegation to the handler, which is a transient field:

public String toExternalForm() {
    return handler.toExternalForm(this);
}
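
For reference, handler is declared roughly like this in the OpenJDK 7 source (quoted from memory, so treat the exact wording as approximate); the key point is that it is transient and therefore never written out during serialization:

// java.net.URL (OpenJDK 7): the stream handler is transient,
// i.e. it is not part of the serialized form of a URL.
transient URLStreamHandler handler;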

The only way this can throw an NPE is if handler is null. As far as I can tell, every constructor path and the readObject(ObjectInputStream) method ensure the handler field gets set, and throw an exception (MalformedURLException or IOException) when it cannot be. For example:
private synchronized void readObject(java.io.ObjectInputStream s)
     throws IOException, ClassNotFoundException
{
    s.defaultReadObject();  // read the fields
    if ((handler = getURLStreamHandler(protocol)) == null) {
        throw new IOException("unknown protocol: " + protocol);
    }
...
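
Just to illustrate that claim, here is a minimal, hypothetical sketch (not anything the Snowplow code actually does) that forces handler to null via reflection; it reproduces exactly the NPE in the stack trace above, assuming no SecurityManager blocks the field access:

import java.lang.reflect.Field;
import java.net.URL;

public class NullHandlerRepro {
    public static void main(String[] args) throws Exception {
        // Any well-formed URL; the constructor always assigns a handler.
        URL url = new URL("http://example.com/schema.json");

        // Simulate the broken state implied by the stack trace by
        // nulling the (normally always-set) transient handler field.
        Field handlerField = URL.class.getDeclaredField("handler");
        handlerField.setAccessible(true);
        handlerField.set(url, null);

        // This now throws:
        //   java.lang.NullPointerException
        //     at java.net.URL.toExternalForm(URL.java:925)
        //     at java.net.URL.toString(URL.java:911)
        System.out.println(url.toString());
    }
}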

I notice that a public JRE 7u79 release is available; if upgrading to Java 8 isn't an option, I suggest trying that version.
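
While deciding, it may also help to confirm which JRE the YARN task containers themselves run under, since the java -version output on the master node is not a guarantee for every container. A minimal sketch, assuming you can drop a small diagnostic step or log line into the job:

public class JvmVersionCheck {
    public static void main(String[] args) {
        // Standard JVM system properties; print them from inside a mapper
        // (or a trivial step) to see what the task containers actually use.
        System.out.println("java.version         = " + System.getProperty("java.version"));
        System.out.println("java.runtime.version = " + System.getProperty("java.runtime.version"));
        System.out.println("java.vm.version      = " + System.getProperty("java.vm.version"));
    }
}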
