Problem description
I'm running Apache Nutch 2.3.1 out of the box, which uses Gora 0.6.1. I followed the instructions here: http://wiki.apache.org/nutch/RunNutchInEclipse
InjectorJob runs fine.
Now I'm running the FetcherJob, and Gora uses MemStore as a data store. My gora.properties contains:
gora.datastore.default=org.apache.gora.memory.store.MemStore
This throws:
2016-10-02 22:55:54,605 ERROR mapreduce.GoraRecordReader (GoraRecordReader.java:nextKeyValue(121)) - Error reading Gora records: null
2016-10-02 22:55:54,605 INFO mapred.MapTask (MapTask.java:flush(1460)) - Starting flush of map output
2016-10-02 22:55:54,614 INFO mapred.LocalJobRunner (LocalJobRunner.java:runTasks(456)) - map task executor complete.
2016-10-02 22:55:54,615 WARN mapred.LocalJobRunner (LocalJobRunner.java:run(560)) - job_local874667143_0001
java.lang.Exception: java.lang.RuntimeException: java.util.NoSuchElementException
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.RuntimeException: java.util.NoSuchElementException
at org.apache.gora.mapreduce.GoraRecordReader.nextKeyValue(GoraRecordReader.java:122)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.NoSuchElementException
at java.util.concurrent.ConcurrentSkipListMap.firstKey(ConcurrentSkipListMap.java:2036)
at org.apache.gora.memory.store.MemStore.execute(MemStore.java:128)
at org.apache.gora.query.impl.QueryBase.execute(QueryBase.java:73)
at org.apache.gora.mapreduce.GoraRecordReader.executeQuery(GoraRecordReader.java:67)
at org.apache.gora.mapreduce.GoraRecordReader.nextKeyValue(GoraRecordReader.java:109)
... 12 more
2016-10-02 22:55:55,383 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1360)) - Job job_local874667143_0001 running in uber mode : false
2016-10-02 22:55:55,385 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1367)) - map 0% reduce 0%
2016-10-02 22:55:55,387 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1380)) - Job job_local874667143_0001 failed with state FAILED due to: NA
2016-10-02 22:55:55,396 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1385)) - Counters: 0
Exception in thread "main" java.lang.RuntimeException: job failed: name=, jobid=job_local874667143_0001
at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:119)
at org.apache.nutch.fetcher.FetcherJob.run(FetcherJob.java:205)
at org.apache.nutch.fetcher.FetcherJob.fetch(FetcherJob.java:251)
at org.apache.nutch.fetcher.FetcherJob.run(FetcherJob.java:314)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.fetcher.FetcherJob.main(FetcherJob.java:321)
This happens so deep inside Nutch and Gora that I have no idea why it's happening. I tried upgrading to Gora 0.8 and got the same problem. I tried downgrading Gora to 0.6 and got the same problem. I could switch to another data store such as HBase, but that's overkill for what I need at the moment.
Please help me solve this problem.
Recommended answer
I can confirm the problem is in MemStore.
In 0.6.1 there is a bug: https://github.com/apache/gora/blob/apache-gora-0.6.1/gora-core/src/main/java/org/apache/gora/memory/store/MemStore.java#L128
That is already fixed in master: https://github.com/apache/gora/blob/master/gora-core/src/main/java/org/apache/gora/memory/store/MemStore.java#L155 , where the call to #firstKey() is guarded by an #isEmpty() check.
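The failure mode can be reproduced outside Gora. The stack trace shows ConcurrentSkipListMap.firstKey() throwing NoSuchElementException, which is what that method does when the map is empty. The sketch below is not Gora's code: the class and method names are made up for illustration, and returning null for an empty map is an assumption chosen only to show the shape of an isEmpty() guard.

```java
import java.util.NoSuchElementException;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical demo class; it only mimics the firstKey() call seen in the stack trace.
class FirstKeyGuardDemo {

    // Unguarded access: throws NoSuchElementException when the map is empty.
    static String unguardedFirstKey(ConcurrentSkipListMap<String, String> map) {
        return map.firstKey();
    }

    // Guarded access: checks isEmpty() first and returns null for an empty map
    // (the null return is an illustrative choice, not Gora's behaviour).
    static String guardedFirstKey(ConcurrentSkipListMap<String, String> map) {
        return map.isEmpty() ? null : map.firstKey();
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<String, String> map = new ConcurrentSkipListMap<>();

        try {
            unguardedFirstKey(map);
        } catch (NoSuchElementException e) {
            System.out.println("unguarded call on empty map: NoSuchElementException");
        }

        System.out.println("guarded call on empty map: " + guardedFirstKey(map));

        map.put("com.example:http/", "row");
        System.out.println("guarded call on non-empty map: " + guardedFirstKey(map));
    }
}
```

With an empty store the unguarded call fails in the same way as the FetcherJob log above, while the guarded version simply reports that there is nothing to read.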
If you want to use Gora 0.7-SNAPSHOT with Nutch 2.x, you may be able to get it working like this:
- Download Gora's master branch, which is at version 0.7-SNAPSHOT
- Run mvn install in gora/ to install it in Maven's local repository
- Apply this patch to Nutch: https://paste.apache.org/jjqz so that Nutch 2.3.1 works with Gora 0.7-SNAPSHOT
- Follow the steps in the Nutch tutorial
I hope it works :)
As for HBase, it is quite easy to set up a local installation for experimenting:
- As stated in Nutch2Tutorial, download HBase 0.98.8-hadoop2
- Extract the tar.gz file into a directory, for example: /home/you/hbase
- cd /home/you/hbase/bin
- ./start-hbase.sh
Now you have HBase up and running. Configure Nutch:
ivy/ivy.xml: Look at @Emmanuel's comment about HBase's Ivy dependency configuration.
gora.properties:
gora.datastore.default=org.apache.gora.hbase.store.HBaseStore
gora.datastore.autocreateschema=true
gora.datastore.scanner.caching=100
nutch-site.xml:
<configuration>
<property>
<name>storage.data.store.class</name>
<value>org.apache.gora.hbase.store.HBaseStore</value>
<description>Default class for storing data</description>
</property>
</configuration>
Done. It will use all of HBase's default configuration: localhost, /tmp/..., and so on.
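If the job still appears to use MemStore, one quick sanity check is to parse the properties text and see which data store class it names. The sketch below uses only java.util.Properties and is not part of Nutch or Gora. The class and method names are hypothetical, and the properties content is inlined as a string so the example is self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for checking which data store a gora.properties text selects.
class GoraPropertiesCheck {

    static final String HBASE_STORE = "org.apache.gora.hbase.store.HBaseStore";

    // Parses properties text and returns the value of gora.datastore.default,
    // or null if the key is absent.
    static String defaultDataStore(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("gora.datastore.default");
    }

    public static void main(String[] args) {
        // Same three settings as in the gora.properties shown above.
        String text = String.join("\n",
            "gora.datastore.default=org.apache.gora.hbase.store.HBaseStore",
            "gora.datastore.autocreateschema=true",
            "gora.datastore.scanner.caching=100");

        String store = defaultDataStore(text);
        System.out.println("default datastore: " + store);
        System.out.println("uses HBase: " + HBASE_STORE.equals(store));
    }
}
```

In a real setup you would load the gora.properties file from the Nutch conf directory instead of an inline string.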