Nutch 2.1 URL injection takes forever

Problem description

I'm trying to deploy Nutch 2.1 on Ubuntu 12.04 by following a tutorial. Everything goes well until I try to inject URLs into the database. When I type ($ bin/nutch inject urls) and press Enter, I get:

    InjectorJob: starting
    InjectorJob: urlDir: urls

and it stays there (for hours) until I cancel the execution. urls is a directory that contains a file with URLs. I added proxy and port details to nutch-site.xml as suggested here (http://stackoverflow.com/questions/22586950/nutch-2-2-1-doesnt-continue-after-injector-job?answertab=active#tab-top/), but that didn't solve it. I also tried Apache Nutch 2.2.1 and the issue persists.

If you know how to fix this issue, please help me!

Thanks in advance.

Recommended answer

Ubuntu's default hosts file maps the machine's hostname to the loopback address 127.0.1.1. HBase, according to the page the answer cites, requires your loopback IP address to be 127.0.0.1.

By default, the Ubuntu /etc/hosts file contains the following (with myComputerName being your computer's name):

    127.0.0.1   localhost
    127.0.1.1   myComputerName
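
To check whether a machine has the mapping described above, the hosts file can be scanned for names bound to a loopback address other than 127.0.0.1. The following is a minimal Python sketch, not part of the original answer; the function name find_nonstandard_loopback is hypothetical, and it works on hosts-file text passed in as a string rather than reading /etc/hosts itself.

```python
import ipaddress


def find_nonstandard_loopback(hosts_text):
    """Return (address, hostname) pairs bound to an IPv4 loopback
    address other than 127.0.0.1 (hypothetical helper)."""
    flagged = []
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into address + names.
        entry = line.split("#", 1)[0].split()
        if len(entry) < 2:
            continue  # blank line, comment, or address without names
        address, names = entry[0], entry[1:]
        try:
            ip = ipaddress.ip_address(address)
        except ValueError:
            continue  # not a valid IP address, skip the line
        if ip.version == 4 and ip.is_loopback and address != "127.0.0.1":
            flagged.extend((address, name) for name in names)
    return flagged


sample = "127.0.0.1   localhost\n127.0.1.1   myComputerName\n"
print(find_nonstandard_loopback(sample))
```

On a real system the same function could be given the contents of /etc/hosts; an empty result would suggest that this particular mapping is not the cause.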

Use sudo gedit /etc/hosts to update your hosts file as follows:

    127.0.0.1   localhost
    127.0.0.1   myComputerName
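
The same edit can be expressed as a small text transformation, which may help when the change has to be reviewed before it is applied. This is a sketch under assumptions: remap_loopback is a hypothetical helper, it only rewrites a string, and writing the result back to /etc/hosts (which requires root) is deliberately left out.

```python
import re


def remap_loopback(hosts_text, old="127.0.1.1", new="127.0.0.1"):
    """Replace `old` with `new` wherever it is the address field
    of a hosts-file line (hypothetical helper)."""
    # Match the address only at the start of a line, followed by whitespace,
    # so commented-out lines and longer addresses are left untouched.
    pattern = re.compile(r"^([ \t]*)" + re.escape(old) + r"(?=\s)", re.MULTILINE)
    return pattern.sub(lambda m: m.group(1) + new, hosts_text)


before = "127.0.0.1   localhost\n127.0.1.1   myComputerName\n"
print(remap_loopback(before), end="")
```

Keeping the function pure makes it easy to compare the proposed contents with the current file before overwriting anything.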

Reboot Ubuntu. Nutch should no longer have trouble injecting URLs into HBase.
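
After the reboot, one way to confirm that the change took effect is to ask the system what the machine's own hostname resolves to. The sketch below is an illustration rather than part of the original answer; hostname_resolves_to and its resolver parameter are hypothetical names, and the resolver defaults to the standard library's socket.gethostbyname.

```python
import socket


def hostname_resolves_to(expected="127.0.0.1", hostname=None,
                         resolver=socket.gethostbyname):
    """Return True if `hostname` (default: this machine's name)
    resolves to `expected` (hypothetical helper)."""
    name = hostname or socket.gethostname()
    try:
        return resolver(name) == expected
    except OSError:
        # Resolution failed (socket.gaierror is a subclass of OSError).
        return False


if __name__ == "__main__":
    print(hostname_resolves_to())
```

If this prints False after editing /etc/hosts, the hostname is still resolving to a different address, or not resolving at all.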

