I need to index a large number of web pages. What good web crawler utilities are there? Ideally I'd like something that .NET can talk to, but that's not a showstopper.

What I really need is something I can give a site URL to, and it will follow every link and store the content for indexing.
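
To make that requirement concrete, here is a minimal sketch in Python of the "follow every link and store the content" loop, using only the standard library. It is an illustration, not how HTTrack or Nutch work internally. The names `LinkExtractor`, `fetch_html`, and `crawl`, the `max_pages` cap, and the choice to stay on the starting host are all assumptions made for this example. It does not honor robots.txt or rate-limit its requests, which a real crawler should.

```python
from collections import deque
from html.parser import HTMLParser
from http.client import HTTPException
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Download a page; return its text, or None on failure or non-HTML content."""
    try:
        with urlopen(url, timeout=10) as response:
            if "html" not in response.headers.get("Content-Type", ""):
                return None
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except (OSError, ValueError, HTTPException):
        return None


def crawl(start_url, fetch=fetch_html, max_pages=100):
    """Breadth-first crawl of a single host; returns {url: html} for indexing."""
    site = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}          # URLs already queued, so each is fetched once
    pages = {}                  # stored content, keyed by URL
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            link, _fragment = urldefrag(urljoin(url, href))
            # Assumption: only follow links on the same host as the start URL.
            if urlparse(link).netloc == site and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Calling `crawl("http://www.example.com/")` (a placeholder URL) would return a dictionary mapping each visited URL to its HTML, which could then be handed to whatever indexer you use. The `fetch` parameter is injectable so the traversal can be tried against canned pages without touching the network.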


HTTrack -- http://www.httrack.com/ -- is a very good website copier. It works well, and I have been using it for a long time.

Nutch is a web crawler (a crawler is the type of program you're looking for) -- http://lucene.apache.org/nutch/ -- which uses Lucene, a top-notch search utility.

