I am trying to get the basic form of crawler4j running, as shown here. I modified the first few lines by defining rootFolder and numberOfCrawlers as follows:

public class BasicCrawlController {

    public static void main(String[] args) throws Exception {
            if (args.length != 2) {
                    System.out.println("Needed parameters: ");
                    System.out.println("\t rootFolder (it will contain intermediate crawl data)");
                    System.out.println("\t numberOfCralwers (number of concurrent threads)");
                    return;
            }

            /*
             * crawlStorageFolder is a folder where intermediate crawl data is
             * stored.
             */
             String crawlStorageFolder = args[0];

              args[0] = "/data/crawl/root";

            /*
             * numberOfCrawlers shows the number of concurrent threads that should
             * be initiated for crawling.
             */
            int numberOfCrawlers = Integer.parseInt(args[1]);

            args[1] = "7";


            CrawlConfig config = new CrawlConfig();

            config.setCrawlStorageFolder(crawlStorageFolder);


No matter how I define them, I still get this error:

Needed parameters:
 rootFolder (it will contain intermediate crawl data)
 numberOfCralwers (number of concurrent threads)


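I think I need to "set the arguments in the Run Configurations window", but I don't know what that means. How do I configure this basic crawler correctly so that it starts up and runs?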

Best answer

After compiling the program with the javac command, you need to run it by typing:

java BasicCrawlController "arg1" "arg2"

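The message tells you that args[0] or args[1] was not supplied when you ran the program: the check at the top of main returns as soon as args.length is not 2, before any of your assignments execute. Also, what is this `args[1] = "7";` doing after you have already read the number-of-crawlers argument? By that point numberOfCrawlers has already been parsed, so the assignment has no effect on it.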
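To illustrate the ordering problem, here is a minimal sketch that uses no crawler4j classes. The class name ArgsOrderDemo, the method readThenOverwrite, and the simulated argument values are hypothetical and exist only for this demonstration. It copies args[0] into a local variable and then overwrites the array slot, in the same order as the code in the question.

```java
public class ArgsOrderDemo {

    // Mirrors the order of operations in the question: copy first, overwrite afterwards.
    static String readThenOverwrite(String[] args) {
        String crawlStorageFolder = args[0]; // copies the reference currently held in args[0]
        args[0] = "/data/crawl/root";        // changes the array slot, not the local variable
        return crawlStorageFolder;
    }

    public static void main(String[] args) {
        // Simulated command-line arguments (hypothetical values).
        String[] simulated = {"from-command-line", "3"};
        String folder = readThenOverwrite(simulated);
        System.out.println("crawlStorageFolder = " + folder);
        System.out.println("args[0] = " + simulated[0]);
    }
}
```

The local variable keeps the value that was passed in, so writing to args afterwards cannot change what the crawler is later configured with. And if no arguments are passed at all, the args.length check returns before these lines are ever reached.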

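Since it looks like you want to use hard-coded values anyway, remove the argument check at the top of main (roughly its first five lines). Then set the crawlStorageFolder String directly to the directory path and numberOfCrawlers to 7. That way you do not have to specify any command-line arguments. If you do want to use command-line arguments, remove the hard-coded values above and pass them on the command line instead.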
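The two options can also be combined. The following is a sketch of one possible variation, not crawler4j's own code: the class CrawlSettings and its method fromArgs are hypothetical names. It falls back to the hard-coded values from the question when no arguments are given and uses the command-line values when exactly two are given.

```java
public class CrawlSettings {

    // Hard-coded defaults taken from the question.
    static final String DEFAULT_STORAGE_FOLDER = "/data/crawl/root";
    static final int DEFAULT_NUMBER_OF_CRAWLERS = 7;

    final String crawlStorageFolder;
    final int numberOfCrawlers;

    CrawlSettings(String crawlStorageFolder, int numberOfCrawlers) {
        this.crawlStorageFolder = crawlStorageFolder;
        this.numberOfCrawlers = numberOfCrawlers;
    }

    // No arguments: use the defaults. Two arguments: use them. Anything else is rejected.
    static CrawlSettings fromArgs(String[] args) {
        if (args.length == 0) {
            return new CrawlSettings(DEFAULT_STORAGE_FOLDER, DEFAULT_NUMBER_OF_CRAWLERS);
        }
        if (args.length == 2) {
            return new CrawlSettings(args[0], Integer.parseInt(args[1]));
        }
        throw new IllegalArgumentException(
                "Expected either no arguments or: rootFolder numberOfCrawlers");
    }

    public static void main(String[] args) {
        CrawlSettings settings = fromArgs(args);
        System.out.println("rootFolder = " + settings.crawlStorageFolder);
        System.out.println("numberOfCrawlers = " + settings.numberOfCrawlers);
        // In the real controller these values would then be handed on,
        // for example via config.setCrawlStorageFolder(settings.crawlStorageFolder).
    }
}
```

With this shape, running the class from an IDE without any program arguments uses the defaults, while `java CrawlSettings /tmp/crawl 3` overrides both values.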

Regarding "java - implementation of crawler4j", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/10001233/
