Jsoup: How to parse multiple HTML files from a local drive?

Problem description

I've got multiple HTML files on my hard drive to parse with Jsoup. I've been able to parse one file, but not multiple files. I would like to parse all the files in a folder.

I wrote this code, which extracts text (within certain ids) from an HTML file (named "file.htm" in the folder "C:/html"):

package jsouptest;

import java.io.File;
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {

    public static void main(String[] args) {
        Document doc;

        try{

            File input = new File("C:/html/file.htm");

            doc = Jsoup.parse(input, "UTF-8", "");


            Elements ids = doc.select("div[id^=desk] p");

            for (Element id : ids){

                System.out.println("\n"+id.text());

            }

        }catch(IOException e){

        }

    }

}

How can I apply this code to all the files in the folder "C:/html"? Thanks.

Solution

Extract the HTML-parsing code into its own method, then list the contents of your directory and call parse for each file:

    File input = new File("C:/html");
    File[] st = input.listFiles();
    for (int i = 0; i < st.length; i++) {
        if (st[i].isFile()) { // add other conditions here, e.g. the name ends in .html
            parse(st[i]);
        }
    }
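The comment in the loop hints at an extra condition on the file name. A minimal sketch of that check, assuming you only want files ending in .htm or .html (the class name HtmlFileLister and the helper listHtmlFiles are hypothetical, not part of Jsoup), could look like this. It uses only the standard library, so the Jsoup call is left out:

```java
import java.io.File;
import java.io.FileFilter;
import java.util.Arrays;
import java.util.Locale;

public class HtmlFileLister {

    // Hypothetical helper (not part of Jsoup): returns the regular files in
    // `dir` whose names end in .htm or .html, ignoring case.
    static File[] listHtmlFiles(File dir) {
        FileFilter htmlOnly = f -> {
            String name = f.getName().toLowerCase(Locale.ROOT);
            return f.isFile() && (name.endsWith(".htm") || name.endsWith(".html"));
        };
        File[] files = dir.listFiles(htmlOnly);
        if (files == null) {
            // listFiles() returns null when `dir` is not a readable directory
            return new File[0];
        }
        Arrays.sort(files); // listFiles() makes no promise about ordering
        return files;
    }

    public static void main(String[] args) {
        // "C:/html" is the folder from the question; pass another path as args[0]
        File dir = new File(args.length > 0 ? args[0] : "C:/html");
        for (File f : listHtmlFiles(dir)) {
            System.out.println(f.getName()); // in the answer's code, call parse(f) here
        }
    }
}
```

Returning an empty array for an unreadable folder keeps the calling loop free of null checks; whether that or an exception is preferable depends on how you want a wrong path to be reported.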

Putting it together, your code should look like this:

import java.io.File;
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {

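    public static void main(String[] args) {
        File input = new File("C:/html");
        // listFiles() returns null if the path is not a directory or cannot be read
        File[] st = input.listFiles();
        if (st == null) {
            System.err.println("Cannot list files in " + input);
            return;
        }
        for (int i = 0; i < st.length; i++) {
            if (st[i].isFile()) { // add other conditions here, e.g. the name ends in .html
                parse(st[i]);
            }
        }

    }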

    private static void parse(File input ) {
        Document doc;

        try{

            doc = Jsoup.parse(input, "UTF-8", "");


            Elements ids = doc.select("div[id^=desk] p");

            for (Element id : ids){

                System.out.println("\n"+id.text());

            }

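        } catch (IOException e) {
            // report the failure instead of silently swallowing it
            System.err.println("Could not parse " + input + ": " + e.getMessage());
        }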
    }
}
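
The loop above only looks at files directly inside "C:/html". If the folder also contains subfolders (an assumption; the question does not say so), one possible variation is to walk the whole tree with java.nio.file.Files.walk. This is a different technique from the listFiles() loop in the answer, and the class name HtmlTreeWalker and helper findHtmlFiles are hypothetical. The sketch below only collects the paths and leaves the Jsoup call to the caller:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class HtmlTreeWalker {

    // Hypothetical helper: collects every .htm/.html file under `root`,
    // including files in nested subfolders, sorted by path.
    static List<Path> findHtmlFiles(Path root) throws IOException {
        // try-with-resources closes the directory handles held by the stream
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> {
                        String name = p.getFileName().toString().toLowerCase(Locale.ROOT);
                        return name.endsWith(".htm") || name.endsWith(".html");
                    })
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // "C:/html" is the folder from the question; pass another path as args[0]
        Path root = Paths.get(args.length > 0 ? args[0] : "C:/html");
        for (Path p : findHtmlFiles(root)) {
            System.out.println(p); // in the answer's code, call parse(p.toFile()) here
        }
    }
}
```

Since the answer's parse method takes a File, p.toFile() bridges the two APIs without changing parse itself.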


09-05 12:48