Jsoup: How to parse multiple HTML files from a local drive?


Problem description

 
I have multiple HTML files on my hard drive that I want to parse with Jsoup. I've been able to parse one file, but not multiple files. I would like to parse all the files in a folder.
I wrote this code, which extracts the text of the paragraphs inside certain divs (those whose id starts with "desk") from an HTML file (named "file.htm" in the folder "C:/html"):
package jsouptest;

import java.io.File;
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {

    public static void main(String[] args) {
        Document doc;

        try{

            File input = new File("C:/html/file.htm");

            doc = Jsoup.parse(input, "UTF-8", "");


            Elements ids = doc.select("div[id^=desk] p");

            for (Element id : ids){

                System.out.println("\n"+id.text());

            }

        }catch(IOException e){

        }

    }

}
How can I apply this code to all the files in the folder "C:/html"? Thanks
Solution
Extract the HTML-parsing code into a method, then list the contents of your directory and call parse() for each file:
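File input = new File("C:/html");
File[] st = input.listFiles();
if (st != null) { // listFiles() returns null if the path is not a directory or cannot be read
    for (int i = 0; i < st.length; i++) {
        if (st[i].isFile()) { // add other conditions here, e.g. the name ends in .htm or .html
            parse(st[i]);
        }
    }
}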
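The comment in the loop leaves the file-name check open. One way to fill it in, sketched here as an assumption rather than as part of the original answer, is to let java.nio.file do the listing and the name filter together with a glob pattern. HtmlFiles and listHtmlFiles are hypothetical names, the demo creates a temporary folder instead of using C:/html, and whether the glob ignores letter case depends on the platform.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class, not part of the original answer.
public class HtmlFiles {

    // Returns the regular files directly inside dir whose names match *.htm or *.html.
    // Whether the glob match ignores case is platform-dependent.
    static List<Path> listHtmlFiles(Path dir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.{htm,html}")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) { // skip subdirectories that happen to match the glob
                    result.add(p);
                }
            }
        }
        Collections.sort(result); // directory order is unspecified, so sort for stable output
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Demo on a temporary folder instead of C:/html (assumption for a self-contained run).
        Path dir = Files.createTempDirectory("html");
        Files.createFile(dir.resolve("a.htm"));
        Files.createFile(dir.resolve("b.html"));
        Files.createFile(dir.resolve("notes.txt"));
        for (Path p : listHtmlFiles(dir)) {
            // with jsoup on the classpath, this is where parse(p.toFile()) would be called
            System.out.println(p.getFileName());
        }
    }
}
```

To use it with the answer's code, loop over listHtmlFiles(Paths.get("C:/html")) and call parse(p.toFile()) for each path.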
So your code should look like this:
import java.io.File;
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {

    public static void main(String[] args) {
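        File input = new File("C:/html");
        File[] st = input.listFiles();
        if (st == null) {
            // listFiles() returns null if the path is not a directory or cannot be read
            System.err.println("Cannot list directory: " + input);
            return;
        }
        for (int i = 0; i < st.length; i++) {
            if (st[i].isFile()) { // add other conditions here, e.g. the name ends in .htm or .html
                parse(st[i]);
            }
        }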

    }

    private static void parse(File input) {
        Document doc;

        try{

            doc = Jsoup.parse(input, "UTF-8", "");


            Elements ids = doc.select("div[id^=desk] p");

            for (Element id : ids){

                System.out.println("\n"+id.text());

            }

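        } catch (IOException e) {
            // report the problem instead of silently ignoring it, then move on to the next file
            System.err.println("Could not parse " + input + ": " + e.getMessage());
        }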
    }
}
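The answer only visits files that sit directly in C:/html. If some pages also live in subfolders (an assumption beyond the question), Files.walk can visit the whole tree. This is a sketch with hypothetical names (HtmlTree, isHtml, findHtmlFiles); it compares file names case-insensitively and leaves out the jsoup call so that it runs with the JDK alone.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, not part of the original answer.
public class HtmlTree {

    // True if the file name ends in .htm or .html, ignoring case.
    static boolean isHtml(Path p) {
        String name = p.getFileName().toString().toLowerCase(Locale.ROOT);
        return name.endsWith(".htm") || name.endsWith(".html");
    }

    // Walks dir and all of its subdirectories and returns every regular HTML file, sorted.
    static List<Path> findHtmlFiles(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) { // closing the stream closes the open directories
            return paths.filter(Files::isRegularFile)
                        .filter(HtmlTree::isHtml)
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo on a temporary folder tree instead of C:/html (assumption for a self-contained run).
        Path root = Files.createTempDirectory("html");
        Files.createDirectories(root.resolve("sub"));
        Files.createFile(root.resolve("index.HTML"));
        Files.createFile(root.resolve("sub").resolve("page.htm"));
        Files.createFile(root.resolve("sub").resolve("style.css"));
        for (Path p : findHtmlFiles(root)) {
            // with jsoup on the classpath, this is where parse(p.toFile()) would go
            System.out.println(root.relativize(p));
        }
    }
}
```

The try-with-resources block matters here: the stream returned by Files.walk holds references to open directories, and they are only released when the stream is closed.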
                        