Problem description
I have multiple HTML files on my hard drive that I want to parse with Jsoup. I have been able to parse one file, but not multiple files. I would like to parse all the files in a folder.
I wrote this code, which extracts text (within certain ids) from an HTML file (named "file.htm" in the folder "C:/html"):
package jsouptest;

import java.io.File;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {
    public static void main(String[] args) {
        Document doc;
        try {
            File input = new File("C:/html/file.htm");
            doc = Jsoup.parse(input, "UTF-8", "");
            // Select every <p> inside a <div> whose id starts with "desk"
            Elements ids = doc.select("div[id^=desk] p");
            for (Element id : ids) {
                System.out.println("\n" + id.text());
            }
        } catch (IOException e) {
            // Report the failure instead of silently swallowing it
            System.err.println("Could not read file: " + e.getMessage());
        }
    }
}
How can I apply this code to all the files in the folder "C:/html"? Thanks.
Solution
Extract the HTML-parsing code into its own method, then list the contents of your directory and call parse for each file:
File input = new File("C:/html");
File[] st = input.listFiles();
for (int i = 0; i < st.length; i++) {
    if (st[i].isFile()) { // add other conditions here, e.g. name ends in "html"
        parse(st[i]);
    }
}
So your code should look like this:
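import java.io.File;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {
    public static void main(String[] args) {
        File input = new File("C:/html");
        File[] st = input.listFiles();
        // listFiles() returns null if the path is not a readable directory
        if (st == null) {
            System.err.println("Not a readable directory: " + input);
            return;
        }
        for (int i = 0; i < st.length; i++) {
            if (st[i].isFile()) { // add other conditions here, e.g. name ends in "html"
                parse(st[i]);
            }
        }
    }

    // Parses one HTML file and prints the text of the matching paragraphs
    private static void parse(File input) {
        Document doc;
        try {
            doc = Jsoup.parse(input, "UTF-8", "");
            Elements ids = doc.select("div[id^=desk] p");
            for (Element id : ids) {
                System.out.println("\n" + id.text());
            }
        } catch (IOException e) {
            // Report the failure and carry on with the next file
            System.err.println("Could not read " + input + ": " + e.getMessage());
        }
    }
}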
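The "name ends in html" condition mentioned in the loop comment could be handled with a file filter. The following is only a sketch using the standard java.io API: the class name HtmlFileLister and the helpers isHtmlName and listHtmlFiles are hypothetical names chosen for illustration, and the sketch assumes you want both ".htm" and ".html" files, matched without regard to case.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.Locale;

public class HtmlFileLister {

    // Hypothetical helper: true for names ending in .htm or .html, ignoring case.
    static boolean isHtmlName(String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        return lower.endsWith(".htm") || lower.endsWith(".html");
    }

    // Hypothetical helper: returns the regular HTML files directly inside dir,
    // or an empty array if dir does not exist or is not a directory.
    static File[] listHtmlFiles(File dir) {
        FileFilter htmlOnly = f -> f.isFile() && isHtmlName(f.getName());
        File[] files = dir.listFiles(htmlOnly); // null when dir is not a readable directory
        return files != null ? files : new File[0];
    }

    public static void main(String[] args) {
        // "C:/html" is the folder from the question
        for (File f : listHtmlFiles(new File("C:/html"))) {
            System.out.println(f.getPath()); // in the answer's code, call parse(f) here
        }
    }
}
```

With a helper like this, the loop in main no longer needs the isFile() check or the null check, since both are handled inside listHtmlFiles.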