问题描述
我终于让IntelliJ工作了.我正在使用下面的代码.完美的作品.我需要它一遍又一遍地循环,并从电子表格中提取链接,以便一遍又一遍地查找不同项目上的价格.我有一个电子表格,该电子表格的C列中的一些示例URL从第2行开始.如何让JSOUP使用此电子表格中的URL,然后将其输出到D列?
I finally got IntelliJ to work. I'm using the code below. It works perfect. I need it to loop over and over and pull links from a spreadsheet to find the price over and over again on different items. I have a spreadsheet with a few sample URLs located in column C starting at row 2. How can I have JSOUP use the URLs in this spreadsheet then output to column D?
public class Scraper {
public static void main(String[] args) throws Exception {
final Document document = Jsoup.connect("examplesite.com").get();
for (Element row : document.select("#price")) {
final String price = row.select("#price").text();
System.out.println(price);
}
}
在此先感谢您的帮助!埃里克
Thanks in advance for any help!Eric
推荐答案
您可以使用JExcel库读取和编辑工作表: https://sourceforge.net/projects/jexcelapi/.
You can use JExcel library to read and edit sheets: https://sourceforge.net/projects/jexcelapi/ .
当您下载带有库的zip文件时,还有一个非常有用的tutorial.html
.
When you download the zip file with library there's also very useful tutorial.html
.
注释说明:
import java.io.File;
import java.io.IOException;
import jxl.Cell;
import jxl.CellType;
import jxl.Workbook;
import jxl.write.Label;
import jxl.write.WritableSheet;
import jxl.write.WritableWorkbook;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
public class StackoverflowQuestion51577491 {
private static final int URL_COLUMN = 2; // Column C
private static final int PRICE_COLUMN = 3; // Column D
public static void main(final String[] args) throws Exception {
// open worksheet with URLs
Workbook originalWorkbook = Workbook.getWorkbook(new File("O:/original.xls"));
// create editable copy
WritableWorkbook workbook = Workbook.createWorkbook(new File("O:/updated.xls"), originalWorkbook);
// close read-only workbook as it's not needed anymore
originalWorkbook.close();
// get first available sheet
WritableSheet sheet = workbook.getSheet(0);
// skip title row 0
int currentRow = 1;
Cell cell;
// iterate each cell from column C until we find an empty one
while (!(cell = sheet.getCell(URL_COLUMN, currentRow)).getType().equals(CellType.EMPTY)) {
// raed cell contents
String url = cell.getContents();
System.out.println("parsing URL: " + url);
// parse and get the price
String price = parseUrlWithJsoupAndGetProductPrice(url);
System.out.println("found price: " + price);
// create new cell with price
Label cellWithPrice = new Label(PRICE_COLUMN, currentRow, price);
sheet.addCell(cellWithPrice);
// go to next row
currentRow++;
}
// save and close file
workbook.write();
workbook.close();
}
private static String parseUrlWithJsoupAndGetProductPrice(String url) throws IOException {
// download page and parse it to Document
Document doc = Jsoup.connect(url).get();
// get the price from html
return doc.select("#priceblock_ourprice").text();
}
}
之前:后:
这篇关于通过电子表格将JSOUP的URL导入到Scrape的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!