I have HTML like this:

    <table  cellspacing='0' border='0' width='100%'>
    <col align='left' />
    <tr>
    <td align='left'><font color='#FF0000'>Programming</font></td>
    </tr>
    </table>
    <table  cellspacing='0' border='0' width='100%'>
    <col align='left' />
    <col align='right' />
    <tr>
    <td align='left'><font color='#000000'>A1000</font></td>
    <td align='right'><font color='#008000'>D.Rogers</font></td>
    </tr>
    </table>


It is stored locally. I'm trying to figure out how to scrape "Programming", "A1000", and "D.Rogers" from it. How can I do this with Java and Jsoup?

Best answer

Based on the example in the post:

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    public class ScrapeExample {
        public static void main(String[] args) {
            // The HTML from the question, inlined as a string for the example
            String localHtml = " <table cellspacing='0' border='0' width='100%'>\n" +
                    " <col align='left' />\n" +
                    " <tr>\n" +
                    " <td align='left'><font color='#FF0000'>Programming</font></td>\n" +
                    " </tr>\n" +
                    " </table>\n" +
                    " <table cellspacing='0' border='0' width='100%'>\n" +
                    " <col align='left' />\n" +
                    " <col align='right' />\n" +
                    " <tr>\n" +
                    " <td align='left'><font color='#000000'>A1000</font></td>\n" +
                    " <td align='right'><font color='#008000'>D.Rogers</font></td>\n" +
                    " </tr>\n" +
                    " </table>";

            Document doc = Jsoup.parse(localHtml);

            // Select each <font> element by its color attribute and print its text
            System.out.println(doc.select("font[color=#FF0000]").text());
            System.out.println(doc.select("font[color=#000000]").text());
            System.out.println(doc.select("font[color=#008000]").text());
        }
    }


Output:

    Programming
    A1000
    D.Rogers
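
Since the question says the HTML is stored in a local file, here is a minimal sketch of reading it from disk with `Jsoup.parse(File, String)` instead of inlining it as a string. It also selects every `td` cell rather than matching on the font colors, which may be more robust if the colors change. The class name `LocalTableScraper`, the helper `cellTexts`, the trimmed-down `SAMPLE` markup, and the UTF-8 charset are all assumptions made for illustration, not part of the original answer.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LocalTableScraper {

    // Shortened copy of the question's markup, used only when no path is given
    static final String SAMPLE =
            "<table><tr><td><font color='#FF0000'>Programming</font></td></tr></table>"
          + "<table><tr><td><font color='#000000'>A1000</font></td>"
          + "<td><font color='#008000'>D.Rogers</font></td></tr></table>";

    /** Parses the given HTML file and returns the text of each td cell, in document order. */
    static List<String> cellTexts(File htmlFile) throws IOException {
        // Assumption: the file is UTF-8 encoded; change the charset name if it is not
        Document doc = Jsoup.parse(htmlFile, "UTF-8");
        List<String> texts = new ArrayList<>();
        for (Element cell : doc.select("td")) {
            // text() returns the combined text of the cell and its children (the font tag here)
            texts.add(cell.text());
        }
        return texts;
    }

    public static void main(String[] args) throws IOException {
        File input;
        if (args.length > 0) {
            // Path to the locally stored HTML file, passed as the first argument
            input = new File(args[0]);
        } else {
            // No path given: write the sample markup to a temporary file
            Path tmp = Files.createTempFile("sample", ".html");
            Files.write(tmp, SAMPLE.getBytes(StandardCharsets.UTF_8));
            input = tmp.toFile();
        }
        for (String text : cellTexts(input)) {
            System.out.println(text);
        }
    }
}
```

Run it with the path to your file as the first argument. With the markup from the question it should print the same three values as above, one per line.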

This question about Java and scraping a locally stored HTML file is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/36303495/
