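Problem description
I'm trying to implement a simple HTML web scraper in Java, and I've run into a small problem.
Suppose I have the following HTML fragment.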
<div id="sr-h-left" class="sr-comp">
<a class="link-gray-underline" id="compare_header" rel="nofollow" href="javascript:i18nCompareProd('/serv/main/buyer/ProductCompare.jsp?nxtg=41980a1c051f-0942A6ADCF43B802');">
<span style="cursor: pointer;" class="sr-h-o">Compare</span>
</a>
</div>
<div id="sr-h-right" class="sr-summary">
<div id="sr-num-results">
<div class="sr-h-o-r">Showing 1 - 30 of 1,439 matches,
Regular expressions are probably the best way to do it. Something like:
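import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern p = Pattern.compile("Showing [0-9,]+ - [0-9,]+ of ([0-9,]+) matches");
// Pattern has no instance matches(input) method; matcher() creates the Matcher.
Matcher m = p.matcher(scrapedHTML);
// find() searches inside the page; matches() would require the whole input to match.
if (m.find()) {
    // Strip the thousands separators before parsing.
    int num = Integer.parseInt(m.group(1).replaceAll(",", ""));
    // num == 1439
}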
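For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. The class name `ResultCountExtractor` and the method `extractTotalMatches` are made up for illustration, and it assumes the page has already been downloaded into a `String`. The pattern is slightly loosened with `\s` so small whitespace differences around the dash don't break the match, and the method returns an empty `OptionalInt` when the summary text isn't there.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
final class ResultCountExtractor {
    // Compiled once and reused. Group 1 captures the total, e.g. "1,439".
    private static final Pattern TOTAL_MATCHES = Pattern.compile(
            "Showing\\s+[0-9,]+\\s*-\\s*[0-9,]+\\s+of\\s+([0-9,]+)\\s+matches");

    // Returns the total number of matches, or empty if the summary text is absent.
    static OptionalInt extractTotalMatches(String html) {
        Matcher m = TOTAL_MATCHES.matcher(html);
        if (!m.find()) {
            return OptionalInt.empty();
        }
        // Remove thousands separators before parsing.
        String digits = m.group(1).replace(",", "");
        return OptionalInt.of(Integer.parseInt(digits));
    }

    public static void main(String[] args) {
        // Assumption: in real use this string would come from the fetched page.
        String html = "<div class=\"sr-h-o-r\">Showing 1 - 30 of 1,439 matches,";
        OptionalInt total = extractTotalMatches(html);
        if (total.isPresent()) {
            System.out.println("Total matches: " + total.getAsInt());
        } else {
            System.out.println("Summary text not found");
        }
    }
}
```

Returning `OptionalInt` rather than a sentinel such as `-1` makes the "text not found" case explicit for the caller. If the site changes the wording of the summary line, the pattern will simply stop matching, so it is worth logging that case.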