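I am trying to send an HTTP GET request to a page, which sends back a response body that I then want to parse to extract a specific value from one of its div tags. For example, suppose the div tag of interest looks like this: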
<div id="nameofPlayer">Star Crafter</div>
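I am only interested in the text enclosed by that div tag, in this case "Star Crafter".
I am new to this and have come across several approaches and implementations, but I am confused and need a simple, effective way to do it. The code I am currently using is below: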
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class SB_HRW_Tracker {

    private final String USER_AGENT = "Mozilla/5.0";

    public static void main(String[] args) throws Exception {
        SB_HRW_Tracker http = new SB_HRW_Tracker();
        System.out.println("Testing 1 - Send Http GET request");
        http.sendGet();
    }

    // HTTP GET request
    private void sendGet() throws Exception {
        // java.net.URL requires a protocol; without "http://" it throws MalformedURLException
        String url = "http://www.somedummyurl.com";
        URL obj = new URL(url);
        HttpURLConnection con = (HttpURLConnection) obj.openConnection();

        // optional, default is GET
        con.setRequestMethod("GET");

        // add request header
        con.setRequestProperty("User-Agent", USER_AGENT);

        int responseCode = con.getResponseCode();
        System.out.println("\nSending 'GET' request to URL : " + url);
        System.out.println("Response Code : " + responseCode);

        /* Possibly convert the response to JSON here for ease of parsing? */
        BufferedReader in = new BufferedReader(
                new InputStreamReader(con.getInputStream()));
        String inputLine;
        StringBuilder response = new StringBuilder();
        while ((inputLine = in.readLine()) != null) {
            response.append(inputLine);
        }
        in.close();

        // print result
        System.out.println(response.toString());
    }
}
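I am not sure how to use a regex here to parse the response content and get the value enclosed by a specific div tag (some combination of regex and substring functions). I am also unsure whether it would be better to convert the response to JSON to simplify parsing. Any pointers on how to achieve this easily and efficiently would be highly appreciated. Thanks!
Best answer
I think you can simply use a regex: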
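import java.util.regex.Matcher;
import java.util.regex.Pattern;

String html = "<html><head></head><body><div id=\"nameofPlayer\">Star Crafter</div></body></html>";
// strip out your required data with a regex;
// find() searches anywhere in the input, so no leading or trailing ".*" is needed
Pattern pattern = Pattern.compile("<div id=\"nameofPlayer\">(.*?)</div>");
Matcher matcher = pattern.matcher(html);
if (matcher.find()) {
    // group(1) is the text captured between the opening and closing div tags
    System.out.println(matcher.group(1));
}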
Result:
Star Crafter
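To apply the same idea to the body returned by the GET request, the regex can be wrapped in a small helper. This is only a sketch: the class name `DivTextExtractor` and the method `extractDivText` are made up for illustration, and it assumes the div carries `id` as its only attribute and contains no nested div. It also trims the captured text and uses `Pattern.DOTALL` in case the div content spans several lines.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
class DivTextExtractor {

    // Returns the trimmed text of the first <div id="divId">...</div>,
    // or Optional.empty() if no such div is found.
    // Assumption: id is the div's only attribute and there is no nested <div>.
    static Optional<String> extractDivText(String html, String divId) {
        Pattern pattern = Pattern.compile(
                "<div id=\"" + Pattern.quote(divId) + "\">(.*?)</div>",
                Pattern.DOTALL); // DOTALL lets '.' also match line breaks
        Matcher matcher = pattern.matcher(html);
        return matcher.find()
                ? Optional.of(matcher.group(1).trim())
                : Optional.empty();
    }

    public static void main(String[] args) {
        String html = "<html><body><div id=\"nameofPlayer\"> Star Crafter </div></body></html>";
        System.out.println(extractDivText(html, "nameofPlayer").orElse("(not found)"));
    }
}
```

In the question's `sendGet()` method, the call would be something like `extractDivText(response.toString(), "nameofPlayer")` in place of printing the whole response. For markup that is more irregular than this, a regex becomes fragile and a real HTML parser is likely the safer choice.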
Further reading: http://tutorials.jenkov.com/java-regex/matcher.html
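One detail from the `Matcher` API that matters here: `find()` looks for the pattern anywhere in the input, while `matches()` succeeds only if the pattern covers the entire input. The sketch below illustrates the difference; the class name `FindVersusMatches` and its method names are invented for this example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class FindVersusMatches {

    static final Pattern DIV =
            Pattern.compile("<div id=\"nameofPlayer\">(.*?)</div>");

    // true if the pattern occurs anywhere in the input
    static boolean foundAnywhere(String input) {
        return DIV.matcher(input).find();
    }

    // true only if the pattern matches the input from start to end
    static boolean matchesWhole(String input) {
        return DIV.matcher(input).matches();
    }

    public static void main(String[] args) {
        String page = "<body><div id=\"nameofPlayer\">Star Crafter</div></body>";
        System.out.println(foundAnywhere(page)); // true
        System.out.println(matchesWhole(page));  // false: <body> and </body> are not covered
    }
}
```

So when extracting a fragment from a larger page, `find()` with a pattern for just the fragment is usually enough; wrapping the pattern in `.*` on both sides is only needed if `matches()` is used.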