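I am trying to send an HTTP GET request to a page, which returns a response body that I then want to parse in order to extract a specific value from one of its div tags. For example, suppose the div tag of interest looks like this: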

<div id="nameofPlayer">Star Crafter</div>


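I am only interested in the text enclosed by that div tag, in this case "Star Crafter".
I am not entirely new to this and have already come across several ways to implement it, but I am confused and need a simple, efficient approach. The code I am currently using is as follows: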

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class SB_HRW_Tracker {

    private final String USER_AGENT = "Mozilla/5.0";

    public static void main(String[] args) throws Exception {

        SB_HRW_Tracker http = new SB_HRW_Tracker();

        System.out.println("Testing 1 - Send Http GET request");
        http.sendGet();
    }

    // HTTP GET request
    private void sendGet() throws Exception {

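        // A scheme is required; without "http://" the URL constructor throws MalformedURLException.
        String url = "http://www.somedummyurl.com";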


        URL obj = new URL(url);
        HttpURLConnection con = (HttpURLConnection) obj.openConnection();

        // optional default is GET
        con.setRequestMethod("GET");

        //add request header
        con.setRequestProperty("User-Agent", USER_AGENT);

        int responseCode = con.getResponseCode();
        System.out.println("\nSending 'GET' request to URL : " + url);
        System.out.println("Response Code : " + responseCode);

        /* Possibly convert the responseCode to JSON here for ease of parsing? */

        StringBuilder response = new StringBuilder();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(con.getInputStream()))) {
            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                response.append(inputLine);
            }
        }

        //print result
        System.out.println(response.toString());

    }
}


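I am not sure how to use a regex here to parse the response content and get the value enclosed by a specific div tag (a combination of regex and some substring functions). I am also not sure whether converting the response to JSON would make parsing easier. Any pointers on how to do this simply and efficiently would be highly appreciated. Thanks!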

Best answer

I think you can just use a regular expression:

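    // requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
    String html = "<html><head></head><body><div id=\"nameofPlayer\">Star Crafter</div></body></html>";

    // extract the required data with a regex; find() scans for the pattern anywhere
    // in the input, so no leading or trailing ".*" is needed
    Pattern pattern = Pattern.compile("<div id=\"nameofPlayer\">(.*?)</div>");
    Matcher matcher = pattern.matcher(html);

    if (matcher.find()) {
        System.out.println(matcher.group(1));
    }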


Result:

Star Crafter
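
If the same extraction is needed for more than one id, or the div content may span several lines, the regex approach above could be wrapped in a small helper. The following is only a sketch under assumptions: the class name DivExtractor and the method extractDivText are hypothetical names introduced here, and the pattern assumes that id is the first attribute of the div and that the div contains no nested div.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original answer.
final class DivExtractor {

    // Returns the trimmed text inside the first <div id="..."> with the given id, if any.
    // Assumes id is the first attribute and the div has no nested <div>.
    static Optional<String> extractDivText(String html, String id) {
        // Pattern.quote keeps special characters in the id from being read as regex syntax.
        // DOTALL lets '.' match line breaks, so multi-line div content is captured too.
        Pattern pattern = Pattern.compile(
                "<div\\s+id=\"" + Pattern.quote(id) + "\"[^>]*>(.*?)</div>",
                Pattern.DOTALL);
        Matcher matcher = pattern.matcher(html);
        return matcher.find()
                ? Optional.of(matcher.group(1).trim())
                : Optional.empty();
    }

    public static void main(String[] args) {
        String html = "<html><body><div id=\"nameofPlayer\">\n  Star Crafter\n</div></body></html>";
        // Prints the extracted text, or a placeholder when the div is missing.
        System.out.println(extractDivText(html, "nameofPlayer").orElse("(not found)"));
    }
}
```

Returning an Optional makes the "div not found" case explicit for the caller. For markup that is nested or irregular, a regex like this is likely to break, and a real HTML parser would be the safer choice.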


Worth reading through: http://tutorials.jenkov.com/java-regex/matcher.html
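
One point about Matcher that matters for this use case is the difference between find() and matches(): matches() succeeds only if the whole input fits the pattern, while find() looks for the pattern anywhere in the input. A minimal sketch of that difference follows; the class and method names are hypothetical and chosen only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for illustration only.
final class FindVersusMatches {

    static final Pattern DIV =
            Pattern.compile("<div id=\"nameofPlayer\">(.*?)</div>");

    // True only when the entire input consists of the div and nothing else.
    static boolean wholeInputMatches(String html) {
        return DIV.matcher(html).matches();
    }

    // Returns the captured text of the first div found anywhere in the input, or null.
    static String firstGroup(String html) {
        Matcher matcher = DIV.matcher(html);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<body><div id=\"nameofPlayer\">Star Crafter</div></body>";
        System.out.println(wholeInputMatches(html)); // false: the input also contains <body> tags
        System.out.println(firstGroup(html));        // Star Crafter
    }
}
```

For pulling a value out of a full page, find() is therefore the call to use, since the page always contains more than the one div.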
