How to get this script from HTML using jsoup in Android programming


Problem description

I want to get a string value from a script in an HTML page using jsoup, but I have run into two problems:

  1. There are six scripts on that page, and I want to select the fourth one with jsoup. I don't know how to do that.
  2. There is a key in that script, and I want to capture the value of that key.

Here is the script I want:

<script type="text/javascript">window._sharedData={

  "entry_data": {
    "PostPage": [
      {
        "media": {

          "key": "This is the key and i wanna catch it!!!",

        },
      }
    ]
  },

};</script>

I have tried many approaches, but none of them worked.

I'm looking forward to an answer, so please don't let me down!

Recommended answer

jsoup will only help you get the contents of the script tag as a string. It parses HTML, not the JavaScript inside the script. Since in your case the content of the script is a simple object in JSON notation, you can use a JSON parser once you have the script string and have stripped off the variable assignment. In the code below I use the JSON.simple parser.

// imports (place at the top of the file)
import org.json.simple.JSONArray;
import org.json.simple.JSONObject;
import org.json.simple.JSONValue;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

String html = "<script></script><script></script><script></script>"
    +"<script type=\"text/javascript\">window._sharedData={"
    +"  \"entry_data\": {"
    +"    \"PostPage\": ["
    +"      {"
    +"        \"media\": {"
    +"          \"key\": \"This is the key and i wanna catch it!!!\","
    +"        },"
    +"      }"
    +"    ]"
    +"  },"
    +"};</script><script></script>";
Document doc = Jsoup.parse(html);
// get the 4th script (the index is zero-based)
Element scriptEl = doc.select("script").get(3);
// data() returns the raw text inside the script tag
String scriptContentStr = scriptEl.data().trim();
// strip the variable assignment and the trailing semicolon to leave only the JSON
String jsonStr = scriptContentStr
     .replaceFirst("^.*=\\{", "{") // clean beginning
     .replaceFirst(";$", ""); // clean end
JSONObject jo = (JSONObject) JSONValue.parse(jsonStr);
// walk down: entry_data -> PostPage[0] -> media -> key
JSONArray postPageJA = (JSONArray) ((JSONObject) jo.get("entry_data")).get("PostPage");
JSONObject postJO = (JSONObject) postPageJA.get(0);
JSONObject mediaJO = (JSONObject) postJO.get("media");
String keyStr = (String) mediaJO.get("key");

System.out.println("keyStr = " + keyStr);
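The cleanup step in the code above uses a regular expression in which `.` does not match line breaks, so it may not cope with a script whose text starts on a new line or has spaces around the `=`. As a minimal sketch of an alternative, assuming the script contains nothing but a single assignment of one object literal, the JSON text can be cut out by brace position using only the standard library. The class name `SharedDataJson` and the method `extractObjectLiteral` are hypothetical names chosen for this illustration.

```java
final class SharedDataJson {

    /**
     * Returns the text from the first '{' to the last '}' (inclusive),
     * or null when the input contains no such pair.
     * Assumption: the script holds exactly one object literal and no
     * other braces before or after it.
     */
    static String extractObjectLiteral(String scriptContent) {
        int start = scriptContent.indexOf('{');
        int end = scriptContent.lastIndexOf('}');
        if (start < 0 || end < start) {
            return null;
        }
        return scriptContent.substring(start, end + 1);
    }

    public static void main(String[] args) {
        // A multi-line script body with spaces around the assignment.
        String script = "\n  window._sharedData = {\n  \"entry_data\": {}\n};\n";
        System.out.println(extractObjectLiteral(script));
    }
}
```

The returned string could then be passed to the JSON parser in place of `jsonStr`. If the real page puts other statements containing braces in the same script, this sketch would capture too much and a stricter approach would be needed.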

The JSON-parsing approach is a bit complicated, and it depends on knowing the structure of the JSON. A much simpler way may be to use regular expressions:

Pattern p = Pattern.compile(
    "media[\":\\s\\{]+key[\":\\s\\{]+\"([^\"]+)\"",
    Pattern.DOTALL);
Matcher m = p.matcher(html);
if (m.find()){
    String keyFromRE = m.group(1);
    System.out.println("keyStr (via RegEx) = "+keyFromRE);
}
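For trying the regular expression outside an Android project, the snippet can be wrapped in a small standalone class. This is only a sketch: the class name `KeyRegexDemo` and the method `extractKey` are hypothetical, and the sample input is a shortened stand-in for the real page. The `Pattern.DOTALL` flag is left out here because the pattern contains no `.`, so the flag has no effect on it.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KeyRegexDemo {

    // Same expression as in the answer: "media", then any run of quotes,
    // colons, whitespace or braces, then "key", then the quoted value.
    private static final Pattern KEY_PATTERN = Pattern.compile(
        "media[\":\\s\\{]+key[\":\\s\\{]+\"([^\"]+)\"");

    /** Returns the captured value of "key", or an empty Optional if there is no match. */
    static Optional<String> extractKey(String html) {
        Matcher m = KEY_PATTERN.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String html = "<script type=\"text/javascript\">window._sharedData={\n"
            + "  \"entry_data\": {\"PostPage\": [{\"media\": {\n\n"
            + "    \"key\": \"This is the key and i wanna catch it!!!\",\n"
            + "  },}]},\n};</script>";
        extractKey(html).ifPresent(k -> System.out.println("keyStr (via RegEx) = " + k));
    }
}
```

Because the capture group stops at the first double quote, a value that itself contains an escaped quote (`\"`) would be cut short. For such data the JSON-parser route is the safer choice.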


09-05 12:25