Problem description
I want to get a string value out of a script in an HTML page using jsoup, but I have run into two problems:
- There are six scripts in the page, and I want to select the fourth one with jsoup. I don't know how to do that.
- There is a key in that script, and I want to get the value of that key.
Here is the script I want:
<script type="text/javascript">window._sharedData={
"entry_data": {
"PostPage": [
{
"media": {
"key": "This is the key and i wanna catch it!!!",
},
}
]
},
};</script>
I have tried many approaches without success. I'm looking forward to an answer.
Recommended answer
jsoup will only help you get the contents of the script tag as a string. It parses HTML, not the JavaScript inside a script element. Since in your case the script content is a simple object in JSON notation, you can get the script string, strip off the variable assignment, and then use a JSON parser. In the code below I use the JSON.simple parser.
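import org.json.simple.JSONArray;
import org.json.simple.JSONObject;
import org.json.simple.JSONValue;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

String html = "<script></script><script></script><script></script>"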
+"<script type=\"text/javascript\">window._sharedData={"
+" \"entry_data\": {"
+" \"PostPage\": ["
+" {"
+" \"media\": {"
+" \"key\": \"This is the key and i wanna catch it!!!\","
+" },"
+" }"
+" ]"
+" },"
+"};</script><script></script>";
Document doc = Jsoup.parse(html);
// Get the 4th script element (the index is zero-based)
Element scriptEl = doc.select("script").get(3);
String scriptContentStr = scriptEl.html();
// Strip the variable assignment and the trailing semicolon to leave the object literal
String jsonStr = scriptContentStr
.replaceFirst("^.*=\\{", "{") // clean beginning
.replaceFirst(";$", ""); // clean end
// Walk down the structure: entry_data -> PostPage[0] -> media -> key
JSONObject jo = (JSONObject) JSONValue.parse(jsonStr);
JSONArray postPageJA = (JSONArray) ((JSONObject) jo.get("entry_data")).get("PostPage");
JSONObject postJO = (JSONObject) postPageJA.get(0);
JSONObject mediaJO = (JSONObject) postJO.get("media");
String keyStr = (String) mediaJO.get("key");
System.out.println("keyStr = " + keyStr);
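The cleaning step above assumes the assignment is written exactly as `={` with nothing in between. As a minimal sketch of a more tolerant alternative, the snippet below cuts the text from the first `{` to the last `}` using only the JDK. The class name `SharedDataJson` and the method `extractObjectLiteral` are hypothetical names introduced for this illustration, and the sketch assumes the script contains a single top-level object literal.

```java
import java.util.Optional;

class SharedDataJson {

    // Hypothetical helper: returns the text from the first '{' to the last '}'
    // (inclusive), or an empty Optional when no such span exists.
    static Optional<String> extractObjectLiteral(String scriptContent) {
        int start = scriptContent.indexOf('{');
        int end = scriptContent.lastIndexOf('}');
        if (start < 0 || end < start) {
            return Optional.empty();
        }
        return Optional.of(scriptContent.substring(start, end + 1));
    }

    public static void main(String[] args) {
        // Assumed sample input with spaces around '=' and line breaks.
        String script = "window._sharedData = {\n  \"entry_data\": {}\n};\n";
        System.out.println(extractObjectLiteral(script).orElse("no object literal found"));
    }
}
```

The returned string could then be handed to the JSON parser in place of `jsonStr`.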
The JSON-parser approach is a bit complicated, and it depends on knowing the structure of the JSON. A much simpler way may be to use a regular expression:
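import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Match "media", then "key", then capture the quoted value that follows
Pattern p = Pattern.compile(
"media[\":\\s\\{]+key[\":\\s\\{]+\"([^\"]+)\"",
Pattern.DOTALL);
Matcher m = p.matcher(html);
if (m.find()) {
String keyFromRE = m.group(1);
System.out.println("keyStr (via RegEx) = " + keyFromRE);
}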
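For trying the regular expression without jsoup or a JSON library, here is a self-contained sketch that wraps the same pattern in a method. The class name `MediaKeyRegex`, the method `findMediaKey`, and the sample value `abc123` are assumptions made for this example. Note that the capture group `[^"]+` stops at the first double quote, so a value containing escaped quotes would be cut short.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MediaKeyRegex {

    // Same pattern as in the answer: "media", then any run of quotes, colons,
    // whitespace or braces, then "key", then the quoted value (captured).
    private static final Pattern MEDIA_KEY = Pattern.compile(
            "media[\":\\s\\{]+key[\":\\s\\{]+\"([^\"]+)\"");

    // Hypothetical helper: returns the captured value, or empty if nothing matches.
    static Optional<String> findMediaKey(String html) {
        Matcher m = MEDIA_KEY.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed sample page content, shaped like the script in the question.
        String html = "<script>window._sharedData={\"entry_data\":{\"PostPage\":"
                + "[{\"media\":{\"key\":\"abc123\"}}]}};</script>";
        System.out.println(findMediaKey(html).orElse("key not found"));
    }
}
```

Returning an `Optional` makes the "no match" case explicit for the caller instead of relying on a null check.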