This article covers how to stop jsoup from turning quotes into &quot; entities.

Problem description

When I parse local HTML files, jsoup changes the quotes inside an anchor element to &quot;, obscuring my HTML.

Let's assume I want to change the value "one" to "two" in the following HTML part:

<div class="pg2-txt1">
  <a class="foo" appareantly_a_javascript_statement='{"targetId":"pg1-magn1", "ordinal":1}'>one</a>
</div>

What I get is:

<div class="pg2-txt1">
  <a class="foo" appareantly_a_javascript_statement="{&quot;targetId&quot;:&quot;pg1-magn1&quot;, &quot;ordinal&quot;:1}">two</a>
</div>

The quotes inside the anchor element are needed. My code currently looks like this:

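import java.io.File;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
File input = new File("D:/javatest/page02.html");
Document doc = Jsoup.parse(input, "UTF-8");
Element div = doc.select("div.pg2-txt1").first(); // the anchor is only identifiable via its parent <div> class
div.child(0).text("two"); // child(0) is the actual anchor element; replace its text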

I tried

doc.outputSettings().prettyPrint(false);

without success.

Can I achieve this with jsoup? Or do I have to use a different parser, and if so, what would that look like?

Thank you very much.

Recommended answer

According to the HTML spec, jsoup behaves totally fine:

Note the last sentence!

Basically, that means the other software that needs the double quotes in the appareantly_a_javascript_statement attribute is parsing its value incompletely.
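To illustrate why the two serializations are equivalent, here is a minimal sketch that reads the attribute from both forms and compares the decoded values. It uses the JDK's built-in XML parser as a stand-in (an assumption that works only because this small fragment happens to be well-formed XML); the class name AttributeEquivalence and the helper attributeValue are hypothetical. jsoup's own Element.attr(String) likewise returns the decoded value.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class AttributeEquivalence {
    // Hypothetical helper: parses the fragment and returns the decoded attribute value.
    static String attributeValue(String markup) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            return doc.getDocumentElement()
                    .getAttribute("appareantly_a_javascript_statement");
        } catch (Exception e) {
            throw new IllegalStateException("could not parse fragment", e);
        }
    }

    public static void main(String[] args) {
        // Original form: single-quoted value with raw double quotes inside.
        String singleQuoted =
            "<a appareantly_a_javascript_statement='{\"targetId\":\"pg1-magn1\", \"ordinal\":1}'>one</a>";
        // Serialized form: double-quoted value with &quot; character references.
        String escaped =
            "<a appareantly_a_javascript_statement=\"{&quot;targetId&quot;:&quot;pg1-magn1&quot;, &quot;ordinal&quot;:1}\">two</a>";

        System.out.println(attributeValue(singleQuoted));
        System.out.println(attributeValue(escaped).equals(attributeValue(singleQuoted)));
    }
}
```

A consumer that reads the attribute through a real parser sees the same string either way; only code that scans the raw markup text notices the difference.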

I see two solutions:

1) Modify the function that interprets the value of appareantly_a_javascript_statement

I can't help you there, since I don't know where that is done.

2) Change the jsoup output with regular expressions.

This is pretty hacky...

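String html = doc.outerHtml();
// Step 1: turn every double-quoted value that starts with '{' into a single-quoted one.
html = html.replaceAll("(=\"\\{)([^\"]+)(\")", "='{$2'");
// Step 2: each pass restores one &quot; per single-quoted value; repeat until nothing changes.
boolean changed;
do {
    int oldLength = html.length();
    html = html.replaceAll("(=')([^']+)(&quot;)([^']+)(')", "$1$2\"$4$5");
    changed = html.length() != oldLength; // a replacement always shortens the string
} while (changed);
System.out.print(html);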
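The same post-processing could also be done in a single pass with java.util.regex.Matcher. The following is only a sketch, not part of jsoup: the class QuoteRestorer and the method restoreQuotes are hypothetical names, and it assumes the serialized attribute is double-quoted, starts with '{', and uses &quot; for inner quotes. Values that contain a single quote are left untouched, since single-quoting them would break the markup.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuoteRestorer {
    // A double-quoted attribute value that starts with '{', e.g. ="{...}"
    private static final Pattern JSON_ATTR = Pattern.compile("=\"(\\{[^\"]*)\"");

    // Hypothetical helper: rewrites ="{...&quot;...}" as ='{..."...}'.
    static String restoreQuotes(String html) {
        Matcher m = JSON_ATTR.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = m.group(1).replace("&quot;", "\"");
            // Keep the original text if single-quoting would produce broken markup.
            String replacement = value.contains("'")
                    ? m.group()
                    : "='" + value + "'";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in for doc.outerHtml(); the real input would come from jsoup.
        String html = "<a class=\"foo\" appareantly_a_javascript_statement="
                + "\"{&quot;targetId&quot;:&quot;pg1-magn1&quot;, &quot;ordinal&quot;:1}\">two</a>";
        System.out.println(restoreQuotes(html));
    }
}
```

Like the loop above, this is still text-level patching of the serialized HTML, so it stays a workaround rather than a fix for the consumer that reads the attribute.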

