Question
I am matching a specific string in an element's text and want to wrap the matching text in a span so that I can select it and apply modifications later, but the HTML entities are being escaped. Is there a way to wrap the string with HTML tags without it being escaped?
I tried the unescapeEntities() method, but it doesn't work in this case. wrap() didn't work either. For reference on those methods, see https://jsoup.org/apidocs/org/jsoup/parser/Parser.html
Current code:
for (Element div : doc.select("div")) {
    for (String input : listOfStrings) {
        if (div.ownText().contains(input)) {
            div.text(div.ownText().replaceFirst(input, "<span class=\"select-me\">" + input + "</span>"));
        }
    }
}
Desired output:
<div>some text <span class="select-me">matched string</span></div>
Actual output:
<div>some text &lt;span class="select-me"&gt;matched string&lt;/span&gt;</div>
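As a hedged aside on why the markup ends up escaped: Element.text(String) treats its argument as plain text, so angle brackets are stored as characters and encoded on output, whereas Element.html(String) parses its argument into child nodes. A minimal sketch of the difference, assuming jsoup is on the classpath (the class name TextVsHtml and the sample markup are only illustrative):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TextVsHtml {

    // Sets the div content through text(): the markup is kept as literal characters.
    static Element viaText(String markup) {
        Document doc = Jsoup.parse("<div></div>");
        Element div = doc.select("div").first();
        div.text(markup);
        return div;
    }

    // Sets the div content through html(): the markup is parsed into child nodes.
    static Element viaHtml(String markup) {
        Document doc = Jsoup.parse("<div></div>");
        Element div = doc.select("div").first();
        div.html(markup);
        return div;
    }

    public static void main(String[] args) {
        String markup = "some text <span class=\"select-me\">matched string</span>";
        System.out.println(viaText(markup).children().size()); // no child elements
        System.out.println(viaHtml(markup).children().size()); // one <span> child
        System.out.println(viaText(markup).html());            // contains &lt; and &gt;
    }
}
```

Note that html(...) replaces all of the element's children, so combining it with ownText() as in the loop above would drop any existing inner elements.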
Recommended answer
Based on your question and comments, it looks like you only want to modify the direct text nodes of the selected element, without modifying the text nodes of any inner elements. So in the case of
<div>a b <span>b c</span></div>
if we want to modify b, we only modify the one placed directly in <div>, not the one in <span>.
<div>a b <span>b c</span></div>
       ^       ^----don't modify because it is in <span>, not *directly* in <div>
       |
       modify
Text is not an Element like <div> or <span>; in the DOM it is represented as a TextNode. So if we have a structure like <div> a <span>b</span> c </div>, its DOM representation would be:
Element: <div>
├ Text: " a "
├ Element: <span>
│ └ Text: "b"
└ Text: " c "
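To check this tree against a real document, the direct children of the element can be listed and classified by node type. This is a minimal sketch, assuming jsoup is on the classpath; the class name ChildNodeDump and the helper describeChildren are hypothetical names introduced only for illustration:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

import java.util.ArrayList;
import java.util.List;

public class ChildNodeDump {

    // Describes each direct child of the element as a text node or an element.
    static List<String> describeChildren(Element element) {
        List<String> lines = new ArrayList<>();
        for (Node child : element.childNodes()) {
            if (child instanceof TextNode) {
                // getWholeText() returns the text without whitespace normalisation
                lines.add("Text: \"" + ((TextNode) child).getWholeText() + "\"");
            } else if (child instanceof Element) {
                lines.add("Element: <" + ((Element) child).tagName() + ">");
            } else {
                lines.add("Other: " + child.nodeName());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse("<div> a <span>b</span> c </div>");
        Element div = doc.select("div").first();
        for (String line : describeChildren(div)) {
            System.out.println(line);
        }
    }
}
```

Only direct children are listed here, which is the same set of nodes that textNodes() filters down to text nodes.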
If we want to wrap a portion of some text in <span> (or any other tag), we are effectively splitting a single TextNode
├ Text: "foo bar baz"
into the following series of nodes:
├ Text: "foo "
├ Element: <span>
│ └ Text: "bar"
└ Text: " baz"
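The index arithmetic behind that three-way split can be sketched with plain strings, independent of jsoup. The class name ThreeWaySplit and its split method are hypothetical helpers used only to illustrate where the two cut points fall:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ThreeWaySplit {

    // Returns [before, match, after] for the first occurrence of target,
    // or an empty list when target does not occur in text.
    static List<String> split(String text, String target) {
        int start = text.indexOf(target);
        if (start < 0) {
            return Collections.emptyList();
        }
        int end = start + target.length();
        return Arrays.asList(
                text.substring(0, start),   // stays in the original node
                text.substring(start, end), // the part that gets wrapped
                text.substring(end));       // the remainder, handled next
    }

    public static void main(String[] args) {
        System.out.println(split("foo bar baz", "bar"));
    }
}
```

The first cut is at the index where the match starts, and the second cut is at the length of the match measured from the start of the right-hand piece.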
To build a solution around that idea, the TextNode API gives us a very limited set of tools, but among the available methods we can use:
- splitText(index), which modifies the original TextNode, leaving the left side of the split in it, and returns a new TextNode that holds the remaining (right) side. For example, if TextNode node1 holds "foo bar", then after TextNode node2 = node1.splitText(3); node1 will hold "foo" while node2 will hold " bar" and will be placed as the immediate sibling after node1.
- wrap(htmlElement) (inherited from the Node superclass), which wraps the TextNode in an Element representing htmlElement. For instance, node.wrap("<span class='myClass'>") will result in <span class='myClass'>text from node</span>.
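The two methods described above can be tried in isolation before combining them. The following is a minimal sketch, assuming jsoup is on the classpath; the class name SplitAndWrapDemo and the method splitThenWrap are hypothetical, and pretty-printing is switched off only to keep the output on one line:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

public class SplitAndWrapDemo {

    // Splits the single text node of a <div> at the given offset, wraps the
    // right-hand part in the given wrapper, and returns the div's inner HTML.
    static String splitThenWrap(String text, int offset, String wrapperHtml) {
        Document doc = Jsoup.parse("<div>" + text + "</div>");
        doc.outputSettings().prettyPrint(false);
        Element div = doc.select("div").first();

        TextNode node1 = div.textNodes().get(0); // holds the whole text
        TextNode node2 = node1.splitText(offset); // node1 keeps the left part
        node2.wrap(wrapperHtml);                  // node2 now sits inside the wrapper
        return div.html();
    }

    public static void main(String[] args) {
        // "foo bar" split at 3: node1 holds "foo", node2 holds " bar"
        System.out.println(splitThenWrap("foo bar", 3, "<span class='myClass'></span>"));
    }
}
```

Because splitText inserts the new node as a sibling of the original, the wrapper ends up in the right position inside the parent without any further bookkeeping.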
With the above tools, we can create a method like this:
static void wrapTextWithElement(TextNode textNode, String strToWrap, String wrapperHTML) {
    if (strToWrap.isEmpty()) {
        return; // an empty search string would match forever
    }
    // getWholeText() is used instead of text() because text() normalises
    // whitespace, so its indexes may not line up with what splitText() sees
    while (textNode.getWholeText().contains(strToWrap)) {
        // separates the part before strToWrap
        // and returns the node starting with the text we want
        TextNode rightNodeFromSplit = textNode.splitText(textNode.getWholeText().indexOf(strToWrap));
        // if there is more text after the searched string we need to
        // separate it and handle it in the next iteration
        if (rightNodeFromSplit.getWholeText().length() > strToWrap.length()) {
            textNode = rightNodeFromSplit.splitText(strToWrap.length());
            // after separating the remaining part, rightNodeFromSplit holds
            // only the part we were looking for, so let's wrap it
            rightNodeFromSplit.wrap(wrapperHTML);
        } else { // here we know that the node holds only the text to wrap
            rightNodeFromSplit.wrap(wrapperHTML);
            return; // textNode didn't change, but everything is already handled
        }
    }
}
which we can use like this:
Document doc = Jsoup.parse("<div>b a b <span>b c</span> d b</div> ");
System.out.println("BEFORE CHANGES:");
System.out.println(doc);
Element id1 = doc.select("div").first();
for (TextNode textNode : id1.textNodes()) {
    wrapTextWithElement(textNode, "b", "<span class='x'>");
}
System.out.println();
System.out.println("AFTER CHANGES");
System.out.println(doc);
Result:
BEFORE CHANGES:
<html>
 <head></head>
 <body>
  <div>
   b a b
   <span>b c</span> d b
  </div>
 </body>
</html>

AFTER CHANGES
<html>
 <head></head>
 <body>
  <div>
   <span class="x">b</span> a
   <span class="x">b</span>
   <span>b c</span> d
   <span class="x">b</span>
  </div>
 </body>
</html>
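To tie this back to the loop in the question, the same idea can be applied to every div and every search string. This is a sketch under stated assumptions rather than a definitive implementation: jsoup is assumed to be on the classpath, the class name WrapAllMatches and the method wrapInDivs are hypothetical, and the helper is a copy of the splitting approach shown above so that the example compiles on its own.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

import java.util.Arrays;
import java.util.List;

public class WrapAllMatches {

    // Copy of the split-and-wrap approach: wraps every occurrence of
    // strToWrap found in the given text node.
    static void wrapTextWithElement(TextNode textNode, String strToWrap, String wrapperHTML) {
        if (strToWrap.isEmpty()) {
            return;
        }
        while (textNode.getWholeText().contains(strToWrap)) {
            TextNode right = textNode.splitText(textNode.getWholeText().indexOf(strToWrap));
            if (right.getWholeText().length() > strToWrap.length()) {
                textNode = right.splitText(strToWrap.length());
                right.wrap(wrapperHTML);
            } else {
                right.wrap(wrapperHTML);
                return;
            }
        }
    }

    // Wraps each input string found directly inside any <div> of the document.
    static void wrapInDivs(Document doc, List<String> inputs, String wrapperHTML) {
        for (Element div : doc.select("div")) {
            for (String input : inputs) {
                // textNodes() builds a new list on each call, so text nodes
                // created by earlier splits are picked up for the next input
                for (TextNode textNode : div.textNodes()) {
                    wrapTextWithElement(textNode, input, wrapperHTML);
                }
            }
        }
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse("<div>some text matched string</div>");
        doc.outputSettings().prettyPrint(false);
        wrapInDivs(doc, Arrays.asList("matched string"), "<span class=\"select-me\"></span>");
        System.out.println(doc.select("div").first().outerHtml());
    }
}
```

Text that has already been wrapped lives inside the new span, so it is no longer a direct text node of the div and is not wrapped a second time by a later input.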