Question
Parsing between elements, e.g.:
<span>7:33</span>AM </dd>
<dt>Dinner</dt>
<dd id="Dinner">
<span>12:23</span>PM </dd>
<dt>Lunch</dt>
<dd id="Lunch">
<span>2:43</span>PM </dd>
How do I get the "AM/PM" values?
let test: [String] = document.querySelectorAll("span").compactMap({element in
guard let span = document.querySelector("dt") else {
return nil
}
return span.elementId
})
This just loops over 7:33 nine times :(
Recommended answer
You can solve this the same way you would in any browser. The problem is not HTMLKit-specific.
Since there is no way to select an HTML text node via CSS, you have to select its parent and then either read the text via the textContent property or walk the parent node's child nodes.
Here are some options for solving your problem, using HTMLKit as an example and the following sample DOM:
import HTMLKit

let html = """
<html>
<body>
<dl>
<dt>Breakfast</dt>
<dd id="Breakfast"><span>10:00</span>AM</dd>
<dt>Dinner</dt>
<dd id="Dinner"><span>12:23</span>PM</dd>
</dl>
</body>
</html>
"""
let doc = HTMLDocument(string: html)
let elements = doc.querySelectorAll("dd")
- Option 1: Select the dd elements and access their textContent:
elements.forEach { ddElement in
print(ddElement.textContent)
}
// Would produce:
// 10:00AM
// 12:23PM
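Option 1 returns the time and the marker as one string, so the AM/PM part still has to be split off. A minimal sketch of that step follows, using only Foundation; the helper name splitMeridiem is hypothetical and not part of HTMLKit, and it assumes the text always ends in a literal "AM" or "PM":

```swift
import Foundation

/// Splits a combined string such as "10:00AM" into its time and AM/PM parts.
/// Hypothetical helper for illustration; returns nil when the trimmed string
/// does not end in "AM" or "PM".
func splitMeridiem(_ text: String) -> (time: String, meridiem: String)? {
    let trimmed = text.trimmingCharacters(in: .whitespacesAndNewlines)
    for marker in ["AM", "PM"] where trimmed.hasSuffix(marker) {
        // Drop the marker and any whitespace left between time and marker.
        let time = trimmed.dropLast(marker.count)
            .trimmingCharacters(in: .whitespacesAndNewlines)
        return (time, marker)
    }
    return nil
}

if let parts = splitMeridiem("10:00AM") {
    print(parts.time, parts.meridiem)
}
```

Each ddElement.textContent from the loop above could be passed to this helper. It trims first because the markup in the question has whitespace around the marker.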
- Option 2: Select the dd elements and iterate through their child nodes, filtering out everything except HTMLText nodes:
elements.forEach { ddElement in
let iter: HTMLNodeIterator = ddElement.nodeIterator(showOptions: [.text], filter: nil)
iter.forEach { node in
let textNode = node as! HTMLText
print(textNode.textContent)
}
}
// Would produce:
// 10:00
// AM
// 12:23
// PM
- Option 3: Building on the previous option, you can provide your own custom filter to the node iterator:
for dd in elements {
let iter: HTMLNodeIterator = dd.nodeIterator(showOptions: [.text]) { node in
if !node.textContent.contains("AM") && !node.textContent.contains("PM") {
return .reject
}
return .accept
}
iter.forEach { node in
let textNode = node as! HTMLText
print(textNode.textContent)
}
}
// Would produce:
// AM
// PM
- Option 4: Wrap the AM and PM in their own <span> elements and access those, e.g. with the dd > span selector:
doc.querySelectorAll("dd > span").forEach { elem in
print(elem.textContent)
}
// Given the sample DOM would produce:
// 10:00
// 12:23
// if you wrap the am/pm in spans then you would also get those in the output
Your snippet produces ["", ""] with the sample DOM from above. Here is why:
let test: [String] = doc.querySelectorAll("span")
.compactMap { element in // element is a <span> HTMLElement
// However the elements returned here are <dt> elements and not <span>
guard let span = doc.querySelector("dt") else {
return nil
}
// The <dt> elements in the DOM do not have IDs, hence an empty string is returned
return span.elementId
}
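For comparison, here is one way the snippet could be rewritten so that it collects the AM/PM markers. Treat it as a sketch, not the only fix: it assumes the marker is always the last child of each dd and is a text node, and it relies on HTMLKit exposing lastChild on nodes. The name meridiems is chosen for illustration.

```swift
import Foundation
import HTMLKit

let html = """
<dl>
  <dt>Breakfast</dt>
  <dd id="Breakfast"><span>10:00</span>AM</dd>
  <dt>Dinner</dt>
  <dd id="Dinner"><span>12:23</span>PM</dd>
</dl>
"""
let doc = HTMLDocument(string: html)

// Map over the <dd> elements and use the closure parameter,
// instead of re-querying the whole document on every iteration.
let meridiems: [String] = doc.querySelectorAll("dd").compactMap { dd in
    // Assumption: the AM/PM marker is the trailing text node after the <span>.
    guard let text = dd.lastChild as? HTMLText else { return nil }
    return text.textContent.trimmingCharacters(in: .whitespacesAndNewlines)
}
print(meridiems)
```

The important change is that the closure works with the element it is given. The original called doc.querySelector("dt") inside the closure, which returns the same first match on every pass.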
I hope this helps and clarifies some things.