HTMLKit Swift Parsing


Problem description

Parsing the text between elements, for example:

            <span>7:33</span>AM          </dd>
          <dt>Dinner</dt>
          <dd id="Dinner">
            <span>12:23</span>PM          </dd>
          <dt>Lunch</dt>
          <dd id="Lunch">
            <span>2:43</span>PM          </dd>

How do I get the "AM/PM" values?

    let test: [String] = document.querySelectorAll("span").compactMap({element in
        guard let span = document.querySelector("dt") else {
            return nil
        }

        return span.elementId

    })

This just loops over 7:33 nine times :(

Recommended answer

You can solve this the same way you would in any browser. The problem is not specific to HTMLKit.

Since there is no way to select an HTML text node via CSS, you have to select its parent and then either read the text via the textContent property or walk the parent node's child nodes.

Here are some options to solve your problem, using HTMLKit and the following sample DOM:

import HTMLKit

let html = """
<html>
<body>
<dl>
  <dt>Breakfast</dt>
  <dd id="Breakfast"><span>10:00</span>AM</dd>
  <dt>Dinner</dt>
  <dd id="Dinner"><span>12:23</span>PM</dd>
</dl>
</body>
</html>
"""

let doc = HTMLDocument(string: html)
let elements = doc.querySelectorAll("dd")

Option 1: Select the dd elements and access their textContent

elements.forEach { ddElement in
  print(ddElement.textContent)
}

// Would produce:
// 10:00AM
// 12:23PM
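
Option 1 returns the time and the marker together, so the AM/PM part still has to be separated from the string. A minimal sketch of one way to do that, using only Foundation: meridiem(from:) is a hypothetical helper name, not part of HTMLKit, and the sample strings are assumed to be shaped like the textContent values above (including the padding found in the question's markup).

```swift
import Foundation

// Hypothetical helper (not part of HTMLKit): returns the trailing "AM" or "PM"
// marker of a string such as "10:00AM", or nil if there is none.
func meridiem(from text: String) -> String? {
    // The markup in the question pads the text with spaces and newlines, so trim first.
    let trimmed = text.trimmingCharacters(in: .whitespacesAndNewlines)
    for marker in ["AM", "PM"] where trimmed.hasSuffix(marker) {
        return marker
    }
    return nil
}

// Strings shaped like the textContent values of the <dd> and <dt> elements.
let samples = ["10:00AM", "\n            12:23PM          ", "Lunch"]
let markers = samples.compactMap(meridiem(from:))
print(markers)

// Prints:
// ["AM", "PM"]
```

In practice the helper would be called as meridiem(from: ddElement.textContent) inside the loop above.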

Option 2: Select the dd elements and iterate through their child nodes, keeping only the HTMLText nodes. The filter parameter (nil here) is where a custom filter can be supplied:

elements.forEach { ddElement in
  let iter: HTMLNodeIterator = ddElement.nodeIterator(showOptions: [.text], filter: nil)
  iter.forEach { node  in
    let textNode = node as! HTMLText
    print(textNode.textContent)
  }
}

// Would produce:
// 10:00
// AM
// 12:23
// PM

Option 3: Building on the previous option, provide a custom filter to the node iterator:

for dd in elements {
  let iter: HTMLNodeIterator = dd.nodeIterator(showOptions: [.text]) { node in
    if !node.textContent.contains("AM") && !node.textContent.contains("PM") {
        return .reject
    }
    return .accept
  }

  iter.forEach { node  in
    let textNode = node as! HTMLText
    print(textNode.textContent)
  }
}

// Would produce:
// AM
// PM

Option 4: Wrap the AM and PM in their own <span> elements and access those, e.g. with the dd > span selector:

doc.querySelectorAll("dd > span").forEach { elem in
   print(elem.textContent)
}

// Given the sample DOM would produce:
// 10:00
// 12:23

// If you wrap AM/PM in their own spans, those would also appear in the output

Your snippet produces ["", ""] with the sample DOM above. Here is why:

let test: [String] = doc.querySelectorAll("span")
  .compactMap { element in  // element is a <span> HTMLElement

    // However, this always returns the document's first <dt> element, not a <span>
    guard let span = doc.querySelector("dt") else {
        return nil
    }
    // The <dt> elements in the DOM do not have IDs, hence an empty string is returned
    return span.elementId
  }
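
For comparison, here is a sketch of how the snippet could be reworked so that it collects the AM/PM markers. It only uses calls that already appear in this answer (HTMLDocument(string:), querySelectorAll and textContent); the suffix check and the names sampleHTML, sampleDoc and meridiems are assumptions made for illustration, not an HTMLKit feature.

```swift
import Foundation
import HTMLKit

let sampleHTML = """
<dl>
  <dt>Breakfast</dt>
  <dd id="Breakfast"><span>10:00</span>AM</dd>
  <dt>Dinner</dt>
  <dd id="Dinner"><span>12:23</span>PM</dd>
</dl>
"""

let sampleDoc = HTMLDocument(string: sampleHTML)

// Work with each <dd> passed into the closure instead of re-querying the document.
let meridiems: [String] = sampleDoc.querySelectorAll("dd").compactMap { dd in
    // textContent is the time plus the marker, e.g. "10:00AM"; trim any padding.
    let text = dd.textContent.trimmingCharacters(in: .whitespacesAndNewlines)
    if text.hasSuffix("AM") { return "AM" }
    if text.hasSuffix("PM") { return "PM" }
    return nil  // skip <dd> elements without a marker
}
print(meridiems)

// With the sample DOM this should print:
// ["AM", "PM"]
```

The key difference from the original snippet is that the closure uses the element it is given, so every iteration looks at a different dd rather than at the first dt in the document.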

I hope this helps and clarifies a few things.
