Problem description
I need to get the text content of the first <p> that is a child of <div class="about">, and wrote the following code:
tagTextS :: IOSArrow XmlTree String
tagTextS = getChildren >>> getText >>> arr stripString
parseDescription :: IOSArrow XmlTree String
parseDescription =
  (
   deep (isElem >>> hasName "div" >>> hasAttrValue "id" (== "company_about_full_description"))
   >>> (arr (\x -> x) /> isElem >>> hasName "p") >. (!! 0) >>> tagTextS
  ) `orElse` (constA "")
Look at this arr (\x -> x): without it I wasn't able to get a result.
- Is there a better way to write parseDescription?
- Another question: why do I need parentheses before arr and after hasName "p"? (I actually found this solution here.)
Another proposal, using only the hxt core as you asked.
Selecting the first child cannot be done on the output of getChildren, because hxt arrows have a specific (>>>) that applies the subsequent arrow to every item of the preceding arrow's output list rather than to the list as a whole, as explained on the HaskellWiki hxt page. That page shows an old definition, though; (>>>) is actually derived from Category (.) composition.
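To make the composition behaviour concrete, here is a minimal sketch that uses only base. It is a toy model, not hxt's implementation: ListArrow, Rose, children, label and nth are all hypothetical names invented for this illustration. It shows that composing with (>>>) runs the second arrow once per item, so picking "the first" has to happen on the list that the child-producing arrow returns.

```haskell
import Prelude hiding (id, (.))
import Control.Category (Category (..), (>>>))

-- Toy list arrow (hypothetical stand-in, not hxt's LA type):
-- every input yields a list of outputs.
newtype ListArrow a b = ListArrow { runListArrow :: a -> [b] }

instance Category ListArrow where
  id = ListArrow (\x -> [x])
  -- g . f runs f, then applies g to each result and concatenates.
  ListArrow g . ListArrow f = ListArrow (concatMap g . f)

-- Toy tree with a label and a list of children.
data Rose = Rose String [Rose]

children :: ListArrow Rose Rose
children = ListArrow (\(Rose _ cs) -> cs)

label :: ListArrow Rose String
label = ListArrow (\(Rose l _) -> [l])

-- Keep only the n-th (0-based) result of an arrow; empty if it is missing.
nth :: Int -> ListArrow a b -> ListArrow a b
nth n (ListArrow f) = ListArrow (take 1 . drop n . f)

sample :: Rose
sample = Rose "div" [Rose "p1" [], Rose "p2" []]

main :: IO ()
main = do
  -- label is applied to every child, not to the list of children
  print (runListArrow (children >>> label) sample)
  -- the selection is made on the child list before composing further
  print (runListArrow (nth 0 children >>> label) sample)
```

In this model the first line prints both labels and the second prints only the first one, which is the effect getNthChild aims for.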
getNthChild can be built on top of the getChildren of Control.Arrow.ArrowTree:
import Data.Tree.Class (Tree)
import qualified Data.Tree.Class as T
-- if the nth element does not exist it will return an empty children list
getNthChild :: (ArrowList a, Tree t) => Int -> a (t b) (t b)
getNthChild n = arrL (take 1 . drop n . T.getChildren)
Then your parseDescription could take this form:
-- importing Text.XML.HXT.Arrow.XmlArrow (hasName, hasAttrValue)
parseDescription =
  deep (isElem >>> hasName "div" >>> hasAttrValue "class" (== "about")
        >>> getNthChild 0 >>> hasName "p"
       )
  >>> getChildren >>> getText
Update. I found another way using changeChildren:
getNthChild :: (ArrowTree a, Tree t) => Int -> a (t b) (t b)
getNthChild n = changeChildren (take 1 . drop n) >>> getChildren
Update: to avoid picking up inter-element whitespace nodes, filter out non-element children:
import qualified Text.XML.HXT.DOM.XmlNode as XN
getNthChild :: (ArrowTree a, Tree t, XN.XmlNode b) => Int -> a (t b) (t b)
getNthChild n = changeChildren (take 1 . drop n . filter XN.isElem) >>> getChildren
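For reference, the pieces above can be put together into a small runnable sketch. The sample input string and the name sampleHtml are assumptions made for illustration, and getNthChild is specialised to XmlTree here for brevity; it assumes the hxt package is installed and uses the pure runLA and xread from Text.XML.HXT.Core instead of an IO-based reader.

```haskell
import Text.XML.HXT.Core
import qualified Text.XML.HXT.DOM.XmlNode as XN

-- Last variant of getNthChild, specialised to XmlTree (an assumption made
-- for brevity; the more general signature is given above).
getNthChild :: ArrowTree a => Int -> a XmlTree XmlTree
getNthChild n =
  changeChildren (take 1 . drop n . filter XN.isElem) >>> getChildren

-- Text of the first element child of <div class="about">, if it is a <p>.
parseDescription :: ArrowXml a => a XmlTree String
parseDescription =
  deep (isElem >>> hasName "div" >>> hasAttrValue "class" (== "about")
        >>> getNthChild 0 >>> hasName "p"
       )
  >>> getChildren >>> getText

-- Hypothetical sample input; note the whitespace between the elements.
sampleHtml :: String
sampleHtml = "<div class=\"about\">\n  <p>first</p>\n  <p>second</p>\n</div>"

main :: IO ()
main = print (runLA (xread >>> parseDescription) sampleHtml)
```

With the element filter in place, the whitespace text node before the first <p> should no longer occupy index 0, so only the text of the first paragraph is expected in the result list.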