Problem description
Does anyone know whether it is possible to use XmlSlurper in such a way that individual subtrees can be pulled from a very large XML document and processed one at a time?
Imagine you've got a huge XML feed containing a root element that has thousands of direct child elements that you can process individually. Obviously, reading the whole document into memory is a no-no but, as each child of the root is itself modestly sized, it would be nice to stream through the document and apply XmlSlurper niceness to each of the child elements in turn. As each child element is processed, garbage collection can clean up the memory used to process it. In this way we get the great ease of XmlSlurper (such concise syntax) with the low memory footprint of streaming (e.g. SAX).
I'd be interested to know if anyone has ideas on this and/or whether you've come across this requirement yourselves.
You can use the StAX API together with XmlSlurper to parse subtrees.
// Example of using StAX to split a large XML document and parse one element at a time with XmlSlurper
import javax.xml.stream.XMLInputFactory
import javax.xml.stream.XMLStreamReader
import javax.xml.transform.Transformer
import javax.xml.transform.TransformerFactory
import javax.xml.transform.sax.SAXResult
import javax.xml.transform.stax.StAXSource

def url = new URL("http://repo2.maven.org/maven2/archetype-catalog.xml")
url.withInputStream { inputStream ->
    def xmlStreamReader = XMLInputFactory.newInstance().createXMLStreamReader(inputStream)
    // An identity transformer copies StAX events to a SAX handler unchanged
    def transformer = TransformerFactory.newInstance().newTransformer()
    while (xmlStreamReader.hasNext()) {
        xmlStreamReader.next()
        if (xmlStreamReader.isStartElement() && xmlStreamReader.getLocalName() == 'archetype') {
            // Feed only this <archetype> subtree to a fresh XmlSlurper
            def xmlSlurper = new XmlSlurper()
            transformer.transform(new StAXSource(xmlStreamReader), new SAXResult(xmlSlurper))
            def archetype = xmlSlurper.document
            println "${archetype.groupId} ${archetype.artifactId} ${archetype.version}"
        }
    }
}
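To try the same technique without depending on a remote URL, here is a minimal sketch that runs against a small in-memory document. The sample XML, the `seen` list and the variable names are made up for illustration. The explicit `groovy.xml.XmlSlurper` import assumes Groovy 3 or later; on Groovy 2.x drop that line and rely on the default `groovy.util` import. The loop also differs slightly from the one above: it only advances the reader when no subtree was consumed, on the assumption that the transformer may already have moved the reader past the subtree's end tag.

```groovy
import groovy.xml.XmlSlurper   // assumption: Groovy 3+; remove on Groovy 2.x
import javax.xml.stream.XMLInputFactory
import javax.xml.transform.TransformerFactory
import javax.xml.transform.sax.SAXResult
import javax.xml.transform.stax.StAXSource

// Hypothetical sample data standing in for a very large feed
def xml = '''<catalog>
  <archetype><groupId>g1</groupId><artifactId>a1</artifactId><version>1.0</version></archetype>
  <archetype><groupId>g2</groupId><artifactId>a2</artifactId><version>2.0</version></archetype>
</catalog>'''

def reader = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml))
def transformer = TransformerFactory.newInstance().newTransformer()
def seen = []

while (reader.hasNext()) {
    if (reader.isStartElement() && reader.localName == 'archetype') {
        // A fresh XmlSlurper per subtree, so each parsed tree can be garbage collected
        def slurper = new XmlSlurper()
        transformer.transform(new StAXSource(reader), new SAXResult(slurper))
        def archetype = slurper.document
        seen << "${archetype.groupId} ${archetype.artifactId} ${archetype.version}".toString()
        // Do not call next() here: re-check whatever event the reader is on now
    } else {
        reader.next()
    }
}

seen.each { println it }
```

Only one `archetype` subtree is held as a GPath tree at any time, so memory use should depend on the size of the largest child rather than on the size of the whole document.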