Lazily parsing a very large XML file
Question
I have a huge XML file (40 GB). I would like to extract some fields from it without loading the entire file into memory. Any suggestions?
Recommended answer
Here is a quick example using XMLEventReader, adapted from a SAXParser tutorial (as posted by Rinat Tainov).
I'm sure it can be done better, but it shows the basic usage:
import scala.annotation.tailrec
import scala.io.Source
import scala.xml.pull._

object Main extends App {
  // XMLEventReader yields parse events one at a time, so the file is read incrementally.
  val xml = new XMLEventReader(Source.fromFile("test.xml"))

  // currNode is the path of currently open elements, innermost first.
  def printText(text: String, currNode: List[String]): Unit =
    currNode match {
      case List("firstname", "staff", "company") => println("First Name: " + text)
      case List("lastname", "staff", "company") => println("Last Name: " + text)
      case List("nickname", "staff", "company") => println("Nick Name: " + text)
      case List("salary", "staff", "company") => println("Salary: " + text)
      case _ => ()
    }

  def parse(xml: XMLEventReader): Unit = {
    // @tailrec makes the compiler verify that the loop runs in constant stack space.
    @tailrec
    def loop(currNode: List[String]): Unit =
      if (xml.hasNext) {
        xml.next() match {
          case EvElemStart(_, label, _, _) =>
            println("Start element: " + label)
            loop(label :: currNode) // push the newly opened element
          case EvElemEnd(_, label) =>
            println("End element: " + label)
            loop(currNode.tail) // pop the element that just closed
          case EvText(text) =>
            printText(text, currNode)
            loop(currNode)
          case _ => loop(currNode) // ignore comments, entity references, etc.
        }
      }
    loop(List.empty)
  }

  parse(xml)
}
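As far as I know, the scala.xml.pull package was deprecated and is no longer shipped in scala-xml 2.x, so the example above may not compile against newer versions. A rough sketch of the same idea using the JDK's StAX API (javax.xml.stream) is shown below. The names StaxSketch, extract, wanted and onField are made up for illustration, and the company/staff/... element layout is simply the one assumed by the example above.

```scala
import java.io.{Reader, StringReader}
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants}

object StaxSketch {
  // Leaf elements to extract (assumed layout: company/staff/<name>).
  val wanted = Set("firstname", "lastname", "nickname", "salary")

  /** Streams through the XML and calls onField(name, text) for each wanted leaf. */
  def extract(reader: Reader)(onField: (String, String) => Unit): Unit = {
    val factory = XMLInputFactory.newInstance()
    // Merge adjacent character events so an element's text arrives in one piece.
    factory.setProperty(XMLInputFactory.IS_COALESCING, java.lang.Boolean.TRUE)
    val xml = factory.createXMLStreamReader(reader)
    var path = List.empty[String] // open elements, innermost first
    try {
      while (xml.hasNext) {
        xml.next() match {
          case XMLStreamConstants.START_ELEMENT =>
            path = xml.getLocalName :: path
          case XMLStreamConstants.END_ELEMENT =>
            path = path.drop(1)
          case XMLStreamConstants.CHARACTERS =>
            path match {
              case name :: "staff" :: "company" :: Nil if wanted(name) =>
                onField(name, xml.getText.trim)
              case _ => () // whitespace or text outside the fields of interest
            }
          case _ => ()
        }
      }
    } finally xml.close()
  }
}

object StaxDemo {
  def main(args: Array[String]): Unit = {
    // Tiny in-memory document standing in for the real file.
    val sample =
      "<company><staff><firstname>yong</firstname><salary>100000</salary></staff></company>"
    StaxSketch.extract(new StringReader(sample)) { (name, text) =>
      println(s"$name: $text")
    }
  }
}
```

For a real 40 GB file, the reader would be something like new java.io.BufferedReader(new java.io.FileReader("test.xml")) in place of the StringReader. Only the current event and the stack of open element names are kept in memory at any time.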