How to transform a huge XML file in Java?
Problem description
As the title says, I have a huge XML file (several GB):
<root>
  <keep>
    <stuff> ... </stuff>
    <morestuff> ... </morestuff>
  </keep>
  <discard>
    <stuff> ... </stuff>
    <morestuff> ... </morestuff>
  </discard>
</root>
and I'd like to transform it into a much smaller one that retains only a few of the elements.
My parser should do the following:
1. Parse through the file until a relevant element starts.
2. Copy the whole relevant element (with its children) to the output file. Go to 1.
Step 1 is easy with SAX but impractical with a DOM parser, because DOM loads the entire document into memory.
Step 2 is annoying with SAX, but easy with a DOM parser or XSLT.
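To show why step 2 is easy once you have a DOM, here is a minimal sketch of the pure-DOM approach for input small enough to fit in memory. It assumes the kept elements are direct children of the root, as in the sample above; the names `DomKeep` and `keepOnly` are hypothetical. Because `DocumentBuilder.parse` reads the whole file into memory first, this is exactly the part that does not scale to multi-GB input.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class DomKeep {
    // Loads the WHOLE input into memory: only viable when the file fits in RAM.
    static String keepOnly(String xml, String tag) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document in = db.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        // New output document with a root of the same name as the input root.
        Document outDoc = db.newDocument();
        Element root = outDoc.createElement(in.getDocumentElement().getTagName());
        outDoc.appendChild(root);

        // Assumption: relevant elements are direct children of the root.
        for (Node n = in.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && ((Element) n).getTagName().equals(tag)) {
                // importNode(..., true) deep-copies the element with all its children.
                root.appendChild(outDoc.importNode(n, true));
            }
        }

        // Identity transform serializes the DOM back to text.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter sw = new StringWriter();
        t.transform(new DOMSource(outDoc), new StreamResult(sw));
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><keep><stuff>a</stuff></keep><discard><stuff>b</stuff></discard></root>";
        System.out.println(keepOnly(xml, "keep"));
    }
}
```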
So, is there a neat way to combine SAX and a DOM parser to do this?
Recommended answer
Yes: just write a SAX content handler that, when it encounters a certain element, builds a DOM tree for that element. I've done this with very large files, and it works very well.
It's actually very easy: as soon as you encounter the start of the element you want, set a flag in your content handler, and from then on forward every event to the DOM builder. When you encounter the end of that element, set the flag back to false and write out the result.
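A minimal sketch of this approach using only the JDK's built-in SAX and DOM APIs. Assumptions: the element to keep is named `keep`, the parser is not namespace-aware (so element names arrive in `qName`), and the class name `KeepExtractor`, the helper `extract`, and the `<root>` wrapper written around the output are all hypothetical. The `copying` field is the flag; while it is set, each SAX event becomes a node in a small per-element DOM, and when the element closes, that DOM is serialized with an identity `Transformer` and discarded.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// SAX handler that copies every <target> element (with children) through a small DOM.
class KeepExtractor extends DefaultHandler {
    private final String target;           // name of the element to keep, e.g. "keep"
    private final Writer out;              // where copied elements are written
    private final DocumentBuilder builder; // creates one small Document per kept element
    private final Transformer serializer;  // identity transform: DOM -> text
    private boolean copying = false;       // the flag: true while inside a kept element
    private Document doc;                  // DOM for the element currently being copied
    private Node current;                  // node that receives the next child

    KeepExtractor(String target, Writer out) throws Exception {
        this.target = target;
        this.out = out;
        this.builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        this.serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (!copying && qName.equals(target)) {
            copying = true;                // set the flag and start a fresh DOM
            doc = builder.newDocument();
            current = doc;
        }
        if (!copying) return;              // outside a kept element: ignore the event
        Element el = doc.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            el.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        current.appendChild(el);
        current = el;                      // descend into the new element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (copying) current.appendChild(doc.createTextNode(new String(ch, start, length)));
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (!copying) return;
        current = current.getParentNode(); // climb back up one level
        if (current == doc) {              // the element where copying began just closed
            copying = false;
            try {
                serializer.transform(new DOMSource(doc), new StreamResult(out));
            } catch (TransformerException e) {
                throw new SAXException(e);
            }
            doc = null;                    // drop the subtree so it can be collected
        }
    }

    // Hypothetical helper: filters an in-memory string and wraps the result in <root>.
    static String extract(String xml, String target) throws Exception {
        StringWriter sw = new StringWriter();
        sw.write("<root>");
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new KeepExtractor(target, sw));
        sw.write("</root>");
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><keep><stuff>1</stuff></keep>"
                + "<discard><stuff>2</stuff></discard>"
                + "<keep><morestuff>3</morestuff></keep></root>";
        System.out.println(extract(xml, "keep"));
    }
}
```

For a real file you would pass a `FileInputStream` to `parse` and a file-backed `Writer` to the handler, so only one kept element is in memory at a time. Because the end of the copy is detected by climbing back up with `getParentNode()` rather than by comparing end-tag names, the partial DOM itself acts as the stack, so a `keep` nested inside another `keep` does not end the copy early.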
(For more complex cases, with nested elements that share the same element name, you'll need a stack or a counter, but that's still quite easy to do.)
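For the nested case, a counter is enough to tell the outermost end tag apart from inner ones with the same name. The hypothetical handler below (`NestedKeepCounter`, with helper `run`) is a sketch that only collects the text of each outermost `keep` element, so the counting logic stands out; a plain boolean flag would stop at the first inner `</keep>`.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class NestedKeepCounter extends DefaultHandler {
    private final String target;
    private int depth = 0;            // number of target elements currently open
    private StringBuilder text;       // text of the outermost open target element
    final List<String> collected = new ArrayList<>();

    NestedKeepCounter(String target) { this.target = target; }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals(target)) {
            if (depth == 0) text = new StringBuilder(); // an outermost target starts here
            depth++;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (depth > 0) text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals(target)) {
            depth--;
            // Only the end tag that brings the count back to zero finishes the element.
            if (depth == 0) collected.add(text.toString());
        }
    }

    static List<String> run(String xml, String target) throws Exception {
        NestedKeepCounter h = new NestedKeepCounter(target);
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), h);
        return h.collected;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<root><keep>a<keep>b</keep>c</keep><keep>d</keep></root>", "keep"));
    }
}
```

Running `main` prints `[abc, d]`: the inner `keep` stays part of the first outermost element instead of ending it.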