Parsing common XML elements with SAX

Question

I'm currently using SAX (Java) to parse a handful of different XML documents. Each document represents different data and has a slightly different structure, so each one is handled by its own SAX class (a subclass of DefaultHandler).

However, some XML structures can appear in all of these documents. Ideally, I'd like to tell the parser: "Hey, when you reach a complex_node element, just use ComplexNodeHandler to read it and give me back the result. If you reach a some_other_node, use OtherNodeHandler to read it and give me back that result."

I can't see an obvious way to do this, though.

Should I just write one monolithic handler class that can read all of my documents (and so eliminate the duplicated code), or is there a smarter way to handle this?

Recommended answer

Below is an answer I gave to a similar question (Skipping nodes with sax). It demonstrates how to swap content handlers on an XMLReader while a parse is in progress.

In this example the swapped-in ContentHandler simply ignores all events until it gives up control, but you could easily adapt the concept so that the swapped-in handler builds a result instead.

You could do the following:

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

public class Demo {

    public static void main(String[] args) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        SAXParser sp = spf.newSAXParser();
        XMLReader xr = sp.getXMLReader();
        xr.setContentHandler(new MyContentHandler(xr));
        xr.parse("input.xml");
    }
}

MyContentHandler

This class is responsible for processing your XML document. When you hit a node you want to ignore (the "sodium" element in this example), you swap in the IgnoringContentHandler, which swallows all events for that node and its descendants.

import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

public class MyContentHandler implements ContentHandler {

    private XMLReader xmlReader;

    public MyContentHandler(XMLReader xmlReader) {
        this.xmlReader = xmlReader;
    }

    public void setDocumentLocator(Locator locator) {
    }

    public void startDocument() throws SAXException {
    }

    public void endDocument() throws SAXException {
    }

    public void startPrefixMapping(String prefix, String uri)
            throws SAXException {
    }

    public void endPrefixMapping(String prefix) throws SAXException {
    }

    public void startElement(String uri, String localName, String qName,
            Attributes atts) throws SAXException {
        if("sodium".equals(qName)) {
            // Hand the parse over to a handler that swallows this subtree;
            // it receives a reference to this handler so it can swap back.
            xmlReader.setContentHandler(new IgnoringContentHandler(xmlReader, this));
        } else {
            System.out.println("START " + qName);
        }
    }

    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        System.out.println("END " + qName);
    }

    public void characters(char[] ch, int start, int length)
            throws SAXException {
        System.out.println(new String(ch, start, length));
    }

    public void ignorableWhitespace(char[] ch, int start, int length)
            throws SAXException {
    }

    public void processingInstruction(String target, String data)
            throws SAXException {
    }

    public void skippedEntity(String name) throws SAXException {
    }

}

IgnoringContentHandler

When the IgnoringContentHandler is done swallowing events, it passes control back to your main ContentHandler. It tracks the nesting depth so that it only gives up control on the end tag that matches the element that triggered the swap.

import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

public class IgnoringContentHandler implements ContentHandler {

    // Starts at 1 because the main handler already consumed the opening tag.
    private int depth = 1;
    private XMLReader xmlReader;
    private ContentHandler contentHandler;

    public IgnoringContentHandler(XMLReader xmlReader, ContentHandler contentHandler) {
        this.contentHandler = contentHandler;
        this.xmlReader = xmlReader;
    }

    public void setDocumentLocator(Locator locator) {
    }

    public void startDocument() throws SAXException {
    }

    public void endDocument() throws SAXException {
    }

    public void startPrefixMapping(String prefix, String uri)
            throws SAXException {
    }

    public void endPrefixMapping(String prefix) throws SAXException {
    }

    public void startElement(String uri, String localName, String qName,
            Attributes atts) throws SAXException {
        // A nested element inside the ignored subtree.
        depth++;
    }

    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        depth--;
        if(0 == depth) {
            // The ignored element itself has closed: restore the main handler.
            xmlReader.setContentHandler(contentHandler);
        }
    }

    public void characters(char[] ch, int start, int length)
            throws SAXException {
    }

    public void ignorableWhitespace(char[] ch, int start, int length)
            throws SAXException {
    }

    public void processingInstruction(String target, String data)
            throws SAXException {
    }

    public void skippedEntity(String name) throws SAXException {
    }

}
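
To bring this back to the original question, the same swapping technique can be adapted so that the swapped-in handler builds a result and hands it back rather than discarding events. The following is only a minimal sketch under assumptions: the class names DelegatingDemo and DocumentHandler, the callback-based constructor of ComplexNodeHandler, and the choice to collect the subtree's text as the "result" are all hypothetical and not part of the answer above. It uses DefaultHandler to avoid the empty method stubs.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class DelegatingDemo {

    // Hypothetical reusable handler: gathers all text inside one complex_node
    // subtree, reports it through a callback, then restores the parent handler.
    static class ComplexNodeHandler extends DefaultHandler {
        private final XMLReader xmlReader;
        private final ContentHandler parent;
        private final Consumer<String> onResult;
        private final StringBuilder text = new StringBuilder();
        private int depth = 1; // the parent already consumed the opening tag

        ComplexNodeHandler(XMLReader xmlReader, ContentHandler parent,
                Consumer<String> onResult) {
            this.xmlReader = xmlReader;
            this.parent = parent;
            this.onResult = onResult;
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                Attributes atts) {
            depth++;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            depth--;
            if (depth == 0) {
                // complex_node has closed: hand back the result and control.
                onResult.accept(text.toString().trim());
                xmlReader.setContentHandler(parent);
            }
        }
    }

    // Hypothetical document-specific handler that delegates the shared structure.
    static class DocumentHandler extends DefaultHandler {
        private final XMLReader xmlReader;
        final List<String> results = new ArrayList<>();

        DocumentHandler(XMLReader xmlReader) {
            this.xmlReader = xmlReader;
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                Attributes atts) {
            if ("complex_node".equals(qName)) {
                xmlReader.setContentHandler(
                        new ComplexNodeHandler(xmlReader, this, results::add));
            }
        }
    }

    static List<String> parse(String xml) throws Exception {
        XMLReader xr = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        DocumentHandler handler = new DocumentHandler(xr);
        xr.setContentHandler(handler);
        xr.parse(new InputSource(new StringReader(xml)));
        return handler.results;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><complex_node><a>x</a><b>y</b></complex_node><other/></doc>";
        System.out.println(parse(xml));
    }
}
```

Each document-specific handler can then reuse ComplexNodeHandler without duplicating its logic. As with IgnoringContentHandler, the end tag of the delegated element is consumed by the sub-handler, so the parent handler never sees an endElement event for complex_node.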


08-20 23:09