I have a very large XML file. Here is an excerpt:

...
<LexicalEntry id="Ait~ifAq_1">
  <Lemma partOfSpeech="n" writtenForm="اِتِّفاق"/>
  <Sense id="Ait~ifAq_1_tawaAfuq_n1AR" synset="tawaAfuq_n1AR"/>
  <WordForm formType="root" writtenForm="وفق"/>
</LexicalEntry>
<LexicalEntry id="tawaA&amp;um__1">
  <Lemma partOfSpeech="n" writtenForm="تَوَاؤُم"/>
  <Sense id="tawaA&amp;um__1_AinosijaAm_n1AR" synset="AinosijaAm_n1AR"/>
  <WordForm formType="root" writtenForm="وأم"/>
</LexicalEntry>
<LexicalEntry id="tanaAgum_2">
  <Lemma partOfSpeech="n" writtenForm="تناغُم"/>
  <Sense id="tanaAgum_2_AinosijaAm_n1AR" synset="AinosijaAm_n1AR"/>
  <WordForm formType="root" writtenForm="نغم"/>
</LexicalEntry>


<Synset baseConcept="3" id="tawaAfuq_n1AR">
  <SynsetRelations>
    <SynsetRelation relType="hyponym" targets="AinosijaAm_n1AR"/>
    <SynsetRelation relType="hyponym" targets="AinosijaAm_n1AR"/>
    <SynsetRelation relType="hypernym" targets="ext_noun_NP_420"/>
  </SynsetRelations>
  <MonolingualExternalRefs>
    <MonolingualExternalRef externalReference="13971065-n" externalSystem="PWN30"/>
  </MonolingualExternalRefs>
</Synset>
...


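I want to extract specific information from it. Given a writtenForm, whether it comes from <Lemma> or <WordForm>, the program should take the synset value of the <Sense> in the same <LexicalEntry>, then find every <Synset> whose id equals that value. It should then list all relations of that Synset: for each one it shows the relType, goes back to the <LexicalEntry> elements, finds those whose <Sense> has a synset equal to the relation's targets, and displays their writtenForm.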

I realize this is a bit complicated, but the result should look like this:

اِتِّفاق hyponym تَوَاؤُم, اِنْسِجام


One option is to use a stream reader because of memory consumption, but I don't know how to proceed to get what I want. Please help.

Best answer

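Unlike a DOM parser, a SAX parser does not load the whole document: it only sees the current item as it is reached and cannot look ahead to later items. That makes it one of several options you can use when the XML file is very large. There are many alternatives; to name a few: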


SAX Parser
DOM Parser
JDOM Parser
DOM4J Parser
StAX Parser


You can find tutorials for all of them here.

In my opinion, once you have learned them, DOM4J or JDOM can be used directly in commercial products.
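Since the question mentions a stream reader, here is a minimal sketch of the pull-based StAX style from the list above, for comparison with the push-style SAX handler. It assumes the element and attribute names from the question's excerpt; the class name StaxLemmaReader and the method lemmas are hypothetical, chosen only for illustration.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: collects the writtenForm of every <Lemma> element.
class StaxLemmaReader {

    static List<String> lemmas(InputStream in) {
        List<String> result = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            try {
                // The caller pulls events one at a time; nothing is kept in memory
                // except what this loop decides to store.
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT
                            && "Lemma".equals(reader.getLocalName())) {
                        result.add(reader.getAttributeValue(null, "writtenForm"));
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read the XML stream", e);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: java StaxLemmaReader <file.xml>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            lemmas(in).forEach(System.out::println);
        }
    }
}
```

With StAX the loop asks the parser for the next event, whereas with SAX the parser calls back into the handler. Both read the file sequentially, so either should be suitable when memory is the concern.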

The logic of a SAX parser is that you write a handler class (UserHandler in the example) that extends DefaultHandler and overrides some of its methods:

XML file:

<?xml version="1.0"?>
<class>
   <student rollno="393">
      <firstname>dinkar</firstname>
      <lastname>kad</lastname>
      <nickname>dinkar</nickname>
      <marks>85</marks>
   </student>
   <student rollno="493">
      <firstname>Vaneet</firstname>
      <lastname>Gupta</lastname>
      <nickname>vinni</nickname>
      <marks>95</marks>
   </student>
   <student rollno="593">
      <firstname>jasvir</firstname>
      <lastname>singn</lastname>
      <nickname>jazz</nickname>
      <marks>90</marks>
   </student>
</class>


Handler class:

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class UserHandler extends DefaultHandler {

   // Text of the element currently being read. The parser may deliver one
   // text node through several characters() calls, so it is accumulated here.
   private final StringBuilder text = new StringBuilder();

   @Override
   public void startElement(String uri,
   String localName, String qName, Attributes attributes)
      throws SAXException {
      text.setLength(0); // discard whitespace seen between elements
      if (qName.equalsIgnoreCase("student")) {
         String rollNo = attributes.getValue("rollno");
         System.out.println("Roll No : " + rollNo);
      }
   }

   @Override
   public void endElement(String uri,
   String localName, String qName) throws SAXException {
      // The element is complete here, so its full text is available.
      if (qName.equalsIgnoreCase("student")) {
         System.out.println("End Element :" + qName);
      } else if (qName.equalsIgnoreCase("firstname")) {
         System.out.println("First Name: " + text);
      } else if (qName.equalsIgnoreCase("lastname")) {
         System.out.println("Last Name: " + text);
      } else if (qName.equalsIgnoreCase("nickname")) {
         System.out.println("Nick Name: " + text);
      } else if (qName.equalsIgnoreCase("marks")) {
         System.out.println("Marks: " + text);
      }
      text.setLength(0);
   }

   @Override
   public void characters(char[] ch,
      int start, int length) throws SAXException {
      text.append(ch, start, length);
   }
}


Main class:

import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

public class SAXParserDemo {
   public static void main(String[] args){

      try {
         File inputFile = new File("input.txt");
         SAXParserFactory factory = SAXParserFactory.newInstance();
         SAXParser saxParser = factory.newSAXParser();
         UserHandler userhandler = new UserHandler();
         saxParser.parse(inputFile, userhandler);
      } catch (Exception e) {
         e.printStackTrace();
      }
   }
}
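The same handler pattern can be applied to the file from the question. The following is only a sketch under stated assumptions, not a definitive implementation: it assumes the element and attribute names shown in the excerpt, that the file has a single root element, and that each targets attribute holds one synset id. The class LexiconHandler and its methods parse and relationsOf are hypothetical names. The handler makes one streaming pass and keeps three small lookup tables instead of a DOM tree.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: builds three lookup tables in one streaming pass.
class LexiconHandler extends DefaultHandler {
    // written form (Lemma or WordForm) -> synset ids referenced by its entry
    private final Map<String, Set<String>> synsetsByForm = new HashMap<>();
    // synset id -> Lemma written forms of the entries that point to it
    private final Map<String, Set<String>> lemmasBySynset = new HashMap<>();
    // synset id -> {relType, targets} pairs, in document order
    private final Map<String, List<String[]>> relationsBySynset = new HashMap<>();

    // State for the <LexicalEntry> or <Synset> currently being read
    private String lemma;
    private final List<String> forms = new ArrayList<>();
    private final List<String> senses = new ArrayList<>();
    private String synsetId;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        switch (qName) {
            case "LexicalEntry": lemma = null; forms.clear(); senses.clear(); break;
            case "Lemma": lemma = atts.getValue("writtenForm"); forms.add(lemma); break;
            case "WordForm": forms.add(atts.getValue("writtenForm")); break;
            case "Sense": senses.add(atts.getValue("synset")); break;
            case "Synset": synsetId = atts.getValue("id"); break;
            case "SynsetRelation":
                relationsBySynset.computeIfAbsent(synsetId, k -> new ArrayList<>())
                        .add(new String[] {atts.getValue("relType"), atts.getValue("targets")});
                break;
            default: break;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (!"LexicalEntry".equals(qName)) return;
        // Children may appear in any order, so the entry is indexed once it is complete.
        for (String synset : senses) {
            for (String form : forms) {
                synsetsByForm.computeIfAbsent(form, k -> new LinkedHashSet<>()).add(synset);
            }
            if (lemma != null) {
                lemmasBySynset.computeIfAbsent(synset, k -> new LinkedHashSet<>()).add(lemma);
            }
        }
    }

    // Returns lines of the form "writtenForm relType lemma1, lemma2".
    // Relations whose target has no lexical entry in the file are skipped.
    List<String> relationsOf(String writtenForm) {
        Map<String, Set<String>> byType = new LinkedHashMap<>();
        for (String synset : synsetsByForm.getOrDefault(writtenForm, Collections.emptySet())) {
            for (String[] rel : relationsBySynset.getOrDefault(synset, Collections.emptyList())) {
                Set<String> targets = lemmasBySynset.get(rel[1]);
                if (targets != null) {
                    byType.computeIfAbsent(rel[0], k -> new LinkedHashSet<>()).addAll(targets);
                }
            }
        }
        List<String> lines = new ArrayList<>();
        byType.forEach((type, lemmas) ->
                lines.add(writtenForm + " " + type + " " + String.join(", ", lemmas)));
        return lines;
    }

    static LexiconHandler parse(InputStream in) {
        LexiconHandler handler = new LexiconHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the lexicon", e);
        }
        return handler;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: java LexiconHandler <file.xml> <writtenForm>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            parse(in).relationsOf(args[1]).forEach(System.out::println);
        }
    }
}
```

The tables hold only strings, so memory grows with the number of entries and relations rather than with the full document tree. All lookups happen after parsing because a <Synset> may appear after the <LexicalEntry> elements that refer to it. On the excerpt from the question, wrapped in one root element, looking up اِتِّفاق should print "اِتِّفاق hyponym تَوَاؤُم, تناغُم"; the hypernym target has no entry in the excerpt, so it is skipped.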

10-08 20:01