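I am currently parsing some very large XML files (larger than 40 MB). I have only just started developing in Scala, so I looked around online for good libraries and came across Scala Scales, which seems to be very good at handling large files.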

I have read:
http://scala-scales.googlecode.com/svn/sites/scales/scales-xml_2.9.1/0.2/ScalesXmlIntro.html

http://scala-scales.googlecode.com/svn/sites/scales/scales-xml_2.9.2/0.4.4/PullParsing.html

I then tested the pullXml function to make sure all the libraries were imported correctly.

val pull = pullXml(new FileReader("/Users/mycrazyxml/tmp/large.xml"))
while (pull.hasNext) {
  pull.next match {
    case Left(i: XmlItem) =>
      // Handle XmlItem
      Logger.info("XmlItem: " + i)

    case Left(e: Elem) =>
      // Handle Element
      Logger.info("Element: " + e)

    case Right(endElem) =>
      // Handle endElement
      Logger.info("Endelement: " + endElem)
  }
}


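This printed the entire file to the console. Nice!
Now it is time to create objects and save them to the database, but I am having a hard time getting a good grasp of how to do this. I would really appreciate some good examples of how it can be done.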

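For example, the following XML has multiple Enterprise elements, each of which can contain one or several LocalUnit elements.
The idea is to create an Enterprise object that holds an array of LocalUnits. When the end element is the closing tag of an Enterprise, call a save method with the Enterprise object and its LocalUnits.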

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE Info SYSTEM "info.dtd">
<Info>
  <Enterprise>
    <RegNo>12345678</RegNo>
    <Address>
      <StreetInfo>
        <StreetName>Infinite Loop</StreetName>
        <StreetNumber>1</StreetNumber>
      </StreetInfo>
    </Address>
    <EName>
      <Legal>Crazy Company</Legal>
    </EName>
    <SNI>
      <Code>00000</Code>
      <Rank>1</Rank>
    </SNI>
    <LocalUnit>
      <CFARNo>987654321</CFARNo>
      <LUType>1</LUType>
      <LUName>Crazy Company Gym</LUName>
      <LUStatus>1</LUStatus>
      <SNI>
        <Code>46772</Code>
        <Rank>1</Rank>
      </SNI>
      <SNI>
        <Code>68203</Code>
        <Rank>2</Rank>
      </SNI>
      <Address>
        <StreetInfo>
          <StreetName>Infinite Loop</StreetName>
          <StreetNumber>1</StreetNumber>
        </StreetInfo>
      </Address>
    </LocalUnit>
    <LocalUnit>
      <CFARNo>987654322</CFARNo>
      <LUType>1</LUType>
      <LUName>Crazy Company Restaurant</LUName>
      <LUStatus>1</LUStatus>
      <SNI>
        <Code>46772</Code>
        <Rank>1</Rank>
      </SNI>
      <SNI>
        <Code>68203</Code>
        <Rank>2</Rank>
      </SNI>
      <Address>
        <StreetInfo>
          <StreetName>Infinite Loop</StreetName>
          <StreetNumber>1</StreetNumber>
        </StreetInfo>
      </Address>
    </LocalUnit>
  </Enterprise>
  <Enterprise>
    <RegNo>12345671220</RegNo>
    <Address>
      <StreetInfo>
        <StreetName>Cupertino Road</StreetName>
        <StreetNumber>2</StreetNumber>
      </StreetInfo>
    </Address>
    <EName>
      <Legal>Fun Company HQ</Legal>
    </EName>
    <SNI>
      <Code>00000</Code>
      <Rank>1</Rank>
    </SNI>
    <LocalUnit>
      <CFARNo>987654321</CFARNo>
      <LUType>1</LUType>
      <LUName>Fun Company</LUName>
      <LUStatus>1</LUStatus>
      <SNI>
        <Code>46772</Code>
        <Rank>1</Rank>
      </SNI>
      <SNI>
        <Code>68203</Code>
        <Rank>2</Rank>
      </SNI>
      <Address>
        <StreetInfo>
          <StreetName>Cupertino road</StreetName>
          <StreetNumber>2</StreetNumber>
        </StreetInfo>
      </Address>
    </LocalUnit>
  </Enterprise>
</Info>


To sum up: given this XML, how should I use pullXml to create objects and call a save method with them?

Best answer

val xmlFile = resource(this, "/data/enterprise_info.xml")
val xml = pullXml(xmlFile)

val Info = NoNamespaceQName("Info")
val Enterprise = NoNamespaceQName("Enterprise")
val LocalUnit = NoNamespaceQName("LocalUnit")
val LocalUnitName = NoNamespaceQName("LUName")
val EName = NoNamespaceQName("EName")
val Legal = NoNamespaceQName("Legal")

val EnterprisePath = List(Info, Enterprise)

// iterate over each Enterprise
// only an Enterprise at a time is in memory
val itr =  iterate(EnterprisePath, xml)

for {
  enterprise <- itr
  enterpriseName <- enterprise \* EName \* Legal
} {
  println("enterprise "+text(enterpriseName) +" has units:")
  for {
    localUnits <- enterprise \* LocalUnit
    localName <- localUnits \* LocalUnitName
  }{
    println("  " + text(localName))
  }
  //do a save
}


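Pulling in each LocalUnit one at a time is more difficult: you would have to specify separate paths for each of the subsections that are not a LocalUnit.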
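
The per-Enterprise loop can also collect its results into plain objects before saving, which is closer to what the question asks for. The following is only a rough sketch under some assumptions: the case classes `EnterpriseRecord` and `LocalUnitRecord` and the `save` method are hypothetical names invented for illustration, the import lines are the usual Scales ones and may need adjusting for your version, and the Scales calls (`pullXml`, `iterate`, `\*`, `text`) are used the same way as in the snippets above.

```scala
import java.io.FileReader
import scala.collection.mutable.ListBuffer

import scales.utils._
import ScalesUtils._
import scales.xml._
import ScalesXml._

// Hypothetical domain model, not part of Scales.
case class LocalUnitRecord(name: String)
case class EnterpriseRecord(legalName: String, units: List[LocalUnitRecord])

object EnterpriseImport {
  val Info = NoNamespaceQName("Info")
  val Enterprise = NoNamespaceQName("Enterprise")
  val LocalUnit = NoNamespaceQName("LocalUnit")
  val LocalUnitName = NoNamespaceQName("LUName")
  val EName = NoNamespaceQName("EName")
  val Legal = NoNamespaceQName("Legal")

  // Hypothetical persistence hook; replace with real database code.
  def save(e: EnterpriseRecord): Unit =
    println("saving " + e.legalName + " with " + e.units.size + " unit(s)")

  def main(args: Array[String]): Unit = {
    // args(0) is assumed to be the path to the XML file.
    val xml = pullXml(new FileReader(args(0)))

    // One Enterprise subtree at a time, as in the answer above.
    for (enterprise <- iterate(List(Info, Enterprise), xml)) {
      // Keep the last Legal name found (the sample has exactly one).
      var legalName = ""
      for (n <- enterprise \* EName \* Legal) {
        legalName = text(n)
      }

      // Collect one record per LUName under each LocalUnit.
      val units = ListBuffer[LocalUnitRecord]()
      for {
        unit <- enterprise \* LocalUnit
        name <- unit \* LocalUnitName
      } {
        units += LocalUnitRecord(text(name))
      }

      // The whole Enterprise subtree has been read at this point.
      save(EnterpriseRecord(legalName, units.toList))
    }
  }
}
```

Because `iterate` hands over each Enterprise only after its subtree is complete, calling `save` at the end of the loop body plays the role of reacting to the Enterprise end tag, while memory use stays bounded by the size of a single Enterprise.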
