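I am currently parsing some very large XML files (larger than 40 MB). I have only just started developing in Scala, so I looked around online for good libraries and came across Scala Scales, which appears to handle large files very well.
I have read:
http://scala-scales.googlecode.com/svn/sites/scales/scales-xml_2.9.1/0.2/ScalesXmlIntro.html
and
http://scala-scales.googlecode.com/svn/sites/scales/scales-xml_2.9.2/0.4.4/PullParsing.html
I then tested the pullXml function to make sure all the libraries were imported correctly: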
val pull = pullXml(new FileReader("/Users/mycrazyxml/tmp/large.xml"))
while (pull.hasNext) {
  pull.next match {
    case Left(i: XmlItem) =>
      // Handle XmlItem
      Logger.info("XmlItem: " + i)
    case Left(e: Elem) =>
      // Handle Element
      Logger.info("Element: " + e)
    case Right(endElem) =>
      // Handle endElement
      Logger.info("Endelement: " + endElem)
  }
}
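This printed the entire file to the console. Great!
Now it is time to create objects and save them to the database, but I am having trouble getting a good grasp of how to do this. I could really use some good examples of how it is done.
For example, the following XML contains multiple Enterprise elements, each of which can consist of one or several LocalUnit elements.
The idea is to create an Enterprise object holding an array of LocalUnits. When the endElement is the closing tag of an Enterprise, call the save method with the Enterprise object and its LocalUnits.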
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE Info SYSTEM "info.dtd">
<Info>
  <Enterprise>
    <RegNo>12345678</RegNo>
    <Address>
      <StreetInfo>
        <StreetName>Infinite Loop</StreetName>
        <StreetNumber>1</StreetNumber>
      </StreetInfo>
    </Address>
    <EName>
      <Legal>Crazy Company</Legal>
    </EName>
    <SNI>
      <Code>00000</Code>
      <Rank>1</Rank>
    </SNI>
    <LocalUnit>
      <CFARNo>987654321</CFARNo>
      <LUType>1</LUType>
      <LUName>Crazy Company Gym</LUName>
      <LUStatus>1</LUStatus>
      <SNI>
        <Code>46772</Code>
        <Rank>1</Rank>
      </SNI>
      <SNI>
        <Code>68203</Code>
        <Rank>2</Rank>
      </SNI>
      <Address>
        <StreetInfo>
          <StreetName>Infinite Loop</StreetName>
          <StreetNumber>1</StreetNumber>
        </StreetInfo>
      </Address>
    </LocalUnit>
    <LocalUnit>
      <CFARNo>987654322</CFARNo>
      <LUType>1</LUType>
      <LUName>Crazy Company Restaurant</LUName>
      <LUStatus>1</LUStatus>
      <SNI>
        <Code>46772</Code>
        <Rank>1</Rank>
      </SNI>
      <SNI>
        <Code>68203</Code>
        <Rank>2</Rank>
      </SNI>
      <Address>
        <StreetInfo>
          <StreetName>Infinite Loop</StreetName>
          <StreetNumber>1</StreetNumber>
        </StreetInfo>
      </Address>
    </LocalUnit>
  </Enterprise>
  <Enterprise>
    <RegNo>12345671220</RegNo>
    <Address>
      <StreetInfo>
        <StreetName>Cupertino Road</StreetName>
        <StreetNumber>2</StreetNumber>
      </StreetInfo>
    </Address>
    <EName>
      <Legal>Fun Company HQ</Legal>
    </EName>
    <SNI>
      <Code>00000</Code>
      <Rank>1</Rank>
    </SNI>
    <LocalUnit>
      <CFARNo>987654321</CFARNo>
      <LUType>1</LUType>
      <LUName>Fun Company</LUName>
      <LUStatus>1</LUStatus>
      <SNI>
        <Code>46772</Code>
        <Rank>1</Rank>
      </SNI>
      <SNI>
        <Code>68203</Code>
        <Rank>2</Rank>
      </SNI>
      <Address>
        <StreetInfo>
          <StreetName>Cupertino road</StreetName>
          <StreetNumber>2</StreetNumber>
        </StreetInfo>
      </Address>
    </LocalUnit>
  </Enterprise>
</Info>
To sum up: given this XML, how should I use pullXml to create the objects and call the save method with them?
Best answer
val xmlFile = resource(this, "/data/enterprise_info.xml")
val xml = pullXml(xmlFile)

val Info = NoNamespaceQName("Info")
val Enterprise = NoNamespaceQName("Enterprise")
val LocalUnit = NoNamespaceQName("LocalUnit")
val LocalUnitName = NoNamespaceQName("LUName")
val EName = NoNamespaceQName("EName")
val Legal = NoNamespaceQName("Legal")
val EnterprisePath = List(Info, Enterprise)

// iterate over each Enterprise
// only an Enterprise at a time is in memory
val itr = iterate(EnterprisePath, xml)
for {
  enterprise <- itr
  enterpriseName <- enterprise \* EName \* Legal
} {
  println("enterprise " + text(enterpriseName) + " has units:")
  for {
    localUnits <- enterprise \* LocalUnit
    localName <- localUnits \* LocalUnitName
  } {
    println(" " + text(localName))
  }
  // do a save
}
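To fill in the "do a save" step, here is a minimal sketch that collects each Enterprise and its LocalUnits into plain objects before handing them to a save function. It only reuses the Scales calls shown in the code above (pullXml, iterate, `\*`, text); the import lines are the usual Scales ones and are assumed here, and EnterpriseRecord, LocalUnitRecord, EnterpriseImport, importAll and the save callback are hypothetical names made up for illustration. It also assumes that passing a Reader to pullXml works the same way as the FileReader call in the question.

```scala
import java.io.Reader
import scala.collection.mutable.ListBuffer
import scales.utils._
import ScalesUtils._
import scales.xml._
import ScalesXml._

// Hypothetical domain types; the real ones are not shown in the question.
case class LocalUnitRecord(name: String)
case class EnterpriseRecord(legalName: String, localUnits: List[LocalUnitRecord])

object EnterpriseImport {
  val Info = NoNamespaceQName("Info")
  val Enterprise = NoNamespaceQName("Enterprise")
  val LocalUnit = NoNamespaceQName("LocalUnit")
  val LocalUnitName = NoNamespaceQName("LUName")
  val EName = NoNamespaceQName("EName")
  val Legal = NoNamespaceQName("Legal")

  // Calls `save` once per Enterprise; only one Enterprise subtree is held at a time.
  def importAll(source: Reader)(save: EnterpriseRecord => Unit): Unit = {
    val xml = pullXml(source)
    for {
      enterprise <- iterate(List(Info, Enterprise), xml)
      legal <- enterprise \* EName \* Legal
    } {
      // Gather the names of all LocalUnits under the current Enterprise.
      val units = ListBuffer.empty[LocalUnitRecord]
      for {
        unit <- enterprise \* LocalUnit
        name <- unit \* LocalUnitName
      } {
        units += LocalUnitRecord(text(name))
      }
      save(EnterpriseRecord(text(legal), units.toList))
    }
  }
}
```

A call might look like `EnterpriseImport.importAll(new java.io.FileReader("/Users/mycrazyxml/tmp/large.xml"))(e => println(e))`, with the println replaced by the real database save.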
Pulling in each LocalUnit one at a time is more difficult: you would need a separate path for each of the subsections that are not a LocalUnit.
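If handling each LocalUnit as it arrives matters, one alternative is to react to start and end events by hand, as the question originally proposed. The sketch below does that with the JDK's StAX API (javax.xml.stream) rather than Scales, so it is not the Scales way of doing it. EnterpriseRecord, LocalUnitRecord, StaxEnterpriseReader, readAll and the save callback are hypothetical names, and it assumes info.dtd is not needed for parsing.

```scala
import java.io.Reader
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants}
import scala.collection.mutable.ListBuffer

// Hypothetical domain types; the real ones are not shown in the question.
case class LocalUnitRecord(name: String)
case class EnterpriseRecord(legalName: String, localUnits: List[LocalUnitRecord])

object StaxEnterpriseReader {
  // Streams the document and calls `save` each time an </Enterprise> tag is reached.
  def readAll(source: Reader)(save: EnterpriseRecord => Unit): Unit = {
    val factory = XMLInputFactory.newInstance()
    // Assumption: info.dtd is not required, so DTD support is switched off.
    factory.setProperty(XMLInputFactory.SUPPORT_DTD, java.lang.Boolean.FALSE)
    // Deliver adjacent text and CDATA as a single CHARACTERS event.
    factory.setProperty(XMLInputFactory.IS_COALESCING, java.lang.Boolean.TRUE)
    val reader = factory.createXMLStreamReader(source)

    var path = List.empty[String] // open element names, innermost first
    val text = new StringBuilder  // text collected for the current element
    var legalName = ""
    val units = ListBuffer.empty[LocalUnitRecord]

    try {
      while (reader.hasNext) {
        reader.next() match {
          case XMLStreamConstants.START_ELEMENT =>
            path = reader.getLocalName :: path
            text.clear()
            // A new Enterprise starts: reset the per-enterprise state.
            if (path.head == "Enterprise") { legalName = ""; units.clear() }
          case XMLStreamConstants.CHARACTERS =>
            text.append(reader.getText)
          case XMLStreamConstants.END_ELEMENT =>
            path match {
              case "Legal" :: "EName" :: "Enterprise" :: _ =>
                legalName = text.toString.trim
              case "LUName" :: "LocalUnit" :: _ =>
                units += LocalUnitRecord(text.toString.trim)
              case "Enterprise" :: _ =>
                save(EnterpriseRecord(legalName, units.toList))
              case _ => // other elements are ignored in this sketch
            }
            path = path.tail
            text.clear()
          case _ => // comments, whitespace, DTD and similar events are ignored
        }
      }
    } finally reader.close()
  }
}
```

For example, `StaxEnterpriseReader.readAll(new java.io.FileReader("/Users/mycrazyxml/tmp/large.xml"))(e => println(e))` would print one record per Enterprise. The `"LUName" :: "LocalUnit" :: _` case is the place to save or forward a LocalUnit immediately instead of buffering it.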