问题描述
我有一个很大的XML文档,我正在分析。在这个文档中,许多标签都有不同的属性。例如:
I have a large XML document I am looking to parse. In this document, many tags have different attributes within them. For example:
<album>
<song-name type="published">Do Re Mi</song-name>
</album>
目前,我正在使用Rail的哈希分析库,要求'active_support / core_ext / hash'
。
Currently, I am using Rail's hash-parsing library by requiring 'active_support/core_ext/hash'
.
当我将它转换为散列值时,它会删除属性。它返回:
When I convert it to a hash, it drops the attributes. It returns:
{"album"=>{"song-name"=>"Do Re Mi"}}
如何维护这些属性,在这种情况下, type =发布
属性?
How do I maintain those attributes, in this case, the type="published"
attribute?
这似乎是以前在 ,但没有确定的答案,但那是从2010年开始的,而且我很好奇自从那时起情况发生了变化。或者,我想知道是否知道解析此XML的另一种方法,以便我仍然可以包含属性信息。
This seems to have been previously been asked in "How can I use XML attributes when converting into a hash with from_xml?", which had no conclusive answer, but that was from 2010, and I'm curious if things have changed since then. Or, I wonder if you know of an alternative way of parsing this XML so that I could still have the attribute information included.
推荐答案
将XML转换为散列并不是一个好的解决方案。您留下的哈希值比原始XML更难解析。另外,如果XML太大,你将留下一个散列,不适合内存,不能被处理,而原始的XML可以使用SAX解析器进行分析。
Converting XML to a hash isn't a good solution. You're left with a hash that is more difficult to parse than the original XML. Plus, if the XML is too big, you'll be left with a hash that won't fit into memory, and can't be processed, whereas the original XML could be parsed using a SAX parser.
假设文件在加载时不会压倒你的内存,我建议使用 Nokogiri
require 'nokogiri'
class Album
attr_reader :song_name, :song_type
def initialize(song_name, song_type)
@song_name = song_name
@song_type = song_type
end
end
xml = <<EOT
<xml>
<album>
<song-name type="published">Do Re Mi</song-name>
</album>
<album>
<song-name type="unpublished">Blah blah blah</song-name>
</album>
</xml>
EOT
albums = []
doc = Nokogiri::XML(xml)
doc.search('album').each do |album|
song_name = album.at('song-name')
albums << Album.new(
song_name.text,
song_name['type']
)
end
puts albums.first.song_name
puts albums.last.song_type
Do Re Mi
unpublished
代码首先定义一个合适的对象来保存你想要的数据。当XML被解析为DOM时,代码将遍历所有< album>
节点,并提取信息,定义该类的一个实例,并将其附加到到专辑
数组。
运行后,您将拥有一个数组,将其存储到数据库中,或者根据需要操作它。但是,如果您的目标是将该信息插入到数据库中,那么让DBM读取XML并直接导入它会更聪明。
这篇关于将XML转换为Ruby哈希时保留属性的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!