


I have a large XML document I am looking to parse. In this document, many tags have different attributes within them. For example:

 <song-name type="published">Do Re Mi</song-name>

Currently, I am using Rails' hash-parsing library by requiring 'active_support/core_ext/hash'.


When I convert it to a hash, it drops the attributes. It returns:

{"album"=>{"song-name"=>"Do Re Mi"}}

How do I maintain those attributes, in this case, the type="published" attribute?

This seems to have been asked before in "How can I use XML attributes when converting into a hash with from_xml?", which had no conclusive answer. That question is from 2010, though, and I'm curious whether things have changed since then. Alternatively, do you know of another way of parsing this XML that keeps the attribute information?



Converting XML to a hash isn't a good solution. You're left with a hash that is more difficult to parse than the original XML. Plus, if the XML is too big, you'll be left with a hash that won't fit into memory, and can't be processed, whereas the original XML could be parsed using a SAX parser.
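
If the file really is too big to load at once, a streaming pass might look something like the following. This is only a sketch under a few assumptions: it uses REXML's stream listener because it ships with Ruby (Nokogiri has its own SAX interface that works along the same lines), and SongListener and its songs accessor are names made up for the example.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: collects [text, type] pairs for each <song-name>.
class SongListener
  include REXML::StreamListener

  attr_reader :songs

  def initialize
    @songs = []
    @current_type = nil
    @buffer = nil
  end

  # Called for each opening tag; attrs is a Hash of attribute name => value.
  def tag_start(name, attrs)
    return unless name == 'song-name'
    @current_type = attrs['type']
    @buffer = +''
  end

  # Called for text nodes; only buffered while inside <song-name>.
  def text(data)
    @buffer << data if @buffer
  end

  # Called for each closing tag; stores the finished song.
  def tag_end(name)
    return unless name == 'song-name'
    @songs << [@buffer.strip, @current_type]
    @buffer = nil
  end
end

# Sample document; the wrapper element names are assumed for illustration.
xml = <<~EOT
  <xml>
    <album>
      <song-name type="published">Do Re Mi</song-name>
    </album>
    <album>
      <song-name type="unpublished">Blah blah blah</song-name>
    </album>
  </xml>
EOT

listener = SongListener.new
REXML::Document.parse_stream(xml, listener)

listener.songs.each { |name, type| puts "#{name} (#{type})" }
```

The listener only ever holds the song it is currently reading plus whatever it has collected, so the attributes survive without building a tree or a hash of the whole document.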

Assuming the file isn't going to overwhelm your memory when loaded, I'd recommend using Nokogiri to parse it, doing something like:

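require 'nokogiri'

# Simple value object holding the data extracted from each <album>.
class Album
  attr_reader :song_name, :song_type

  def initialize(song_name, song_type)
    @song_name = song_name
    @song_type = song_type
  end
end

# Sample document; the root element name is arbitrary.
xml = <<EOT
<xml>
  <album>
    <song-name type="published">Do Re Mi</song-name>
  </album>
  <album>
    <song-name type="unpublished">Blah blah blah</song-name>
  </album>
</xml>
EOT

albums = []
doc = Nokogiri::XML(xml)
doc.search('album').each do |album|
  song_name = album.at('song-name')
  # Element text plus its type attribute.
  albums << Album.new(song_name.text, song_name['type'])
end

puts albums.first.song_name
puts albums.last.song_type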


Which outputs:

Do Re Mi
unpublished

The code starts by defining a suitable object to hold the data you want. Once the XML is parsed into a DOM, the code loops through all the <album> nodes, extracts the information from each, creates an instance of the class, and appends it to the albums array.


After running, you'd have an array you could walk, processing each item by storing it in a database or manipulating it however you want. If your goal is simply to insert that information into a database, though, you'd be smarter to let the DBM read the XML and import it directly.
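
If you do end up walking the array in Ruby, the processing step could be as simple as this sketch. The Struct is only a stand-in for the Album class so the snippet runs on its own, and names_by_type is a made-up variable name.

```ruby
# Stand-in for the Album class, so this sketch is self-contained.
Album = Struct.new(:song_name, :song_type)

albums = [
  Album.new('Do Re Mi', 'published'),
  Album.new('Blah blah blah', 'unpublished')
]

# Group the song names by their type attribute.
names_by_type = albums.group_by(&:song_type)
                      .transform_values { |list| list.map(&:song_name) }

names_by_type.each do |type, names|
  puts "#{type}: #{names.join(', ')}"
end
```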

