Problem description
I have the following function which does a crude job of parsing an XML file into a dictionary.
Unfortunately, since Python dictionaries are not ordered, I am unable to cycle through the nodes as I would like.
How do I change this so that it outputs an ordered dictionary which reflects the original order of the nodes when looped over with 'for'?
    def simplexml_load_file(file):
        import collections
        from lxml import etree

        tree = etree.parse(file)
        root = tree.getroot()

        def xml_to_item(el):
            item = None
            if el.text:
                item = el.text
            child_dicts = collections.defaultdict(list)
            for child in el.getchildren():
                child_dicts[child.tag].append(xml_to_item(child))
            return dict(child_dicts) or item

        def xml_to_dict(el):
            return {el.tag: xml_to_item(el)}

        return xml_to_dict(root)

    x = simplexml_load_file('routines/test.xml')

    print x

    for y in x['root']:
        print y
Outputs:
    {'root': {
        'a': ['1'],
        'aa': [{'b': [{'c': ['2']}, '2']}],
        'aaaa': [{'bb': ['4']}],
        'aaa': ['3'],
        'aaaaa': ['5']
    }}
    a
    aa
    aaaa
    aaa
    aaaaa
How can I implement collections.OrderedDict so that I can be sure of getting the correct order of the nodes?
XML file for reference:
    <root>
        <a>1</a>
        <aa>
            <b>
                <c>2</c>
            </b>
            <b>2</b>
        </aa>
        <aaa>3</aaa>
        <aaaa>
            <bb>4</bb>
        </aaaa>
        <aaaaa>5</aaaaa>
    </root>
Solution
You could use the new OrderedDict dict subclass, which was added to the standard library's collections module in version 2.7*. Actually, what you need is an Ordered + defaultdict combination, which doesn't exist, but it's possible to create one by subclassing OrderedDict as illustrated below:

    import collections

    # Written for Python 2.7: collections.Callable and iteritems() are not
    # available on current Python 3 releases.
    class OrderedDefaultdict(collections.OrderedDict):
        """ A defaultdict with OrderedDict as its base class. """

        def __init__(self, default_factory=None, *args, **kwargs):
            if not (default_factory is None
                    or isinstance(default_factory, collections.Callable)):
                raise TypeError('first argument must be callable or None')
            super(OrderedDefaultdict, self).__init__(*args, **kwargs)
            self.default_factory = default_factory  # called by __missing__()

        def __missing__(self, key):
            # Invoked by dict.__getitem__() when the key is absent.
            if self.default_factory is None:
                raise KeyError(key,)
            self[key] = value = self.default_factory()
            return value

        def __reduce__(self):  # optional, for pickle support
            args = (self.default_factory,) if self.default_factory else tuple()
            return self.__class__, args, None, None, self.iteritems()

        def __repr__(self):  # optional
            return '%s(%r, %r)' % (self.__class__.__name__, self.default_factory,
                                   list(self.iteritems()))

    def simplexml_load_file(file):
        from lxml import etree

        tree = etree.parse(file)
        root = tree.getroot()

        def xml_to_item(el):
            item = el.text or None
            child_dicts = OrderedDefaultdict(list)
            for child in el.getchildren():
                child_dicts[child.tag].append(xml_to_item(child))
            return collections.OrderedDict(child_dicts) or item

        def xml_to_dict(el):
            return {el.tag: xml_to_item(el)}

        return xml_to_dict(root)

    x = simplexml_load_file('routines/test.xml')
    print(x)

    for y in x['root']:
        print(y)
The output produced from your test XML file looks like this:
Output:
    {'root':
        OrderedDict(
            [('a', ['1']),
             ('aa', [OrderedDict([('b', [OrderedDict([('c', ['2'])]), '2'])])]),
             ('aaa', ['3']),
             ('aaaa', [OrderedDict([('bb', ['4'])])]),
             ('aaaaa', ['5'])
            ]
        )
    }
    a
    aa
    aaa
    aaaa
    aaaaa
Which I think is close to what you want.
*If your version of Python doesn't have OrderedDict, which was introduced in v2.7, you may be able to use Raymond Hettinger's Ordered Dictionary for Py2.4 ActiveState recipe as a base class instead.
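For readers on a newer interpreter: from Python 3.7 onward, plain dicts (and therefore collections.defaultdict) are guaranteed to keep insertion order, so the original function may already give the desired ordering without a custom class. The following is only a sketch under some assumptions: it swaps lxml for the standard library's xml.etree.ElementTree so that it runs without third-party packages, it parses the XML from a string rather than from 'routines/test.xml', and the names XML_TEXT and simplexml_load_string are made up for this illustration.

```python
import collections
import xml.etree.ElementTree as ET

# The test document from the question, inlined so the sketch is self-contained.
XML_TEXT = """<root>
  <a>1</a>
  <aa>
    <b>
      <c>2</c>
    </b>
    <b>2</b>
  </aa>
  <aaa>3</aaa>
  <aaaa>
    <bb>4</bb>
  </aaaa>
  <aaaaa>5</aaaaa>
</root>"""


def xml_to_item(el):
    # Group children by tag. On Python 3.7+ a defaultdict remembers the
    # order in which each tag was first inserted.
    child_dicts = collections.defaultdict(list)
    for child in el:  # iterating an Element yields its children in document order
        child_dicts[child.tag].append(xml_to_item(child))
    # Elements with children become a dict; leaf elements fall back to their text.
    return dict(child_dicts) or (el.text or None)


def simplexml_load_string(text):
    # Hypothetical helper: same idea as simplexml_load_file, but for a string.
    root = ET.fromstring(text)
    return {root.tag: xml_to_item(root)}


x = simplexml_load_string(XML_TEXT)
for tag in x['root']:
    print(tag)
```

On Python 2.7, or on any Python 3 older than 3.7, this ordering is not guaranteed by the language, which is where the OrderedDefaultdict class above is still needed.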
Minor update:
Added a __reduce__() method which will allow instances of the class to be pickled and unpickled properly. This wasn't necessary for this question, but came up in a similar one.
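To illustrate what the __reduce__() method buys you, here is a rough Python 3 port of the class together with a pickle round trip. This is a sketch and not the answer's original code: it assumes Python 3, replaces collections.Callable with the built-in callable(), replaces iteritems() with iter(self.items()), and leaves out __repr__() for brevity.

```python
import collections
import pickle


class OrderedDefaultdict(collections.OrderedDict):
    """A defaultdict with OrderedDict as its base class (Python 3 sketch)."""

    def __init__(self, default_factory=None, *args, **kwargs):
        if not (default_factory is None or callable(default_factory)):
            raise TypeError('first argument must be callable or None')
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory  # called by __missing__()

    def __missing__(self, key):
        # Called by dict.__getitem__() when the key is absent.
        if self.default_factory is None:
            raise KeyError(key)
        self[key] = value = self.default_factory()
        return value

    def __reduce__(self):
        # Tell pickle to rebuild the object by calling the class with the
        # factory, then re-insert the items one by one in their stored order.
        args = (self.default_factory,) if self.default_factory else ()
        return self.__class__, args, None, None, iter(self.items())


d = OrderedDefaultdict(list)
d['b'].append(1)  # missing keys are created on first access
d['a'].append(2)

restored = pickle.loads(pickle.dumps(d))
print(list(restored.items()))             # [('b', [1]), ('a', [2])]
print(restored.default_factory is list)   # True
```

The fifth element of the tuple returned by __reduce__() is an iterator of key/value pairs that pickle replays with item assignment, which is what keeps the insertion order intact after unpickling.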