Question
I'm using Python 2.7 with plistlib to import a .plist in a nested dict/array form, then look for a particular key and delete it wherever I see it.
When it comes to the actual files we're working with in the office, I already know where to find the values -- but I wrote my script with the idea that I didn't, in the hopes that I wouldn't have to make changes in the future if the file structure changes or we need to do likewise to other similar files.
Unfortunately I seem to be trying to modify a dict while iterating over it, but I'm not certain how that's actually happening, since I'm using iteritems() and enumerate() to get generators and work with those instead of the object I'm actually working with.
    def scrub(someobject, badvalue='_default'): ##_default isn't the real variable
        """Walks the structure of a plistlib-created dict and finds all the badvalues and viciously eliminates them.
        Can optionally be passed a different key to search for."""
        count = 0
        try:
            iterator = someobject.iteritems()
        except AttributeError:
            iterator = enumerate(someobject)
        for key, value in iterator:
            try:
                scrub(value)
            except:
                pass
            if key == badvalue:
                del someobject[key]
                count += 1
        return "Removed {count} instances of {badvalue} from {file}.".format(count=count, badvalue=badvalue, file=file)
Unfortunately, when I run this on my test .plist file, I get the following error:
    Traceback (most recent call last):
      File "formscrub.py", line 45, in <module>
        scrub(loadedplist)
      File "formscrub.py", line 19, in scrub
        for key, value in iterator:
    RuntimeError: dictionary changed size during iteration
So the problem might be the recursive call to itself, but even then shouldn't it just be removing from the original object? I'm not sure how to avoid recursion (or if that's the right strategy) but since it's a .plist, I do need to be able to identify when things are dicts or lists and iterate over them in search of either (a) more dicts to search, or (b) the actual key-value pair in the imported .plist that I need to delete.
Ultimately, this is a partial non-issue, in that the files I'll be working with on a regular basis have a known structure. However, I was really hoping to create something that doesn't care about the nesting or order of the object it's working with, as long as it's a Python dict with arrays in it.
Answer
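Adding or removing items to/from a sequence while iterating over that sequence is tricky at best, and with dicts it's simply illegal (as you just discovered, it raises a RuntimeError). Note that iteritems() and enumerate() don't protect you from this: they return lazy iterators over the live object, not copies of it. The right way to remove entries from a dict while iterating over it is to iterate over a snapshot of the keys. In Python 2.x, dict.keys() returns a new list, which is such a snapshot. So for dicts the solution is: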
    for key in mydict.keys():
        if key == bad_value:
            del mydict[key]
As mentioned by cpizza in a comment, in Python 3 you need to create the snapshot explicitly with list(), because there dict.keys() returns a view of the live dict rather than a list:
    for key in list(mydict.keys()):
        if key == bad_value:
            del mydict[key]
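To see why iteritems() and enumerate() don't help, here is a minimal Python 3 sketch (the function names are hypothetical, made up for this illustration). Looping over the live dict fails in CPython as soon as the loop advances after a deletion, while looping over list(d) works because the list is a separate object.

```python
def delete_while_iterating(d, bad_key):
    """Deletes from the dict it is looping over, as the question's code does."""
    for key in d:  # a live iterator over d, just like py2's iteritems()
        if key == bad_key:
            del d[key]  # the next step of the loop raises RuntimeError


def delete_from_snapshot(d, bad_key):
    """Loops over a frozen list of the keys, so deleting from d is safe."""
    for key in list(d):
        if key == bad_key:
            del d[key]


data = {"keep": 1, "_default": 2, "also_keep": 3}

try:
    delete_while_iterating(dict(data), "_default")  # work on a copy
except RuntimeError as exc:
    print("live iteration failed:", exc)

delete_from_snapshot(data, "_default")
print(data)  # {'keep': 1, 'also_keep': 3}
```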
For lists, trying to iterate over a snapshot of the indexes (i.e. for i in range(len(thelist)):) would eventually raise an IndexError once anything has been removed (obviously, since at least the last index no longer exists), and even if it didn't, you might skip one or more items (since removing an item shifts the ones after it, putting the sequence of indexes out of sync with the list itself). enumerate is safe against IndexError (the iteration stops by itself when there's no 'next' item left in the list), but you'll still skip items:
    >>> mylist = list("aabbccddeeffgghhii")
    >>> for x, v in enumerate(mylist):
    ...     if v in "bdfh":
    ...         del mylist[x]
    ...
    >>> print(mylist)
    ['a', 'a', 'b', 'c', 'c', 'd', 'e', 'e', 'f', 'g', 'g', 'h', 'i', 'i']
Not quite a success, as you can see.
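For comparison, the forward index loop mentioned earlier (written with range(), since len() returns an int and can't be iterated over directly) fails both ways at once: it skips items and then runs off the end of the shortened list. A minimal sketch; remove_forward is a hypothetical name used only for illustration.

```python
def remove_forward(items, unwanted):
    """Walks the original indexes front to back, deleting as it goes (buggy)."""
    for i in range(len(items)):  # range() is fixed at the starting length
        if items[i] in unwanted:
            del items[i]  # every later item now sits one index lower


mylist = list("aabbccddeeffgghhii")
try:
    remove_forward(mylist, "bdfh")
except IndexError as exc:
    print("IndexError:", exc)  # the loop ran past the end of the shorter list

print(mylist)  # some 'b', 'd', 'f' and 'h' items were skipped and remain
```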
The usual solution here is to iterate over the indexes in reverse, i.e.:
    >>> mylist = list("aabbccddeeffgghhii")
    >>> for x in reversed(range(len(mylist))):
    ...     if mylist[x] in "bdfh":
    ...         del mylist[x]
    ...
    >>> print(mylist)
    ['a', 'a', 'c', 'c', 'e', 'e', 'g', 'g', 'i', 'i']
This also works with reversed enumeration, but we don't really need it here.
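For completeness, the reversed-enumeration variant could be sketched like this. One catch: reversed(enumerate(mylist)) on its own raises TypeError, because enumerate objects aren't reversible, so the pairs have to be materialized into a list first.

```python
mylist = list("aabbccddeeffgghhii")

# enumerate() returns a one-shot iterator that reversed() can't walk backwards,
# so turn the (index, value) pairs into a list first.
for x, v in reversed(list(enumerate(mylist))):
    if v in "bdfh":
        del mylist[x]  # only shifts items at higher indexes, already visited

print(mylist)  # ['a', 'a', 'c', 'c', 'e', 'e', 'g', 'g', 'i', 'i']
```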
So to summarize: you need two different code paths for dicts and lists, and you also need to take care of non-container values (values that are neither lists nor dicts), which your current code doesn't do.
    def scrub(obj, bad_key="_this_is_bad"):
        if isinstance(obj, dict):
            # the call to `list` is useless for py2 but makes
            # the code py2/py3 compatible
            for key in list(obj.keys()):
                if key == bad_key:
                    del obj[key]
                else:
                    scrub(obj[key], bad_key)
        elif isinstance(obj, list):
            # walk the indexes back to front so that deleting an item
            # never shifts the ones we haven't visited yet
            for i in reversed(range(len(obj))):
                if obj[i] == bad_key:
                    del obj[i]
                else:
                    scrub(obj[i], bad_key)
        else:
            # neither a dict nor a list, do nothing
            pass
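Here is a sketch of the same approach applied end to end, under a few assumptions: Python 3.4+ (for plistlib.loads and plistlib.dumps; Python 2.7 has plistlib.readPlistFromString instead), a small made-up plist, and a hypothetical scrub_count() variant that also returns how many entries it removed, similar to the message the question's function tried to build.

```python
import plistlib


def scrub_count(obj, bad_key="_this_is_bad"):
    """Same walk as scrub() above, but returns how many entries it removed."""
    removed = 0
    if isinstance(obj, dict):
        for key in list(obj.keys()):  # snapshot of the keys
            if key == bad_key:
                del obj[key]
                removed += 1
            else:
                removed += scrub_count(obj[key], bad_key)
    elif isinstance(obj, list):
        for i in reversed(range(len(obj))):  # back to front keeps indexes valid
            if obj[i] == bad_key:
                del obj[i]
                removed += 1
            else:
                removed += scrub_count(obj[i], bad_key)
    # anything else (strings, numbers, dates, data) is a leaf: nothing to do
    return removed


# Made-up sample; "_default" stands in for the real key, as in the question.
PLIST_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
    <key>_default</key>
    <string>top level</string>
    <key>forms</key>
    <array>
        <dict>
            <key>name</key>
            <string>first</string>
            <key>_default</key>
            <integer>1</integer>
        </dict>
        <string>_default</string>
    </array>
</dict>
</plist>
"""

root = plistlib.loads(PLIST_XML)
count = scrub_count(root, "_default")
print("Removed {} instances of _default.".format(count))  # Removed 3 instances of _default.
print(root)  # {'forms': [{'name': 'first'}]}
cleaned = plistlib.dumps(root)  # back to XML bytes, ready to write out
```

Like the answer's scrub(), the list branch also removes list items that are equal to bad_key (the bare "_default" string in the array above). Drop that comparison if only dict keys should ever be deleted.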
As a side note: never write a bare except clause. Never ever. This should be illegal syntax, really.
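To make the side note concrete, a minimal sketch with hypothetical function names: a bare except catches everything derived from BaseException, including KeyboardInterrupt and the recursion-limit error a runaway recursion raises, so a try/except: pass around a recursive call can hide real bugs. Naming the exception you expect, as the question already does with AttributeError, lets everything else surface.

```python
def bare_except(func):
    """The pattern from the question: any failure at all is ignored."""
    try:
        func()
    except:  # bare: catches every BaseException, Ctrl-C included
        return "swallowed"
    return "ok"


def specific_except(func):
    """Only handles the failure we expect and know how to recover from."""
    try:
        func()
    except AttributeError:
        return "handled AttributeError"
    return "ok"


def user_presses_ctrl_c():
    raise KeyboardInterrupt  # simulates the user interrupting the script


def runaway_recursion():
    return runaway_recursion()  # a real bug that should not go unnoticed


print(bare_except(user_presses_ctrl_c))  # swallowed
print(bare_except(runaway_recursion))    # swallowed: the bug is hidden

try:
    specific_except(user_presses_ctrl_c)
except KeyboardInterrupt:
    print("KeyboardInterrupt propagated, as it should")
```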