Problem description
I have a nested list (a list of lists) and I want to remove the duplicates, but I'm getting an error. Here is an example:
images = [
[
{
"image_link": "1969.1523.001.aa.cs.jpg",
"catalogue_number": "1969.1523",
"dataset_name": "marine-transportation-transports-maritimes.xml"
},
{
"image_link": "1969.1523.001.aa.cs.jpg",
"catalogue_number": "1969.1523",
"dataset_name": "railway-transportation-transports-ferroviaires.xml"
}
],
[
{
"image_link": "1969.1523.001.aa.cs.jpg",
"catalogue_number": "1969.1523",
"dataset_name": "marine-transportation-transports-maritimes.xml"
},
{
"image_link": "1969.1523.001.aa.cs.jpg",
"catalogue_number": "1969.1523",
"dataset_name": "railway-transportation-transports-ferroviaires.xml"
}
],
[
{
"image_link": "1969.1523.001.aa.cs.jpg",
"catalogue_number": "1969.1523",
"dataset_name": "marine-transportation-transports-maritimes.xml"
},
{
"image_link": "1969.1523.001.aa.cs.jpg",
"catalogue_number": "1969.1523",
"dataset_name": "railway-transportation-transports-ferroviaires.xml"
}
]
]
So in the end, images should contain only:
[
[
{
"image_link": "1969.1523.001.aa.cs.jpg",
"catalogue_number": "1969.1523",
"dataset_name": "marine-transportation-transports-maritimes.xml"
},
{
"image_link": "1969.1523.001.aa.cs.jpg",
"catalogue_number": "1969.1523",
"dataset_name": "railway-transportation-transports-ferroviaires.xml"
}
]
]
I'm using the set function:
set.__doc__
'set() -> new empty set object\nset(iterable) -> new set object\n\nBuild an unordered collection of unique elements.'
My traceback:
list(set(images))
Traceback (most recent call last):
File "<input>", line 1, in <module>
TypeError: unhashable type: 'list'
To make it simpler: how can I remove all the duplicates in this example?
example = [ [{'a':1, 'b':2}, 'w', 2], [{'a':1, 'b':2}, 'w', 2] ]
#result
#example = [[{'a':1, 'b':2}, 'w', 2] ]
Recommended answer
The set and dict containers rely on hashing their data. Other mutable containers such as list (and set and dict themselves) cannot be hashed: their contents may change later, so a constant hash value would make no sense.
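A quick way to see this for yourself is the small sketch below. The helper name is_hashable is made up for illustration and is not part of the standard library; it simply reports whether the built-in hash() accepts a value.

```python
def is_hashable(value):
    """Return True if hash(value) succeeds, False if it raises TypeError."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


print(is_hashable([1, 2]))      # a list is mutable
print(is_hashable({"a": 1}))    # a dict is mutable
print(is_hashable((1, 2)))      # a tuple of hashable items
print(is_hashable((1, [2])))    # a tuple that still contains a list
```

The last case is why the inner dictionaries and lists have to be converted as well: a tuple is only hashable if everything inside it is.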
But you could transform all your data into (nested) tuples and finally into a set. Since tuple is an immutable container, and your data is hashable (strings), this works. Here's a nasty one-liner for your particular images case that does the trick:
images_set = set([tuple([tuple(sorted(image_dict.items()))
                         for image_dict in inner_list]) for inner_list in images])
and
print(images_set)
prints
{((('catalogue_number', '1969.1523'),
('dataset_name', 'marine-transportation-transports-maritimes.xml'),
('image_link', '1969.1523.001.aa.cs.jpg')),
(('catalogue_number', '1969.1523'),
('dataset_name', 'railway-transportation-transports-ferroviaires.xml'),
('image_link', '1969.1523.001.aa.cs.jpg')))}
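If you need the original shape back (a list of lists of dictionaries) rather than a set of tuples, the conversion can be reversed. The following is a minimal sketch that uses a shortened stand-in for images (two keys per dictionary, invented file names) so that it runs on its own. Note that a set has no defined order, so the order of the outer list is not preserved.

```python
# Shortened stand-in for the images list from the question (an assumption
# made for illustration; the real dictionaries have three keys).
images = [
    [{"image_link": "a.jpg", "dataset_name": "marine.xml"},
     {"image_link": "a.jpg", "dataset_name": "railway.xml"}],
    [{"image_link": "a.jpg", "dataset_name": "marine.xml"},
     {"image_link": "a.jpg", "dataset_name": "railway.xml"}],
]

# Same transformation as the one-liner, written as a set comprehension.
images_set = {
    tuple(tuple(sorted(image_dict.items())) for image_dict in inner_list)
    for inner_list in images
}

# dict() accepts an iterable of (key, value) pairs, so each inner tuple
# of pairs turns back into a dictionary.
unique_images = [
    [dict(item_pairs) for item_pairs in inner_tuple]
    for inner_tuple in images_set
]

print(unique_images)
```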
EDIT: The items method of a dictionary does not guarantee a canonical order (two equal dictionaries can list their items in different orders), so I also added sorted to make the order consistent.
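For the simplified example from the question, where the inner lists mix dictionaries with strings and numbers, the same idea can be generalized. The sketch below uses two hypothetical helpers, freeze and unique_nested (neither comes from the standard library). It assumes that dictionary keys can be compared with each other so that sorted works, and that every leaf value is already hashable. It keeps the first occurrence of each inner list and preserves the original order.

```python
def freeze(value):
    """Recursively convert dicts and lists into hashable tuples."""
    if isinstance(value, dict):
        # Sort by key so that equal dicts freeze to the same tuple.
        pairs = ((key, freeze(val)) for key, val in value.items())
        return tuple(sorted(pairs, key=lambda pair: pair[0]))
    if isinstance(value, (list, tuple)):
        return tuple(freeze(item) for item in value)
    return value  # assumed to be hashable already (str, int, ...)


def unique_nested(items):
    """Return items without duplicates, keeping first occurrences in order."""
    seen = set()
    result = []
    for item in items:
        key = freeze(item)
        if key not in seen:
            seen.add(key)
            result.append(item)
    return result


example = [[{'a': 1, 'b': 2}, 'w', 2], [{'a': 1, 'b': 2}, 'w', 2]]
print(unique_nested(example))  # [[{'a': 1, 'b': 2}, 'w', 2]]
```

One limitation of this sketch: a dictionary and a list of key/value pairs with the same content freeze to the same tuple, so they would be treated as duplicates of each other.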