Getting unique values from a nested list in Python


Problem description

I have a nested list (a list of lists) and I want to remove the duplicates, but I'm getting an error. This is an example:

images = [
    [
        {
            "image_link": "1969.1523.001.aa.cs.jpg",
            "catalogue_number": "1969.1523",
            "dataset_name": "marine-transportation-transports-maritimes.xml"
        },
        {
            "image_link": "1969.1523.001.aa.cs.jpg",
            "catalogue_number": "1969.1523",
            "dataset_name": "railway-transportation-transports-ferroviaires.xml"
        }
    ],
    [
        {
            "image_link": "1969.1523.001.aa.cs.jpg",
            "catalogue_number": "1969.1523",
            "dataset_name": "marine-transportation-transports-maritimes.xml"
        },
        {
            "image_link": "1969.1523.001.aa.cs.jpg",
            "catalogue_number": "1969.1523",
            "dataset_name": "railway-transportation-transports-ferroviaires.xml"
        }
    ],
    [
        {
            "image_link": "1969.1523.001.aa.cs.jpg",
            "catalogue_number": "1969.1523",
            "dataset_name": "marine-transportation-transports-maritimes.xml"
        },
        {
            "image_link": "1969.1523.001.aa.cs.jpg",
            "catalogue_number": "1969.1523",
            "dataset_name": "railway-transportation-transports-ferroviaires.xml"
        }
    ]
]

So in the end, images should contain only:

[
    [
        {
            "image_link": "1969.1523.001.aa.cs.jpg",
            "catalogue_number": "1969.1523",
            "dataset_name": "marine-transportation-transports-maritimes.xml"
        },
        {
            "image_link": "1969.1523.001.aa.cs.jpg",
            "catalogue_number": "1969.1523",
            "dataset_name": "railway-transportation-transports-ferroviaires.xml"
        }
    ]
]

I am using the set function:

set.__doc__
'set() -> new empty set object\nset(iterable) -> new set object\n\nBuild an unordered collection of unique elements.'

My traceback:

list(set(images))
Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: unhashable type: 'list'

To make it simpler, how can I remove all the duplicates in this example?

example = [ [{'a':1, 'b':2}, 'w', 2], [{'a':1, 'b':2}, 'w', 2] ]
#result
#example = [[{'a':1, 'b':2}, 'w', 2] ]

Recommended answer

The set and dict containers rely on hashing their data. Mutable containers such as list (and set and dict themselves) cannot be hashed: they may change later, so a constant hash value would make no sense.
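To make the hashability rule concrete, here is a minimal sketch that probes a few values with the built-in hash(). The helper name is_hashable is hypothetical and only exists for this illustration.

```python
def is_hashable(value):
    """Return True if hash(value) succeeds, False if it raises TypeError."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


print(is_hashable("1969.1523"))   # True: strings are immutable
print(is_hashable(("a", 1)))      # True: tuple of hashable items
print(is_hashable(["a", 1]))      # False: lists are mutable
print(is_hashable({"a": 1}))      # False: dicts are mutable
print(is_hashable((["a"], 1)))    # False: a tuple is only hashable if its items are
```

The last case is why the conversion has to go all the way down: wrapping the outer list in a tuple is not enough while dicts or lists remain inside it.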

But you could transform all your data into (nested) tuples and finally into a set. Since tuple is an immutable container, and your data (strings) is hashable, this works. Here's a nasty one-liner for your special images case that does the trick:

images_set = set([tuple([tuple(sorted(image_dict.items()))
    for image_dict in inner_list]) for inner_list in images])

and

print(images_set)

prints

{((('catalogue_number', '1969.1523'),
   ('dataset_name', 'marine-transportation-transports-maritimes.xml'),
   ('image_link', '1969.1523.001.aa.cs.jpg')),
  (('catalogue_number', '1969.1523'),
   ('dataset_name', 'railway-transportation-transports-ferroviaires.xml'),
   ('image_link', '1969.1523.001.aa.cs.jpg')))}
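The result above is a set of nested tuples, not the list of lists of dicts the question asked for. One possible way to convert it back is sketched below. The shortened images data and the name unique_images are assumptions made for this illustration, not part of the original answer.

```python
# Shortened stand-in for the `images` data from the question.
# The second entry lists its keys in a different order on purpose.
images = [
    [{"image_link": "a.jpg", "dataset_name": "marine.xml"},
     {"image_link": "a.jpg", "dataset_name": "railway.xml"}],
    [{"dataset_name": "marine.xml", "image_link": "a.jpg"},
     {"image_link": "a.jpg", "dataset_name": "railway.xml"}],
]

# Same conversion as the one-liner, written as a set comprehension.
images_set = {
    tuple(tuple(sorted(image_dict.items())) for image_dict in inner_list)
    for inner_list in images
}

# Rebuild plain lists and dicts from the nested tuples.
unique_images = [
    [dict(pairs) for pairs in inner_tuple]
    for inner_tuple in images_set
]

print(len(unique_images))  # 1
```

Because a set is unordered, the order of the inner lists in unique_images is arbitrary whenever more than one unique entry survives.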

EDIT: Two equal dictionaries are not guaranteed to return their items in the same order (the order follows insertion order in Python 3.7+ and was arbitrary before that). Hence, I also added sorted to get a consistent order.
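The one-liner is specific to the images layout. For the simplified example from the question, which mixes dicts, strings and numbers, a more general variant could look like the sketch below. The helpers freeze and unique_nested are hypothetical names, and the code assumes that dict keys are mutually comparable (for example, all strings) and that leaf values are hashable.

```python
def freeze(value):
    """Recursively turn dicts and lists into hashable nested tuples."""
    if isinstance(value, dict):
        # Sort by key so equal dicts produce the same tuple.
        return tuple(sorted((key, freeze(item)) for key, item in value.items()))
    if isinstance(value, (list, tuple)):
        return tuple(freeze(item) for item in value)
    return value  # assumed to be hashable already (str, int, ...)


def unique_nested(items):
    """Drop duplicate entries, keeping first occurrences in their original order."""
    seen = set()
    result = []
    for item in items:
        key = freeze(item)
        if key not in seen:
            seen.add(key)
            result.append(item)
    return result


example = [[{'a': 1, 'b': 2}, 'w', 2], [{'a': 1, 'b': 2}, 'w', 2]]
print(unique_nested(example))
# [[{'a': 1, 'b': 2}, 'w', 2]]
```

Unlike the set-based one-liner, this keeps the original objects and their order. One limitation of this sketch: a dict and a list of its (key, value) pairs freeze to the same key, so they would be treated as duplicates of each other.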
