Getting a random sample from a dictionary

Problem description

I'm working with a big dictionary and for some reason I also need to work on small random samples from that dictionary. How can I get this small sample (for example of length 2)?

Here is a toy model:

dy={'a':1, 'b':2, 'c':3, 'd':4, 'e':5}

I need to perform some task on dy which involves all the entries. Let us say, to simplify, I need to sum together all the values:

s=0
for key in dy.keys():
    s=s+dy[key]

Now, I also need to perform the same task on a random sample of dy; for that I need a random sample of the keys of dy. The simple solution I can imagine is

sam=list(dy.keys())[:2]

In that way I have a list of two keys of the dictionary which are somehow random. So, going back to my task, the only change I need in the code is:

s=0
for key in sam:
    s=s+dy[key]

The point is that I do not fully understand how dy.keys() is constructed, so I can't foresee any future issues.
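A quick illustration of why slicing the keys is not a random sample: since Python 3.7 the language guarantees that dicts iterate in insertion order, so dy.keys() always yields keys in the order they were added. The sketch below slices the first two keys and then shows that deleting and re-adding a key moves it to the end:

```python
dy = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}

# Dicts iterate in insertion order (guaranteed since Python 3.7), so this
# slice is always the first two keys ever inserted, never a random pick.
first_two = list(dy.keys())[:2]
print(first_two)  # ['a', 'b'] on every run

# Deleting a key and inserting it again moves it to the end of the order,
# which changes what the slice returns.
del dy['a']
dy['a'] = 1
after_reinsert = list(dy.keys())[:2]
print(after_reinsert)  # ['b', 'c']
```

In other words, the slice depends only on the insertion history of the dictionary, not on any randomness.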

Recommended answer

Given your example:

dy = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5}

Then the sum of all the values is more simply put as:

s = sum(dy.values())

Then, if copying all the values into a list is not memory-prohibitive, you can sample using:

import random

values = list(dy.values())
s = sum(random.sample(values, 2))
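random.sample draws without replacement, so the two values always come from two different entries. If the same "random" subset is needed across runs (for example while debugging), a minimal sketch uses a dedicated random.Random instance; the seed 42 here is an arbitrary assumption:

```python
import random

dy = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}

# A dedicated generator with a fixed (arbitrary) seed makes the sample
# reproducible without touching the global random state.
rng = random.Random(42)

values = list(dy.values())
sample = rng.sample(values, 2)  # two distinct positions, no replacement
s = sum(sample)
print(sample, s)

# A fresh generator with the same seed reproduces the same sample.
repeat = random.Random(42).sample(values, 2)
print(repeat == sample)  # True
```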

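Or sample the keys and look their values up. Python versions before 3.11 let random.sample take a set-like object such as dy.keys() directly (deprecated since 3.9), but since Python 3.11 the population must be a sequence, so convert the keys to a list first:

from operator import itemgetter
import random

s = sum(itemgetter(*random.sample(list(dy), 2))(dy))

Or just use:

s = sum(dy[k] for k in random.sample(list(dy), 2))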
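Since the question asks for a small sample of the dictionary itself, it can be convenient to keep keys and values together. A minimal sketch with a hypothetical helper sample_dict (not part of the standard library) returns a smaller dict, so the same summing code runs unchanged on the full dictionary and on the sample:

```python
import random

def sample_dict(d, k, rng=random):
    """Hypothetical helper: return a new dict holding k randomly chosen
    entries of d, sampled without replacement."""
    # random.sample requires a sequence on Python 3.11+, so list the keys first.
    keys = rng.sample(list(d), k)
    return {key: d[key] for key in keys}

dy = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
sub = sample_dict(dy, 2)

# The same task works on the full dict and on the sample.
s_all = sum(dy.values())
s_sample = sum(sub.values())
print(sub, s_sample)
```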

An alternative is to use heapq, e.g.:

import heapq
import random

# Give every value an independent random score and keep the 2 highest.
s = sum(heapq.nlargest(2, dy.values(), key=lambda _: random.random()))
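The heapq approach works because each element gets an independent random score and the k highest scores form a uniformly random subset. heapq.nlargest scans the iterable once and, for small k, only needs to hold about k items at a time, so no full copy of the keys is required. This relies on the key function being evaluated once per element, which holds for CPython's heapq.nlargest. Applied to dy.items(), the same idea returns a sample dictionary; sample_items below is a hypothetical helper name:

```python
import heapq
import random

def sample_items(d, k):
    """Hypothetical helper: return a dict of k entries of d chosen uniformly
    at random, without replacement."""
    # The key ignores its argument and returns a fresh random score;
    # the k (key, value) pairs with the highest scores form the sample.
    return dict(heapq.nlargest(k, d.items(), key=lambda _pair: random.random()))

dy = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
sub = sample_items(dy, 2)
s = sum(sub.values())
print(sub, s)
```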
