
Problem description

I'm working with a big dictionary, and I also need to work on small random samples from that dictionary. How can I get such a small sample (for example, one of length 2)?

Here is a toy model:

dy={'a':1, 'b':2, 'c':3, 'd':4, 'e':5}

I need to perform some task on dy that involves all the entries. To keep things simple, let us say I need to sum all the values:

s=0
for key in dy.keys():
    s=s+dy[key]

Now, I also need to perform the same task on a random sample of dy; for that I need a random sample of the keys of dy. The simple solution I can imagine is

sam=list(dy.keys())[:2]

In this way I get a list of two keys of the dictionary which are somehow random. So, going back to my task, the only change I need in the code is:

s=0
for key in sam:
    s=s+dy[key]

The point is that I do not fully understand how dy.keys() is constructed, so I cannot foresee any future issues.
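
On that concern, it may help to see what the slice actually returns. In Python 3.7 and later a dict preserves insertion order, so dy.keys() yields the keys in the order they were added, and slicing the list of keys always gives the same leading keys rather than a random selection. A minimal sketch, assuming Python 3.7+ (the names first_two and again are only for illustration):

```python
dy = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}

# Keys come back in insertion order (guaranteed since Python 3.7),
# so a slice of them is deterministic, not random.
first_two = list(dy.keys())[:2]
again = list(dy.keys())[:2]

print(first_two)           # ['a', 'b']
print(first_two == again)  # True
```

So the slice is reproducible, but it is a fixed prefix of the dictionary, not a sample.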

Recommended answer

Given your example:

dy = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5}

The sum of all the values can then be written more simply as:

s = sum(dy.values())

Then, if copying the values into a list is not memory-prohibitive, you can sample with:

import random

values = list(dy.values())
s = sum(random.sample(values, 2))

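Or sample the keys and look up the corresponding values. random.sample requires a sequence (passing a set or a dict keys view was deprecated in Python 3.9 and raises a TypeError since Python 3.11), so convert the keys to a list first:

from operator import itemgetter
import random

s = sum(itemgetter(*random.sample(list(dy), 2))(dy))

Or simply:

s = sum(dy[k] for k in random.sample(list(dy), 2))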
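
If the same sample is needed for more than one task, the key-sampling idea can be wrapped in a small helper that returns a sub-dictionary. This is only a sketch: sample_dict and its seed parameter are hypothetical names introduced here, not part of the original answer or of the standard library.

```python
import random

def sample_dict(d, k, seed=None):
    """Return a new dict holding k randomly chosen items of d.

    Hypothetical helper; `seed` makes the choice reproducible.
    """
    rng = random.Random(seed)
    keys = rng.sample(list(d), k)  # list(d) is the list of keys
    return {key: d[key] for key in keys}

dy = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}

sub = sample_dict(dy, 2, seed=0)
s = sum(sub.values())
```

Because sub is an ordinary dict, any code written for dy (such as the summing loop) can run on it unchanged.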

An alternative is to use heapq, e.g.:

import heapq
import random

s = sum(heapq.nlargest(2, dy.values(), key=lambda L: random.random()))
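
When the dictionary is too large to copy into a list, another option (not from the original answer) is reservoir sampling, which picks k items in a single pass while holding only k of them in memory. A minimal sketch of the classic Algorithm R, where reservoir_sample is a hypothetical helper name:

```python
import random

def reservoir_sample(iterable, k, rng=random):
    """Pick k items uniformly at random from an iterable in one pass.

    If the iterable has fewer than k items, all of them are returned.
    """
    reservoir = []
    for i, item in enumerate(iterable):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir

dy = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
s = sum(reservoir_sample(dy.values(), 2))
```

Like the heapq version, this iterates over dy.values() directly without building a full list first.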


09-05 16:24