本文介绍了优化与dict.update()类似的功能,但添加了常用值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

嘿伙计们,


我以为我会把它扔出去,因为每个人都喜欢优化

问题(这是真的,结账对这类

问题的回复数量!)


现在我有一个如下所示的功能。我想结合两个

词典,并将值加在一起,哪里有重叠。

我认为这个功能会相当快,但我在每个字典中都有大约1

百万的键(约80%重叠)并且它只运行所有

晚上。


所以也许我的功能不是O(n)?我真的认为它可能是

字典查找是O(nlogn)?


无论如何这里是函数。任何关于完成这项任务的想法很快就会被贬值。


谢谢!


< code apology =抱歉,如果gmail杀死我的标签>


def add_freqs(freq1,freq2):

"""添加两个单词freq dicts" ;"

newfreq = {}

为密钥,值为freq1.items():

newfreq [key] = value + freq2 .get(key,0)

表示密钥,值为freq2.items():

newfreq [key] = value + freq1.get(key,0)

返回newfreq


freq1 = {

''dog'':1,

''猫'':2,

''人类'':3

}

freq2 = {

' 'perro'':1,

''gato'':1,

''human'':2

}

print add_freqs(freq1,freq2)


answer_I_want = {

''dog'':1,

''猫'':2,

''perro'':1,

''gato'':1,

' '人'':5
}

< / code>

-

Gregory Pi?ero

首席创新官

混合技术

()

解决方案





-

Gregory Pi?ero

首席创新官

混合技术

( )





使用items()您将整个字典复制到元组列表;
iteritems()只是遍历现有字典并创建一个元组
一次。

80%重叠,你正在查找并在你的for循环中设置两个五个值中的四个。

转储对称并尝试以下方法之一:

def add_freqs2(freq1,freq2):
total = dict(freq1)
for key,freq2中的值.iteritems():
如果键在freq1:
总[键] + =值
否则:
总[键] =值
返回总数

def add_freqs3(freq1,freq2):
total = dict(freq1)
for key,value in freq2.iteritems():
尝试:
total [key] + = value
除了KeyError:
总计[key] = value
返回总数

我的猜测是add_freqs3()表现最好。


谢谢Peter,这些都是非常好的想法。我迫不及待地想在今晚试一试。

这是关于你的功能的问题。如果我只看了
freq2中的键,那么我不会错过freq1中的任何键而不是freq2中的键吗?
这就是为什么我在原始函数中有两个循环的原因。



不,因为声明


total = dict(freq1)将total创建为freq1的浅表副本。因此

剩下要做的就是从freq2添加物品。


问候

史蒂夫

-

Steve Holden +44 150 684 7255 +1 800 494 3119

Holden Web LLC

PyCon TX 2006


Hey guys,

I thought I''d throw this out there since everyone loves optimization
questions (it''s true, check out the number of replies to those type of
questions!)

Right now I have a function shown below. I want to combine two
dictionaries and add the values together where-ever there is overlap.
I figured this function would be reasonably fast, but I have around 1
million keys in each dictionary (~80% overlap) and it just runs all
night.

So maybe my function is not O(n)? I really figured it was but maybe
dictionary lookups are O(nlogn)?

Anyway here is the function. Any ideas on doing this task a lot
faster would be appriciated.

Thanks!

<code apology=sorry if gmail kills my tabs>

def add_freqs(freq1,freq2):
"""Add two word freq dicts"""
newfreq={}
for key,value in freq1.items():
newfreq[key]=value+freq2.get(key,0)
for key,value in freq2.items():
newfreq[key]=value+freq1.get(key,0)
return newfreq

freq1={
''dog'':1,
''cat'':2,
''human'':3
}
freq2={
''perro'':1,
''gato'':1,
''human'':2
}
print add_freqs(freq1,freq2)

answer_I_want={
''dog'':1,
''cat'':2,
''perro'':1,
''gato'':1,
''human'':5
}
</code>
--
Gregory Pi?ero
Chief Innovation Officer
Blended Technologies
(www.blendedtechnologies.com)

解决方案



With items() you copy the whole dictionary into a list of tuples;
iteritems() just walks over the existing dictionary and creates one tuple
at a time.

With "80% overlap", you are looking up and setting four out of five values
twice in your for-loops.

Dump the symmetry and try one of these:

def add_freqs2(freq1, freq2):
total = dict(freq1)
for key, value in freq2.iteritems():
if key in freq1:
total[key] += value
else:
total[key] = value
return total

def add_freqs3(freq1, freq2):
total = dict(freq1)
for key, value in freq2.iteritems():
try:
total[key] += value
except KeyError:
total[key] = value
return total

My guess is that add_freqs3() will perform best.

Peter





With items() you copy the whole dictionary into a list of tuples;
iteritems() just walks over the existing dictionary and creates one tuple
at a time.

With "80% overlap", you are looking up and setting four out of five values
twice in your for-loops.

Dump the symmetry and try one of these:

def add_freqs2(freq1, freq2):
total = dict(freq1)
for key, value in freq2.iteritems():
if key in freq1:
total[key] += value
else:
total[key] = value
return total

def add_freqs3(freq1, freq2):
total = dict(freq1)
for key, value in freq2.iteritems():
try:
total[key] += value
except KeyError:
total[key] = value
return total

My guess is that add_freqs3() will perform best.

Peter
--
http://mail.python.org/mailman/listinfo/python-list


--
Gregory Pi?ero
Chief Innovation Officer
Blended Technologies
(www.blendedtechnologies.com)





With items() you copy the whole dictionary into a list of tuples;
iteritems() just walks over the existing dictionary and creates one tuple
at a time.

With "80% overlap", you are looking up and setting four out of five values
twice in your for-loops.

Dump the symmetry and try one of these:

def add_freqs2(freq1, freq2):
total = dict(freq1)
for key, value in freq2.iteritems():
if key in freq1:
total[key] += value
else:
total[key] = value
return total

def add_freqs3(freq1, freq2):
total = dict(freq1)
for key, value in freq2.iteritems():
try:
total[key] += value
except KeyError:
total[key] = value
return total

My guess is that add_freqs3() will perform best.


Thanks Peter, those are some really good ideas. I can''t wait to try
them out tonight.

Here''s a question about your functions. if I only look at the keys in
freq2 then won''t I miss any keys that are in freq1 and not in freq2?
That''s why I have the two loops in my original function.


No, because the statement

total = dict(freq1) creates total as a shallow copy of freq1. Thus
all that remains to be done is to add the items from freq2.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/


这篇关于优化与dict.update()类似的功能,但添加了常用值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

09-02 23:20