问题描述
嘿伙计们,
我以为我会把它扔出去,因为每个人都喜欢优化
问题(这是真的,结账对这类
问题的回复数量!)
现在我有一个如下所示的功能。我想结合两个
词典,并将值加在一起,哪里有重叠。
我认为这个功能会相当快,但我在每个字典中都有大约1
百万的键(约80%重叠)并且它只运行所有
晚上。
所以也许我的功能不是O(n)?我真的认为它可能是
字典查找是O(nlogn)?
无论如何这里是函数。任何关于完成这项任务的想法很快就会被贬值。
谢谢!
< code apology =抱歉,如果gmail杀死我的标签>
def add_freqs(freq1,freq2):
"""添加两个单词freq dicts" ;"
newfreq = {}
为密钥,值为freq1.items():
newfreq [key] = value + freq2 .get(key,0)
表示密钥,值为freq2.items():
newfreq [key] = value + freq1.get(key,0)
返回newfreq
freq1 = {
''dog'':1,
''猫'':2,
''人类'':3
}
freq2 = {
' 'perro'':1,
''gato'':1,
''human'':2
}
print add_freqs(freq1,freq2)
answer_I_want = {
''dog'':1,
''猫'':2,
''perro'':1,
''gato'':1,
' '人'':5
}
< / code>
-
Gregory Pi?ero
首席创新官
混合技术
()
-
Gregory Pi?ero
首席创新官
混合技术
( )
使用items()您将整个字典复制到元组列表;
iteritems()只是遍历现有字典并创建一个元组
一次。
80%重叠,你正在查找并在你的for循环中设置两个五个值中的四个。
转储对称并尝试以下方法之一:
def add_freqs2(freq1,freq2):
total = dict(freq1)
for key,freq2中的值.iteritems():
如果键在freq1:
总[键] + =值
否则:
总[键] =值
返回总数
def add_freqs3(freq1,freq2):
total = dict(freq1)
for key,value in freq2.iteritems():
尝试:
total [key] + = value
除了KeyError:
总计[key] = value
返回总数
我的猜测是add_freqs3()表现最好。
谢谢Peter,这些都是非常好的想法。我迫不及待地想在今晚试一试。
这是关于你的功能的问题。如果我只看了
freq2中的键,那么我不会错过freq1中的任何键而不是freq2中的键吗?
这就是为什么我在原始函数中有两个循环的原因。
不,因为声明
total = dict(freq1)将total创建为freq1的浅表副本。因此
剩下要做的就是从freq2添加物品。
问候
史蒂夫
-
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC
PyCon TX 2006
Hey guys,
I thought I''d throw this out there since everyone loves optimization
questions (it''s true, check out the number of replies to those type of
questions!)
Right now I have a function shown below. I want to combine two
dictionaries and add the values together where-ever there is overlap.
I figured this function would be reasonably fast, but I have around 1
million keys in each dictionary (~80% overlap) and it just runs all
night.
So maybe my function is not O(n)? I really figured it was but maybe
dictionary lookups are O(nlogn)?
Anyway here is the function. Any ideas on doing this task a lot
faster would be appriciated.
Thanks!
<code apology=sorry if gmail kills my tabs>
def add_freqs(freq1,freq2):
"""Add two word freq dicts"""
newfreq={}
for key,value in freq1.items():
newfreq[key]=value+freq2.get(key,0)
for key,value in freq2.items():
newfreq[key]=value+freq1.get(key,0)
return newfreq
freq1={
''dog'':1,
''cat'':2,
''human'':3
}
freq2={
''perro'':1,
''gato'':1,
''human'':2
}
print add_freqs(freq1,freq2)
answer_I_want={
''dog'':1,
''cat'':2,
''perro'':1,
''gato'':1,
''human'':5
}
</code>
--
Gregory Pi?ero
Chief Innovation Officer
Blended Technologies
(www.blendedtechnologies.com)
With items() you copy the whole dictionary into a list of tuples;
iteritems() just walks over the existing dictionary and creates one tuple
at a time.
With "80% overlap", you are looking up and setting four out of five values
twice in your for-loops.
Dump the symmetry and try one of these:
def add_freqs2(freq1, freq2):
total = dict(freq1)
for key, value in freq2.iteritems():
if key in freq1:
total[key] += value
else:
total[key] = value
return total
def add_freqs3(freq1, freq2):
total = dict(freq1)
for key, value in freq2.iteritems():
try:
total[key] += value
except KeyError:
total[key] = value
return total
My guess is that add_freqs3() will perform best.
Peter
With items() you copy the whole dictionary into a list of tuples;
iteritems() just walks over the existing dictionary and creates one tuple
at a time.
With "80% overlap", you are looking up and setting four out of five values
twice in your for-loops.
Dump the symmetry and try one of these:
def add_freqs2(freq1, freq2):
total = dict(freq1)
for key, value in freq2.iteritems():
if key in freq1:
total[key] += value
else:
total[key] = value
return total
def add_freqs3(freq1, freq2):
total = dict(freq1)
for key, value in freq2.iteritems():
try:
total[key] += value
except KeyError:
total[key] = value
return total
My guess is that add_freqs3() will perform best.
Peter
--
http://mail.python.org/mailman/listinfo/python-list
--
Gregory Pi?ero
Chief Innovation Officer
Blended Technologies
(www.blendedtechnologies.com)
With items() you copy the whole dictionary into a list of tuples;
iteritems() just walks over the existing dictionary and creates one tuple
at a time.
With "80% overlap", you are looking up and setting four out of five values
twice in your for-loops.
Dump the symmetry and try one of these:
def add_freqs2(freq1, freq2):
total = dict(freq1)
for key, value in freq2.iteritems():
if key in freq1:
total[key] += value
else:
total[key] = value
return total
def add_freqs3(freq1, freq2):
total = dict(freq1)
for key, value in freq2.iteritems():
try:
total[key] += value
except KeyError:
total[key] = value
return total
My guess is that add_freqs3() will perform best.
Thanks Peter, those are some really good ideas. I can''t wait to try
them out tonight.
Here''s a question about your functions. if I only look at the keys in
freq2 then won''t I miss any keys that are in freq1 and not in freq2?
That''s why I have the two loops in my original function.
No, because the statement
total = dict(freq1) creates total as a shallow copy of freq1. Thus
all that remains to be done is to add the items from freq2.
regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/
这篇关于优化与dict.update()类似的功能,但添加了常用值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!