Problem description
I have a lot (>100,000) of lowercase strings in a list, where a subset might look like this:
str_list = ["hello i am from denmark", "that was in the united states", "nothing here"]
I also have a dict like this (in reality it will have around 1,000 entries):
dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}
For every string in the list that contains any of the dict's keys, I want to replace the entire string with the corresponding dict value. The expected result is therefore:
str_list = ["dk", "us", "nothing here"]
What is the most efficient way to do this, given the number of strings I have and the length of the dict?
Extra info: there is never more than one dict key in a string.
Recommended answer
Assuming:
lst = ["hello i am from denmark", "that was in the united states", "nothing here"]
dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}
you can do this:
res = [dict_x.get(next((k for k in dict_x if k in my_str), None), my_str) for my_str in lst]
which gives:
print(res) # -> ['dk', 'us', 'nothing here']
The neat thing about this (apart from it being a Python ninja's favorite weapon, a.k.a. a list comprehension) is the combination of get with a default of my_str and next with a default of None. When no key is found in the string, next returns None instead of raising StopIteration, and because None is not a key in dict_x, get falls back to my_str.
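To make the interplay between next and get easier to follow, here is a minimal sketch that unrolls the one-liner into an explicit loop. The function name replace_with_code and its parameter names are made up for illustration and are not part of the original answer; the behavior is intended to match the list comprehension above.

```python
def replace_with_code(strings, mapping):
    """Replace each string that contains a key of `mapping` with that key's value.

    Hypothetical helper: an unrolled version of the one-liner in the answer.
    """
    result = []
    for s in strings:
        # next() yields the first key that occurs as a substring of s.
        # Its second argument (None) is returned when the generator is
        # exhausted, so no StopIteration is raised for non-matching strings.
        key = next((k for k in mapping if k in s), None)
        # None is not a key of mapping, so get() falls back to the
        # original string whenever no key was found.
        result.append(mapping.get(key, s))
    return result


lst = ["hello i am from denmark", "that was in the united states", "nothing here"]
dict_x = {"denmark": "dk", "germany": "ger", "norway": "no", "united states": "us"}

res = replace_with_code(lst, dict_x)
print(res)  # -> ['dk', 'us', 'nothing here']
```

Note that `k in s` is a plain substring test, so a key also matches when it appears inside a longer word. In the worst case each string is checked against every key, which is worth keeping in mind for lists and dicts of the sizes mentioned in the question.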