Problem description
I'm trying to write a function that checks whether a given phrase matches at least one item in a list of phrases and returns the matching items. The inputs are the phrase, a list of phrases, and a dictionary that maps words to lists of synonyms. The point is to make it general.
Here is an example:
phrase = 'This is a little house'
dictSyns = {'little': ['small', 'tiny', 'little'],
            'house': ['cottage', 'house']}
listPhrases = ['This is a tiny house', 'This is a small cottage', 'This is a small building', 'I need advice']
I can write code that does this for the example above, but it is hard-coded and only gives a bool:
if any('This'+' '+'is'+' '+'a'+' '+x+' '+y == phrase for x in dictSyns['little'] for y in dictSyns['house']):
    print('match')
First, the function has to be general, so that it works for any input. Second, I want it to return the list of matched phrases.
Can you advise me how to do this, so that the function returns ['This is a tiny house', 'This is a small cottage'] in this case?
The output would look like this:
>>> getMatches(phrase, dictSyns, listPhrases)
['This is a tiny house','This is a small cottage']
I would approach this as follows:
import itertools


def new_phrases(phrase, syns):
    """Generate new phrases from a base phrase and synonyms."""
    words = [syns.get(word, [word]) for word in phrase.split(' ')]
    for t in itertools.product(*words):
        yield ' '.join(t)


def get_matches(phrase, syns, phrases):
    """Generate acceptable new phrases based on a whitelist."""
    phrases = set(phrases)
    for new_phrase in new_phrases(phrase, syns):
        if new_phrase in phrases:
            yield new_phrase
The core of the code is the assignment of words in new_phrases, which transforms the phrase and syns into a more usable form: a list where each element is a list of the acceptable choices for that word:
>>> [syns.get(word, [word]) for word in phrase.split(' ')]
[['This'], ['is'], ['a'], ['small', 'tiny', 'little'], ['cottage', 'house']]
Note the following:
- Use of generators to deal more efficiently with large numbers of combinations (not building the whole list at once);
- Use of a set for efficient membership testing (O(1) on average, vs. O(n) for a list);
- Use of itertools.product to generate the possible combinations of phrase based on the syns (you could also use itertools.ifilter in implementing this; it exists only in Python 2, where Python 3 has the lazy built-in filter instead); and
- Style guide (PEP 8) compliance.
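The filter-based variant mentioned in the list above might look like the following sketch. It assumes Python 3, where the built-in filter is lazy and takes the place of Python 2's itertools.ifilter. The name get_matches_filtered is hypothetical and used only for illustration.

```python
import itertools


def new_phrases(phrase, syns):
    """Generate new phrases from a base phrase and synonyms."""
    words = [syns.get(word, [word]) for word in phrase.split(' ')]
    for t in itertools.product(*words):
        yield ' '.join(t)


def get_matches_filtered(phrase, syns, phrases):
    """Hypothetical variant of get_matches built on the lazy built-in filter."""
    allowed = set(phrases)  # set membership tests are O(1) on average
    # filter() lazily keeps only the generated phrases found in the whitelist.
    return filter(allowed.__contains__, new_phrases(phrase, syns))
```

Like the generator version, this yields matches one at a time, so wrap the result in list() if a list is needed.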
In use (with syns and phrases bound to the question's dictSyns and listPhrases):
>>> list(get_matches(phrase, syns, phrases))
['This is a small cottage', 'This is a tiny house']
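Note that the matches come back in the order in which itertools.product generates the combinations, which differs from the order shown in the question. If the order of the original list matters, one option is to collect the generated phrases into a set and filter the list against it. This is a minimal sketch under that assumption: get_matches_in_list_order is a hypothetical name, and the trade-off is that all combinations are held in memory at once.

```python
import itertools


def get_matches_in_list_order(phrase, syns, phrases):
    """Return the entries of phrases that match, in their original order."""
    words = [syns.get(word, [word]) for word in phrase.split(' ')]
    # Build every candidate phrase up front so each list entry is one set lookup.
    candidates = {' '.join(t) for t in itertools.product(*words)}
    return [p for p in phrases if p in candidates]


phrase = 'This is a little house'
dictSyns = {'little': ['small', 'tiny', 'little'],
            'house': ['cottage', 'house']}
listPhrases = ['This is a tiny house', 'This is a small cottage',
               'This is a small building', 'I need advice']

print(get_matches_in_list_order(phrase, dictSyns, listPhrases))
# ['This is a tiny house', 'This is a small cottage']
```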
Things to think about:
- What about the case of characters (e.g. how should "House of Commons" be treated)?
- What about punctuation?
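One possible way to address both points is to normalise phrases before comparing them. The sketch below assumes that matching should ignore letter case and ASCII punctuation, which will not suit every use (it also strips apostrophes, for example). The names normalize and get_matches_normalized are hypothetical helpers, not part of the answer above.

```python
import itertools
import string

# Translation table that deletes ASCII punctuation characters.
_STRIP_PUNCT = str.maketrans('', '', string.punctuation)


def normalize(text):
    """Lower-case text, drop ASCII punctuation, and collapse whitespace."""
    return ' '.join(text.lower().translate(_STRIP_PUNCT).split())


def get_matches_normalized(phrase, syns, phrases):
    """Yield whitelist phrases (original spelling) that match after normalising."""
    # Assumption: synonym keys and values are compared in normalised form too.
    norm_syns = {normalize(key): [normalize(v) for v in values]
                 for key, values in syns.items()}
    words = [norm_syns.get(w, [w]) for w in normalize(phrase).split(' ')]
    candidates = {' '.join(t) for t in itertools.product(*words)}
    for original in phrases:
        if normalize(original) in candidates:
            yield original
```

With this approach "This is a TINY house!" would match the phrase 'This is a Little House', and the whitelist entry is returned with its original spelling.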