



I'm trying to write a method that checks whether a given phrase matches at least one item in a list of phrases, and returns the matching items. The inputs are the phrase, a list of phrases, and a dictionary mapping words to lists of synonyms. The point is to make it general.

Here is the example:

phrase = 'This is a little house'
dictSyns = {'little':['small','tiny','little'],
            'house':['cottage','house']}
listPhrases = ['This is a tiny house','This is a small cottage','This is a small building','I need advice']

I can write code that handles this particular example and returns a bool:

if any('This'+' '+'is'+' '+'a'+' '+x+' '+y == phrase for x in dictSyns['little'] for y in dictSyns['house']):
    print('match')

The first problem is that the function has to be general rather than hard-coded for one phrase. The second is that I want the function to return the list of matched phrases.

Can you advise me how to do that, so that the method returns ['This is a tiny house','This is a small cottage'] in this case?

The output should look like this:

>>> getMatches(phrase, dictSyns, listPhrases)
['This is a tiny house','This is a small cottage']

I would approach this as follows:

import itertools

def new_phrases(phrase, syns):
    """Generate new phrases from a base phrase and synonyms."""
    words = [syns.get(word, [word]) for word in phrase.split(' ')]
    for t in itertools.product(*words):
        yield ' '.join(t)

def get_matches(phrase, syns, phrases):
    """Generate acceptable new phrases based on a whitelist."""
    phrases = set(phrases)
    for new_phrase in new_phrases(phrase, syns):
        if new_phrase in phrases:
            yield new_phrase

The core of the code is the assignment of words in new_phrases, which transforms phrase and syns into a more usable form: a list where each element is a list of the acceptable choices for that word:

>>> [syns.get(word, [word]) for word in phrase.split(' ')]
[['This'], ['is'], ['a'], ['small', 'tiny', 'little'], ['cottage', 'house']]
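To make the next step concrete, here is a minimal sketch of what itertools.product does with that list of choices. The words list is copied by hand from the output above, and the name candidates is just for illustration:

```python
import itertools

# One list of acceptable choices per word position (copied from the output above).
words = [['This'], ['is'], ['a'],
         ['small', 'tiny', 'little'],
         ['cottage', 'house']]

# product(*words) picks one choice from each position, in order.
candidates = [' '.join(t) for t in itertools.product(*words)]

# 1 * 1 * 1 * 3 * 2 = 6 candidate phrases in total.
print(len(candidates))
for candidate in candidates:
    print(candidate)
```

The number of candidates is the product of the list lengths, so it grows quickly as more words gain synonyms.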

Note the following:

  • Use of generators to deal more efficiently with large numbers of combinations (not building the whole list at once);
  • Use of a set for efficient (O(1), vs. O(n) for a list) membership testing;
  • Use of itertools.product to generate the possible combinations of phrase based on the syns (you could also use itertools.ifilter, or the built-in filter in Python 3, in implementing this); and
  • Style guide compliance.
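As a rough sketch of the filter-based variant mentioned in the list, assuming Python 3 (where the built-in filter is lazy and itertools.ifilter no longer exists), something like the following should behave the same way. The name get_matches_filtered is made up for this example:

```python
import itertools

def new_phrases(phrase, syns):
    """Yield every phrase obtainable by swapping words for their synonyms."""
    words = [syns.get(word, [word]) for word in phrase.split(' ')]
    return (' '.join(t) for t in itertools.product(*words))

def get_matches_filtered(phrase, syns, phrases):
    """Lazily keep only the generated phrases that are in the whitelist."""
    allowed = set(phrases)
    # filter() returns a lazy iterator in Python 3; the set's __contains__
    # method serves as the O(1) membership predicate.
    return filter(allowed.__contains__, new_phrases(phrase, syns))

syns = {'little': ['small', 'tiny', 'little'],
        'house': ['cottage', 'house']}
phrases = ['This is a tiny house', 'This is a small cottage',
           'This is a small building', 'I need advice']
matches = list(get_matches_filtered('This is a little house', syns, phrases))
print(matches)
```

Whether this reads better than the explicit loop is a matter of taste; both are lazy and both test membership against a set.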

In use:

>>> list(get_matches(phrase, syns, phrases))
['This is a small cottage', 'This is a tiny house']

Things to think about:

  • What about letter case (e.g. how should "House of Commons" be treated)?
  • What about punctuation?
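One possible answer to both questions, offered only as a sketch: normalize everything before comparing, by lower-casing and stripping ASCII punctuation. The helper names normalize and get_matches_normalized are hypothetical, and this policy is an assumption rather than a recommendation. In particular, it treats proper nouns such as "House of Commons" like ordinary words, which may not be what you want.

```python
import itertools
import string

# Translation table that deletes ASCII punctuation characters.
_PUNCT_TABLE = str.maketrans('', '', string.punctuation)

def normalize(text):
    """Lower-case, strip ASCII punctuation, and collapse whitespace."""
    return ' '.join(text.lower().translate(_PUNCT_TABLE).split())

def get_matches_normalized(phrase, syns, phrases):
    """Yield whitelist phrases (in original form) that match after normalizing."""
    # Map each normalized whitelist phrase back to its original spelling.
    lookup = {normalize(p): p for p in phrases}
    norm_syns = {normalize(key): [normalize(v) for v in values]
                 for key, values in syns.items()}
    words = [norm_syns.get(w, [w]) for w in normalize(phrase).split()]
    for t in itertools.product(*words):
        candidate = ' '.join(t)
        if candidate in lookup:
            yield lookup[candidate]

syns = {'little': ['small', 'tiny', 'little'],
        'house': ['cottage', 'house']}
phrases = ['This is a tiny house.', 'this is a SMALL cottage', 'I need advice']
result = list(get_matches_normalized('This is a little house!', syns, phrases))
print(result)
```

If two whitelist phrases normalize to the same string, only the last one is kept in lookup; a dictionary of lists would be needed to return all of them.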

