Comparing strings based on similarity

I have two lists that look like this:

<List> ads
[0]
Headline = "Sony Ericsson Arc silver"
[1]
Headline = "Sony Ericsson Play R800I"


<List> feedItems
[0]
Headline = "Sony Ericsson Xperia Arc Silver"
[1]
Headline = "Sony Ericsson Xperia Play R800i Black"


What is the simplest way to create a new, third list that pairs up the elements sharing at least two words? Can this be done in a LINQ way?

The third list would look like this:

[0]
AdHeadline = "Sony Ericsson Arc silver"
MatchingFeed  = "Sony Ericsson Xperia Arc Silver"
// etc


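I tried looping through the first list and using the Regex.Match class, populating the third list whenever a match is found. I would like to know your preferred way of doing this, and how to check that an expression matches on a minimum of two words.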

Best answer

I'm not sure what Regex brings to the party here. How about the following?

// Define a helper function to split a string into its words.
Func<string, HashSet<string>> GetWords = s =>
    new HashSet<string>(
        s.Split(new[]{' '}, StringSplitOptions.RemoveEmptyEntries)
        );

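// Pair up each item with the words of its headline. Materialize the second
// one as we'll be querying it multiple times.
// (This assumes the list elements expose a string Headline property, as in
// the question; for plain lists of strings, use a and f directly.)
var aPairs = ads.Select(a => new { Full = a.Headline, Words = GetWords(a.Headline) });
var fPairs = feedItems
                 .Select(f => new { Full = f.Headline, Words = GetWords(f.Headline) })
                 .ToArray();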

// For each ad, select all the feeds that match more than one word.
// Then just select the original ad and feed strings.
var result = aPairs.SelectMany(
    a => fPairs
        .Where(f => a.Words.Intersect(f.Words).Skip(1).Any())
        .Select(f => new { AdHeadline = a.Full, MatchingFeed = f.Full })
    );
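
Two things are worth checking against the sample data: the word comparison in the query is case-sensitive ("silver" and "Silver" count as different words), and both ads share "Sony" and "Ericsson" with both feed items, so a threshold of two words pairs every ad with every feed. Below is a minimal, self-contained sketch of the same idea with a case-insensitive comparison and a configurable threshold. The `Item` class, the `HeadlineMatcher` class, and the `minWords` parameter are names made up for this illustration; the question only shows that the elements have a `Headline` property.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type; the question only shows a Headline property.
public sealed class Item
{
    public string Headline { get; set; } = "";
}

public static class HeadlineMatcher
{
    // Split a headline into its distinct words, ignoring case.
    private static HashSet<string> GetWords(string s) =>
        new HashSet<string>(
            s.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries),
            StringComparer.OrdinalIgnoreCase);

    // Pair each ad with every feed item that shares at least minWords words.
    public static List<(string AdHeadline, string MatchingFeed)> Match(
        IEnumerable<Item> ads, IEnumerable<Item> feedItems, int minWords = 2)
    {
        // Materialize the feed side once, since it is scanned for every ad.
        var feeds = feedItems
            .Select(f => new { f.Headline, Words = GetWords(f.Headline) })
            .ToArray();

        return ads
            .Select(a => new { a.Headline, Words = GetWords(a.Headline) })
            .SelectMany(a => feeds
                // a.Words was built with OrdinalIgnoreCase, so Contains ignores case.
                .Where(f => f.Words.Count(w => a.Words.Contains(w)) >= minWords)
                .Select(f => (AdHeadline: a.Headline, MatchingFeed: f.Headline)))
            .ToList();
    }
}
```

With the four headlines from the question, `HeadlineMatcher.Match(ads, feedItems)` returns all four ad/feed combinations, because every pair shares "Sony Ericsson". Passing `minWords: 3` narrows the result to the two pairs the question seems to be after (Arc with Arc, Play with Play).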

For "c# - Comparing strings based on similarity", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/11134789/