How do I get a percentage accuracy match when comparing two address strings?

Question

I am trying to compare two lists of names and addresses to find the unique entries. I can easily extract all the entries that are exactly the same string in both lists, but I am then left with names and addresses that differ yet may belong to the same people, i.e.:

entry in list 1: Smith J Ph234567 34 Smith st

entry in list 2: Smith John Ph234567 34 Smith st

entry in list 1: Smith J Ph234567 34 Smith Rd

entry in list 2: Smith J Ph234567 34 Smith Road

I want to add a tag to entries that seem similar to each other, for example an 80% match.

Nested For Each loops don't work, as they match every word or letter (depending on how you write it) in the string with every other word or letter.

For loops don't work either, as a single change such as J vs John creates errors for every entry after the change.

I am writing it in VB.NET but can also translate from C#.

Recommended answer

This kind of problem is generally solved by calculating the edit distance between the strings. Start with the Levenshtein distance, for instance.
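
As a rough illustration of what the Levenshtein distance computes, here is a minimal sketch of the standard dynamic-programming approach. It is written in Python for brevity rather than VB.NET, and the function name levenshtein is just a label chosen for this example; the same loops should port to VB.NET or C# without much change.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short

    # previous[j] = distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


# The two pairs from the question
print(levenshtein("Smith J Ph234567 34 Smith st",
                  "Smith John Ph234567 34 Smith st"))  # 3 ("ohn" inserted)
print(levenshtein("Smith J Ph234567 34 Smith Rd",
                  "Smith J Ph234567 34 Smith Road"))   # 2 ("oa" inserted)
```

Because the distance only counts the characters that actually differ, one change such as J vs John costs 3 edits and does not throw off the comparison of everything after it, which is the problem the plain For loop ran into.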

This will give you a score (the number of "edit operations" necessary to transform one string into the other). To convert this into a percent identity, you need to normalise it by the length of the longer string, something along the lines of percent = (largerString.Length - editDistance) / largerString.Length. The result is a fraction between 0 and 1; multiply it by 100 for a percentage. If you translate this to C#, convert the operands to Double first, since dividing two integers there truncates the result.
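
To make the normalisation concrete, the sketch below applies the formula to the two pairs from the question. It is in Python for brevity, and it assumes the edit distance has already been computed by whatever Levenshtein routine you use. The names percent_identity and is_similar, and the 0.8 default threshold (taken from the 80% mentioned in the question), are choices made for this example only.

```python
def percent_identity(a: str, b: str, edit_distance: int) -> float:
    """Normalise an already-computed edit distance by the length of
    the longer string, giving a fraction between 0.0 and 1.0."""
    longer = max(len(a), len(b))
    if longer == 0:
        return 1.0  # two empty strings are treated as identical
    return (longer - edit_distance) / longer


def is_similar(a: str, b: str, edit_distance: int,
               threshold: float = 0.8) -> bool:
    """Decide whether a pair should be tagged as a probable match."""
    return percent_identity(a, b, edit_distance) >= threshold


# "J" vs "John": 3 edits, longer string has 31 characters
first = percent_identity("Smith J Ph234567 34 Smith st",
                         "Smith John Ph234567 34 Smith st", 3)
print(round(first * 100, 1))   # 90.3

# "Rd" vs "Road": 2 edits, longer string has 30 characters
second = percent_identity("Smith J Ph234567 34 Smith Rd",
                          "Smith J Ph234567 34 Smith Road", 2)
print(round(second * 100, 1))  # 93.3
```

Both pairs score above 80%, so both would be tagged. The threshold is something to tune against your own data, since short strings lose a large share of their score for every single edit.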

