Problem description
How do I find duplicate addresses in a database, or better yet, stop users from entering them when they fill in the form? I guess the earlier the better.
Is there a good way of abstracting street, postal code, etc. so that typos and simple attempts to register twice can be detected? For example:
Quellenstrasse 66/11
Quellenstr. 66a-11
I'm talking about German addresses. Thanks!
Recommended answer
When we worked on this type of project before, our approach was to take our existing corpus of addresses (150k or so) and apply the most common transformations for our domain (Ireland, so "Dr" -> "Drive", "Rd" -> "Road", etc.). I'm afraid there was no comprehensive online resource for such things at the time, so we ended up basically compiling a list ourselves, checking sources like the phone book (space is tight there, so addresses are abbreviated in all manner of ways!). As I mentioned earlier, you'd be amazed how many "duplicates" you'll detect with the addition of only a few common rules!
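As a rough illustration of the rule-based approach described above, here is a minimal Python sketch. The rule table, the German "str" suffix rule, and the function names `normalize_address` and `find_duplicates` are all assumptions made for this example, not the list we actually used. A real table would be much longer and specific to your country.

```python
import re

# Hypothetical whole-word rules: abbreviation -> expanded form (not exhaustive).
ABBREVIATIONS = {
    "dr": "drive",
    "rd": "road",
    "st": "street",
    "ave": "avenue",
}

# Hypothetical suffix rule for German street names: "...str" -> "...strasse".
SUFFIX_RULES = [(re.compile(r"str$"), "strasse")]


def normalize_address(address: str) -> str:
    """Return a canonical key that can be compared for equality."""
    # Lowercase and fold the German sharp s so "Straße" and "Strasse" agree.
    text = address.lower().replace("ß", "ss")
    # Replace punctuation with spaces; this also drops the dot in "Rd." or "str.".
    text = re.sub(r"[^\w\s]", " ", text)
    words = []
    for word in text.split():  # split() also collapses repeated whitespace
        word = ABBREVIATIONS.get(word, word)
        for pattern, replacement in SUFFIX_RULES:
            word = pattern.sub(replacement, word)
        words.append(word)
    return " ".join(words)


def find_duplicates(addresses):
    """Group addresses by normalized key and return groups with 2+ members."""
    groups = {}
    for address in addresses:
        groups.setdefault(normalize_address(address), []).append(address)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    sample = ["12 Main Rd.", "12  main road", "5 Oak Dr", "Quellenstr. 66", "Quellenstraße 66"]
    for group in find_duplicates(sample):
        print(group)
```

The same key can be computed when the form is submitted and looked up in the database before the row is inserted. Note that this only catches variants the rules cover: the two addresses in the question still differ in the house number ("66" vs "66a"), so they would need an additional rule of their own.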
I've recently stumbled across a page with a fairly comprehensive list of address abbreviations, although it's American English, so I'm not sure how useful it would be in Germany! A quick Google search turned up a couple of sites, but they seemed like spammy newsletter sign-up traps. That was me googling in English, though, so you may have more luck searching for "German address abbreviations" in German :)