Problem description
I would like to understand a user's search term. Suppose someone searches for "staples in NY": I would like to recognize that it's a location search where the keyword is "staples" and the location is New York. Similarly, if someone types "cat in hat", the parser should not flag that as a location search; here the entire keyword is "cat in hat". Is there any algorithm or open source library available to parse a search term and determine whether it's a comparison (like A vs B) or a location-based search (like A in X)?
Recommended answer
The problem you describe is called information extraction. A host of algorithms exist, the simplest being regexp matching and the best being structured machine learning. Try regexps first, and look at something like NLTK if you know Python.
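As a rough illustration of the regexp-first approach, here is a minimal sketch in Python. The pattern names, the `classify` function, and the result labels are all made up for this example, and real queries would need many more variants. Note that a pattern alone can only mark "A in X" as a candidate location search; it cannot tell "staples in NY" from "cat in hat".

```python
import re

# Hypothetical patterns for illustration only.
COMPARISON = re.compile(
    r"^(?P<a>.+?)\s+(?:vs\.?|versus)\s+(?P<b>.+)$", re.IGNORECASE
)
LOCATION = re.compile(
    r"^(?P<keyword>.+?)\s+(?:in|near)\s+(?P<place>.+)$", re.IGNORECASE
)


def classify(query):
    """Return a (label, first_part, second_part) tuple for a search query."""
    query = query.strip()
    m = COMPARISON.match(query)
    if m:
        return ("comparison", m.group("a"), m.group("b"))
    m = LOCATION.match(query)
    if m:
        # Only a candidate: the pattern cannot know whether the
        # second part is really a place.
        return ("location_candidate", m.group("keyword"), m.group("place"))
    return ("keyword", query, None)
```

With these patterns, both "staples in NY" and "cat in hat" come back as `location_candidate`, which is why a further check on the second part is needed.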
Distinguishing "staples in NY" from "cat in hat" is possible if your program knows that "NY" is a location. You can tell either from the capitalization or because "NY" occurs in a list of place names called a gazetteer.
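The gazetteer idea can be sketched as follows. The `GAZETTEER` dictionary here is a tiny hypothetical stand-in (a real one would be loaded from a place-name dataset), and `parse_location_query` is a name invented for this example. The sketch only handles a lowercase " in " separator and ignores capitalization cues.

```python
# A tiny hypothetical gazetteer mapping lowercase place names and
# abbreviations to a canonical name.
GAZETTEER = {
    "ny": "New York",
    "nyc": "New York",
    "new york": "New York",
    "boston": "Boston",
}


def parse_location_query(query):
    """Treat 'A in X' as a location search only if X is a known place."""
    query = query.strip()
    # Split on the last " in " so keywords that contain "in" stay intact.
    keyword, sep, place = query.rpartition(" in ")
    if sep:
        canonical = GAZETTEER.get(place.strip().lower())
        if canonical and keyword.strip():
            return {
                "type": "location",
                "keyword": keyword.strip(),
                "location": canonical,
            }
    # Otherwise the whole query is the keyword.
    return {"type": "keyword", "keyword": query}
```

Here "staples in NY" is parsed as a location search for "staples" in New York, while "cat in hat" stays a plain keyword because "hat" is not in the gazetteer.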
The problem in general is AI-complete, so expect to put in lots of hard work if you want good results.