Oracle Text: How to sanitize user input

Problem description

If anyone has experience using Oracle Text (CTXSYS.CONTEXT), I'm wondering how to handle user input when the user wants to search for names that may contain an apostrophe.

Escaping the apostrophe seems to work in some cases, but not for 's at the end of a word: s is in the list of stop words, so it seems to get removed.

We currently change simple query text (i.e. anything that's just letters) to %text%, for example:

contains(field, :text) > 0

A search for O'Neil works, but Joe's doesn't.

Has anyone using Oracle Text dealt with this issue?

Solution

Escape all special characters with backslashes. Curly braces won't work with substring searches, because they define complete tokens. For example, %{ello}% won't match the token 'Hello'.

Escaped space characters will be included in the search token, so the search string '%stay\ near\ me%' will be treated as a literal string "stay near me" and will not invoke the 'near' operator.
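The escaping described above can be sketched in application code. This is a minimal sketch, not a definitive implementation: the function names `escape_oracle_text` and `substring_query` are hypothetical, and it assumes that backslash-escaping every character that is not a letter or digit is acceptable for your index. That is deliberately broader than Oracle Text's actual list of reserved characters, which should be checked against the documentation.

```python
def escape_oracle_text(term: str, escape_spaces: bool = True) -> str:
    """Backslash-escape every character that is not a letter or digit.

    Assumption: over-escaping is harmless here, so no exact list of
    Oracle Text reserved characters is hard-coded.
    """
    out = []
    for ch in term:
        if ch.isalnum():
            # Letters and digits pass through unchanged.
            out.append(ch)
        elif ch.isspace() and not escape_spaces:
            # Leave whitespace alone so the words stay separate tokens.
            out.append(ch)
        else:
            # Everything else (apostrophes, braces, %, _, spaces, ...) is escaped.
            out.append("\\" + ch)
    return "".join(out)


def substring_query(term: str) -> str:
    """Wrap the escaped term in % wildcards, as in the %text% pattern above."""
    return "%" + escape_oracle_text(term) + "%"


if __name__ == "__main__":
    print(substring_query("O'Neil"))
    print(substring_query("stay near me"))
```

The result is meant to be passed as the :text bind variable in contains(field, :text) > 0. Because it is a bind variable, no additional SQL-level quoting of the apostrophe is needed; only the Oracle Text query syntax has to be escaped.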

If you are indexing short strings (such as names) and you want Oracle Text to behave exactly like the LIKE operator, you must write your own lexer that won't create tokens for individual words. (Unfortunately, CATSEARCH does not support substring search.)

It is probably a good idea to change the searches to use Oracle Text's semantics, with token matching. For some applications, however, the wildcard expansion of multiple (short) tokens and numeric tokens will create too many hits for search strings that users would reasonably expect to work.

For example, a search for "%I\ AM\ NUMBER\ 9%" will most likely fail if there are a lot of numeric tokens in the indexed data, since all tokens ending with 'I' and all tokens starting with '9' must be searched and merged before the result can be returned.
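To get a feel for why this expansion becomes expensive, here is a toy model in Python. It is only an illustration under stated assumptions: the token list is made up, `expand_wildcard` is a hypothetical helper, and it does not reflect how Oracle Text actually stores or scans its token table.

```python
from typing import List


def expand_wildcard(pattern: str, tokens: List[str]) -> List[str]:
    """Return the tokens matched by a pattern with a leading and/or trailing %."""
    core = pattern.strip("%")
    if pattern.startswith("%") and pattern.endswith("%"):
        return [t for t in tokens if core in t]
    if pattern.startswith("%"):
        return [t for t in tokens if t.endswith(core)]
    if pattern.endswith("%"):
        return [t for t in tokens if t.startswith(core)]
    return [t for t in tokens if t == core]


# Hypothetical token table: a few words plus many numeric tokens.
tokens = ["I", "AM", "NUMBER", "HI", "TAXI"] + [str(n) for n in range(1000)]

first = expand_wildcard("%I", tokens)  # tokens ending with 'I'
last = expand_wildcard("9%", tokens)   # tokens starting with '9'

print(len(first), "tokens end with 'I'")
print(len(last), "tokens start with '9'")
```

Even in this small example, the trailing "9%" expands to over a hundred tokens, and each of them would have to be looked up and merged. With real data containing many numeric tokens, the expansion grows accordingly.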

'I' and 'AM' are probably also in the default stoplist and will be ignored entirely, so for this hypothetical application a null stoplist may be used if these tokens are important.
