Problem description

Does anyone have, or know where I can get, some code that will check a TextBox for inappropriate language?

At the moment, we need to manually check submissions for language before posting. This takes a lot of time and resources, and in some cases it is a long time before a submission is actually made live.

Thanks,
Tom

Recommended answer

For what it's worth, you may want to reconsider making this an automated process. You could, as Nicholas points out, use various text searching mechanisms to match a dictionary of "inappropriate language" against submissions.
However, this runs the risk of either being too aggressive, blocking text that is in some contexts perfectly fine, or too passive, allowing people to easily bypass the filter, or even having both problems at the same time, blocking things that shouldn't be blocked while at the same time allowing offensive things through far too easily (usually when the user intentionally obfuscates their offensive language in a way that makes their text obvious to a human without a computer being able to understand it).

When dealing with problems that are unique to humans, it is usually best to leave the solution to humans. You can either invest a lot of time and effort into creating a dictionary-based text matching system that tries to filter inappropriate language, or you can just put a little "report post" link in the user's viewing UI and automatically block posts (and maybe even users) when some threshold (probably based on proportion of total user base) of users reports the post as inappropriate.

Using such a mechanism, a relative handful of users will still be subjected to inappropriate language, but hopefully it's not really that harmful to them, and the end result will be that inappropriate language is much more accurately identified and blocked. That is, even though you're guaranteed some users will always see the inappropriate language in any post, on average all users are likely to see less inappropriate language than would be the case with a completely automated system.

That said, if you do decide to go the dictionary route, you may find that simple Regex or IndexOf as Nicholas suggested doesn't perform well. If the submissions are short and the dictionary only has a small number of words in it, that's probably fine. But otherwise, you are likely to find that the algorithm cost scales out of control as submission length and dictionary length get large.

If so, you may want to consider something based on existing indexing and/or spell-check functionality.
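To make the difference between scanning and indexing concrete, here is a minimal sketch in Python (the same idea ports directly to a .NET HashSet). It is only an illustration under stated assumptions, not the class linked in this answer: the blocklist contents, the tokenizer pattern, and both function names are hypothetical. The substring version does one search per dictionary word, so its cost grows with both text length and dictionary size; the token version splits the text once and does one hash lookup per token.

```python
import re

# Hypothetical blocklist; mild placeholder words stand in for a real dictionary.
BLOCKLIST = frozenset({"darn", "heck"})

# Assumption: a "word" is a run of ASCII letters or apostrophes.
_TOKEN = re.compile(r"[a-z']+")


def find_by_substring(text, words=BLOCKLIST):
    """Naive scan: one substring search per dictionary word (IndexOf style)."""
    lowered = text.lower()
    return sorted(w for w in words if w in lowered)


def find_by_token(text, words=BLOCKLIST):
    """Index-style lookup: tokenize once, then intersect with the word set."""
    tokens = set(_TOKEN.findall(text.lower()))
    return sorted(tokens & words)


if __name__ == "__main__":
    sample = "Check the schedule"
    print(find_by_substring(sample))  # flags "heck" inside "Check"
    print(find_by_token(sample))      # whole-word match finds nothing
    print(find_by_token("d@rn it"))   # obfuscated spelling slips through
```

The sample calls show both failure modes described earlier: substring matching blocks harmless text, while even whole-word matching is bypassed by trivial obfuscation.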
I admit, I'm not that familiar with what's already out there. I'd guess there are already good, full-featured libraries (maybe even classes in .NET for all I know) that can handle that sort of work. However, if not, you may find this class that I wrote as an exercise for a similar problem useful:

<http://groups.google.com/group/microsoft.public.dotnet.languages.csharp/msg/0f06f696d4500b77?dmode=source>

The original poster in that thread never mentioned whether he found it useful or not. Maybe he didn't, and maybe you wouldn't either. But I mention it anyway, just in case. :)

Pete
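The "report post" threshold suggested in the answer could look roughly like the sketch below. Everything here is an assumption for illustration: the class and field names, the 1% default ratio, and the minimum of three reports are all hypothetical, and a real site would persist reports in its own data store. Reports are counted per distinct user, so one person clicking repeatedly cannot block a post alone.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Post:
    post_id: int
    reporters: set = field(default_factory=set)  # distinct user ids that reported
    blocked: bool = False


class ReportTracker:
    """Blocks a post once enough distinct users have reported it."""

    def __init__(self, total_users, threshold_ratio=0.01, min_reports=3):
        # Hypothetical defaults: 1% of the user base, but never fewer than 3.
        self.total_users = total_users
        self.threshold_ratio = threshold_ratio
        self.min_reports = min_reports

    def reports_needed(self):
        by_ratio = math.ceil(self.total_users * self.threshold_ratio)
        return max(self.min_reports, by_ratio)

    def report(self, post, user_id):
        """Record one report; return True if the post is now blocked."""
        post.reporters.add(user_id)
        if len(post.reporters) >= self.reports_needed():
            post.blocked = True
        return post.blocked


if __name__ == "__main__":
    tracker = ReportTracker(total_users=200)
    post = Post(post_id=1)
    for user_id in ("ann", "bob", "cy"):
        print(user_id, tracker.report(post, user_id))
```

Basing the threshold on a proportion of the user base, with a floor, keeps a small site from blocking on a single report while still scaling as the site grows.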