Mapping words to numbers with respect to definition

Problem description

As part of a larger project, I need to read in text and represent each word as a number. For example, if the program reads in "Every good boy deserves fruit", then I would get a table that converts 'every' to '1742', 'good' to '977513', etc.

Now, obviously I can just use a hashing algorithm to get these numbers. However, it would be more useful if words with similar meanings had numerical values close to each other, so that 'good' becomes '6827' and 'great' becomes '6835', etc.
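For reference, the plain hashing baseline mentioned above could look roughly like the sketch below. The helper names `normalize` and `build_word_table` are made up for illustration, and `std::hash` values are implementation-defined, so the actual numbers will differ between compilers and will not reflect meaning in any way.

```cpp
#include <cctype>
#include <cstddef>
#include <functional>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical helper: keep only letters and lower-case them,
// so "Every" and "every," map to the same key.
std::string normalize(const std::string& token) {
    std::string out;
    for (unsigned char c : token) {
        if (std::isalpha(c)) {
            out.push_back(static_cast<char>(std::tolower(c)));
        }
    }
    return out;
}

// Hypothetical helper: split text on whitespace and map each
// normalized word to its std::hash value. Similar meanings do NOT
// get similar numbers here; this is only the naive baseline.
std::unordered_map<std::string, std::size_t>
build_word_table(const std::string& text) {
    std::unordered_map<std::string, std::size_t> table;
    std::hash<std::string> hasher;
    std::istringstream in(text);
    std::string token;
    while (in >> token) {
        const std::string word = normalize(token);
        if (!word.empty()) {
            table[word] = hasher(word);
        }
    }
    return table;
}
```

Calling `build_word_table("Every good boy deserves fruit")` yields a table with five entries, one per word.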

As another option, instead of a simple integer representing each word, it would be even better to have a vector made up of multiple numbers, e.g. (lexical_category, tense, classification, specific_word), where lexical_category is noun/verb/adjective/etc., tense is future/past/present, classification defines a wide set of general topics, and specific_word is much the same as described in the previous paragraph.
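To make the vector idea concrete, here is a minimal sketch of how such a record might be laid out. The field names follow the tuple above; the enum values, the integer widths, and the `meaning_gap` helper are assumptions made only for illustration, and nothing here says how the values would actually be assigned.

```cpp
#include <cstdint>

// Assumed category and tense sets; a real scheme would likely need more.
enum class LexicalCategory : std::uint8_t { Noun, Verb, Adjective, Other };
enum class Tense : std::uint8_t { None, Past, Present, Future };

// One word represented as (lexical_category, tense, classification, specific_word).
struct WordVector {
    LexicalCategory lexical_category;
    Tense tense;
    std::uint16_t classification;  // hypothetical id of a general topic
    std::uint32_t specific_word;   // hypothetical id, close for similar meanings
};

// Hypothetical helper: absolute difference of the specific_word ids,
// i.e. how far apart two words sit on the "meaning" axis.
inline std::uint32_t meaning_gap(const WordVector& a, const WordVector& b) {
    return a.specific_word > b.specific_word
               ? a.specific_word - b.specific_word
               : b.specific_word - a.specific_word;
}
```

With the example numbers from the question, 'good' at 6827 and 'great' at 6835 would be 8 apart under `meaning_gap`.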

Does any such algorithm exist? If not, can you give me any tips on how to get started on developing one myself? I code in C++.

Recommended answer