Mapping arbitrary strings to RGB values

Question

I have a huge set of arbitrary natural language strings. For my tool to analyze them, I need to convert each string to a unique color value (RGB or other). I need color contrast to depend on string similarity (the more a string differs from another, the more their respective colors should differ). It would be perfect if I always got the same color value for the same string.

Any advice on how to approach this problem?

Update on distance between strings

I probably need "similarity" defined as a Levenshtein-like distance. No natural language parsing is required.

That is:

"I am going to the store" and
"We are going to the store"

Similar.

"I am going to the store" and
"I am going to the store today"

Similar as well (but slightly less).

"I am going to the store" and
"J bn hpjoh up uif tupsf"

Not similar at all.

(Thanks, Welbog!)
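To make "Levenshtein-like" concrete, here is a minimal sketch of the classic edit distance applied to the three pairs above. The function name `levenshtein` and the variable names are only illustrative, and this is one possible distance, not necessarily the one the tool will end up using.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes and substitutions."""
    # previous[j] is the distance between the already processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep a matching character
            ))
        previous = current
    return previous[-1]


base = "I am going to the store"
d_we = levenshtein(base, "We are going to the store")
d_today = levenshtein(base, "I am going to the store today")
d_shifted = levenshtein(base, "J bn hpjoh up uif tupsf")
print(d_we, d_today, d_shifted)
```

With this distance the first pair is closest, the "today" pair is a little further apart, and the letter-shifted pair is by far the most distant, which matches the ordering described above.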

I will probably know exactly what distance function I need only once I see the program output. So let's start with simpler things.

Update on task simplification

I've removed my own suggestion to split the task into two steps: absolute distance calculation and color distribution. This would not work well, because we would first reduce the information to a single dimension and then try to expand it back into three dimensions.

Solution

You need to elaborate more on what you mean by "similar strings" in order to come up with an appropriate conversion function. Are the strings

 "I am going to the store" and
"We are going to the store"

considered similar? What about the strings

 "I am going to the store" and
"J bn hpjoh up uif tupsf"

(all of the letters in the original +1), or

 "I am going to the store" and
"I am going to the store today"

? Based on what you mean by "similar", you might consider different functions.

If the difference can be based solely on the values of the characters (in Unicode or whatever space they are from), then you can try summing the values up and using the result as a hue for HSV space. If having a longer string should cause the colours to be more different, you might consider weighting characters by their position in the string.
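As a rough sketch of the summing idea, the snippet below adds up the code points and wraps the total onto the hue circle. The function name `string_to_rgb`, the `weight_by_position` flag, the modulo-360 wrap, and the fixed full saturation and value are all assumptions made for illustration.

```python
import colorsys
from typing import Tuple


def string_to_rgb(text: str, weight_by_position: bool = False) -> Tuple[int, int, int]:
    """Map a string to an RGB triple via a hue derived from its character values."""
    total = 0
    for index, ch in enumerate(text, start=1):
        # Optionally weight each character by its 1-based position in the string.
        weight = index if weight_by_position else 1
        total += ord(ch) * weight
    # Assumption: wrap the sum onto a 360-degree hue circle.
    hue = (total % 360) / 360.0
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return (round(r * 255), round(g * 255), round(b * 255))


print(string_to_rgb("I am going to the store"))
print(string_to_rgb("I am going to the store", weight_by_position=True))
```

The same string always gets the same colour. Keep in mind that the plain sum ignores character order, so anagrams collide, and that the wrap-around means a large difference in sums does not guarantee a large difference in hue.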

If the difference is more complex, such as by the occurrences of certain letters or words, then you need to identify this. Maybe you can decide red, green and blue values based on the number of Es, Ss and Rs in a string, if your domain has a lot of these. Or pick a hue based on the ratio of vowels to consonants, or words to syllables.
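Two of these feature-based ideas could look roughly like this. The function names, the scaling by the most frequent of the three letters, the grey and black fallbacks, and the 0 to 300 degree hue range are assumptions for the sketch. The second function uses the share of vowels among all letters rather than a strict vowel-to-consonant ratio, so that the value stays between 0 and 1.

```python
import colorsys
from typing import Tuple


def letter_count_rgb(text: str) -> Tuple[int, int, int]:
    """Red, green and blue from the counts of 'e', 's' and 'r' respectively."""
    lowered = text.lower()
    counts = [lowered.count(letter) for letter in "esr"]
    peak = max(counts)
    if peak == 0:
        return (0, 0, 0)  # none of the three letters occur: black
    # Assumption: scale so the most frequent of the three letters reaches 255.
    red, green, blue = (round(255 * c / peak) for c in counts)
    return (red, green, blue)


def vowel_share_rgb(text: str) -> Tuple[int, int, int]:
    """Hue from the share of vowels among the alphabetic characters."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return (128, 128, 128)  # no letters at all: neutral grey
    share = sum(ch in "aeiou" for ch in letters) / len(letters)
    # Use only 0..300 degrees so that "no vowels" and "all vowels" do not both end up red.
    hue = share * 300.0 / 360.0
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return (round(r * 255), round(g * 255), round(b * 255))


print(letter_count_rgb("I am going to the store"))
print(vowel_share_rgb("I am going to the store"))
```

Features like these only separate strings well if the chosen letters or ratios actually vary across your data, so it is worth checking their distribution first.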

There are many, many different ways to approach this, but the best one really depends on what you mean by "similar" strings.
