问题描述
我的工作我的研究RNA结构演变Python项目(psented作为一个字符串,例如重新$ P $:(((?))),其中括号再present个碱基对)。该点的存在是我有一个理想的结构和演变朝着理想的结构人口。我已经实现的事情,但是我想补充一个功能,我可以在每一代获得了数桶,即第k最重presentative结构在人群中。
I am working on a python project where I study RNA structure evolution (represented as a string for example: "(((...)))" where the parenthesis represent basepairs). The point being is that I have an ideal structure and a population that evolves towards the ideal structure. I have implemented everything however I would like to add a feature where I can get the "number of buckets" ie the k most representative structures in the population at each generation.
我想用K-means算法,但我不知道如何使用它的字符串。我发现 scipy.cluster.vq ,但我不知道如何使用它我的情况。
I was thinking of using the k-means algorithm but I am not sure how to use it with strings. I found scipy.cluster.vq but I don't know how to use it in my case.
谢谢!
推荐答案
K-手段并不真正关心涉及的数据类型。所有你需要做K均值一些方法来衡量一个距离,从一个项目到另一个。它会做如何的恰好是从基础数据计算出来的东西的基础上的距离无关。
K-means doesn't really care about the type of the data involved. All you need to do a K-means is some way to measure a "distance" from one item to another. It'll do its thing based on the distances, regardless of how that happens to be computed from the underlying data.
这是说,我没有使用 scipy.cluster.vq
,所以我不知道你到底是这么讲的项目之间的关系,或者如何计算从项目A的距离为B项。
That said, I haven't used scipy.cluster.vq
, so I'm not sure exactly how you tell it the relationship between items, or how to compute a distance from item A to item B.
这篇关于我可以使用K-means算法的一个字符串?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!