Converting an ndarray generated by hcluster into a Newick string for use with the ete2 package

Question

I have a list of vectors created by running:

import hcluster
import numpy as np
from ete2 import Tree

vecs = [np.array(i) for i in document_list] 

where document_list is a collection of web documents I am analysing. I then perform hierarchical clustering:

Z = hcluster.linkage(vecs, metric='cosine') 
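
With metric='cosine', the distance between two vectors u and v is 1 - (u . v) / (|u| |v|). Parallel vectors are therefore 0 apart and orthogonal vectors are 1 apart. Below is a minimal pure-Python sketch of that formula. The name cosine_distance is illustrative and is not an hcluster function. The sketch assumes neither vector is all zeros, because the norm in the denominator would then be 0.

```python
import math


def cosine_distance(u, v):
    """Cosine distance 1 - (u . v) / (|u| |v|).

    Assumes neither u nor v is an all-zero vector.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical term-count vectors for three tiny "documents".
doc_a, doc_b, doc_c = [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]
print(cosine_distance(doc_a, doc_b))  # approx 0.2929 (45 degrees apart)
print(cosine_distance(doc_a, doc_c))  # 1.0 (orthogonal)
```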

This generates an ndarray such as:

[[ 12.          19.           0.           1.        ]
[ 15.          21.           0.           3.        ]
[ 18.          22.           0.           4.        ]
[  3.          16.           0.           7.        ]
[  8.          23.           0.           6.        ]
[  5.          27.           0.           6.        ]
[  1.          28.           0.           7.        ]
[  0.          21.           0.           2.        ]
[  5.          29.           0.18350472   2.        ]
[  2.          10.           0.18350472   3.        ]
[ 47.          30.           0.29289577   9.        ]
[ 13.          28.           0.29289577  13.        ]
[ 73.          32.           0.29289577  18.        ]
[ 26.          12.           0.42264521   5.        ]
[  5.          33.           0.42264521  12.        ]
[ 14.          35.           0.42264521  12.        ]
[ 19.          35.           0.42264521  18.        ]
[  4.          20.           0.31174826   3.        ]
[ 34.          21.           0.5         19.        ]
[ 38.          29.           0.31174826  21.        ]]
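
Each row i of a linkage matrix records one merge. The first two columns are the ids of the clusters being joined, the third is the distance at which they merge, and the fourth is the number of original observations in the new cluster. Ids below n (the number of observations) refer to original observations, and id n + i refers to the cluster created in row i. This is the convention documented for scipy.cluster.hierarchy.linkage, which grew out of hcluster; I am assuming hcluster's output follows it too.

The matrix quoted above looks abbreviated or hand-edited. In a 20-row matrix the cluster ids run from 0 to 40, so ids such as 47 and 73 cannot occur. For that reason the sketch below uses a small made-up matrix. The function cluster_members is hypothetical and shows how the ids resolve to sets of original observations.

```python
def cluster_members(Z, n):
    """Map every cluster id to the sorted original observation ids it contains.

    Z is a linkage matrix given as rows [id_a, id_b, dist, count];
    n is the number of original observations.
    """
    members = {i: [i] for i in range(n)}      # ids 0..n-1 are single observations
    for row_idx, (a, b, _dist, count) in enumerate(Z):
        new_id = n + row_idx                  # row i creates cluster n + i
        merged = sorted(members[int(a)] + members[int(b)])
        if len(merged) != int(count):
            raise ValueError("row %d: count column says %d, found %d"
                             % (row_idx, int(count), len(merged)))
        members[new_id] = merged
    return members


# Hypothetical 4-observation matrix (not the one quoted above).
Z = [[0.0, 1.0, 0.1, 2.0],
     [2.0, 3.0, 0.2, 2.0],
     [4.0, 5.0, 0.5, 4.0]]
print(cluster_members(Z, 4)[6])  # -> [0, 1, 2, 3]
```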

Is it possible to convert this ndarray into a newick string that can be passed to the ete2 Tree() constructor so that I can draw and manipulate a newick tree using the tools provided by ete2?
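
A linkage matrix can be turned into Newick text directly. Below is a pure-Python sketch that needs neither hcluster nor ete2. The function name linkage_to_newick and its labels parameter are my own, not part of either package. As a design choice, each child's branch length is set to the parent's merge height minus the child's height, where leaves have height 0. With that choice every internal node sits at its merge distance above its leaves, as in a dendrogram.

```python
def linkage_to_newick(Z, labels=None):
    """Convert a linkage matrix (rows [id_a, id_b, dist, count]) to Newick."""
    n = len(Z) + 1                              # n observations need n - 1 merges
    if labels is None:
        labels = [str(i) for i in range(n)]
    height = {i: 0.0 for i in range(n)}         # leaves sit at height 0
    text = {i: labels[i] for i in range(n)}     # Newick text for each cluster id
    for row_idx, (a, b, dist, _count) in enumerate(Z):
        node_id = n + row_idx                   # row i creates cluster n + i
        height[node_id] = float(dist)
        # Branch length = parent merge height - child height.
        children = ",".join(
            "%s:%g" % (text[c], float(dist) - height[c])
            for c in (int(a), int(b)))
        text[node_id] = "(%s)" % children
    return text[n + len(Z) - 1] + ";"           # last row creates the root


# Hypothetical 4-observation matrix.
Z = [[0.0, 1.0, 0.1, 2.0],
     [2.0, 3.0, 0.2, 2.0],
     [4.0, 5.0, 0.5, 4.0]]
print(linkage_to_newick(Z, labels=["a", "b", "c", "d"]))
# -> ((a:0.1,b:0.1):0.4,(c:0.2,d:0.2):0.3);
```

Because ete2's Tree constructor parses Newick text, the result should be usable as Tree(linkage_to_newick(Z, labels)). Here labels would be, for example, one name per entry of document_list, in the same order as the vectors passed to linkage. The sketch does not quote labels, so names containing spaces, commas, colons or parentheses would need escaping first.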

Does it even make sense to try this? If not, is there another way to generate a tree/dendrogram from the same data using ete2? (I realise there are other packages that can draw dendrograms, such as dendropy and hcluster itself, but I would prefer to use ete2 all the same.)

Thanks!

Recommended answer

I use the following approach for pretty much the same thing:

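from hcluster import linkage, to_tree
from ete2 import Tree

# hcluster part
# dist_matrix, items and dist_fn are not defined in this answer; Y is expected
# to be a condensed (flattened upper-triangle) distance matrix.
Y = dist_matrix(items, dist_fn)
Z = linkage(Y, "single")
T = to_tree(Z)  # root ClusterNode of the hierarchy

# ete2 section: copy the ClusterNode hierarchy into ete2 Tree nodes
root = Tree()
root.dist = 0
root.name = "root"
# Map each cluster id to its ete2 node. Ids are unique per node, so this
# does not rely on ClusterNode objects being hashable.
item2node = {T.id: root}

to_visit = [T]
while to_visit:
    node = to_visit.pop()
    # Every child gets half of its parent's merge distance as branch length.
    # (Using node.dist - ch_node.dist instead would place each internal node
    # exactly at its merge height above its leaves.)
    cl_dist = node.dist / 2.0
    for ch_node in [node.left, node.right]:
        if ch_node:  # leaves have left/right set to None
            ch = Tree()
            ch.dist = cl_dist
            ch.name = str(ch_node.id)  # leaves: index of the original item
            item2node[node.id].add_child(ch)
            item2node[ch_node.id] = ch
            to_visit.append(ch_node)

# This is your ETE tree structure
tree = root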
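
The answer calls dist_matrix(items, dist_fn) without defining it. Suppose it should return what linkage accepts as a precomputed distance matrix, namely the condensed form: the upper triangle flattened row by row, which is the layout SciPy's scipy.spatial.distance.pdist returns. A hypothetical implementation might then look like the sketch below. The extra helper condensed_index, also hypothetical, finds where one pair lives in that flat list.

```python
def dist_matrix(items, dist_fn):
    """Condensed distance matrix: dist_fn(items[i], items[j]) for all i < j.

    Order is (0,1), (0,2), ..., (0,n-1), (1,2), ..., i.e. the upper
    triangle of the square distance matrix read row by row.
    """
    n = len(items)
    return [dist_fn(items[i], items[j])
            for i in range(n)
            for j in range(i + 1, n)]


def condensed_index(i, j, n):
    """Position of the pair (i, j), with i < j, inside the condensed list."""
    return n * i - i * (i + 1) // 2 + (j - i - 1)


# Hypothetical usage: numbers as items, absolute difference as the distance.
items = [0.0, 1.0, 4.0]
Y = dist_matrix(items, lambda a, b: abs(a - b))
print(Y)                                      # [1.0, 4.0, 3.0]
print(Y[condensed_index(1, 2, len(items))])   # 3.0
```

The list holds n(n - 1)/2 distances. For a large document collection, a vectorized routine such as pdist will usually be faster than this pure-Python loop.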

10-21 11:52