Tree graph with proportions in R

Problem description

I need to build an algorithm that, given a data.frame made up of n factors, returns a tree graph in which each node represents a level of a factor and shows the proportion of rows classified by that level and by the levels of its ancestor nodes (for example, a node could display factorX.levelY=30%).

The first node represents the total number of rows and serves as the base (100%). The second level of the tree has k nodes, one for each of the k levels of the first factor. The third level has up to k*m nodes, where m is the number of levels of the second factor.
And so on for each further factor.

The data.frame used as input will have its columns ordered so that they define the hierarchy of the nodes: data[,1] is the top-level factor in the tree, data[,2] the next level, and so on.

Here is an example of the data.frame that would be used as input:

    df <- data.frame(
      f1 = factor(rep(LETTERS[1:2], each = 50)),
      f2 = rep(letters[1:4], each = 25),
      f3 = rep(colors(1)[1:2], 25, each = 2))

The graph would look like an ordinary tree plot, but with each node labelled in the format indicated above (factorX.levelY=30%).

I've noticed that the rpart package can produce similar graphs, but its plotting functions only accept a model object as input.

Solution

Here is a recursive approach. First, a function builds the tree structure, gathering the proportions at each split level into a named, nested list. Second, a function converts the nested list to an edge list for use with igraph. Lastly, igraph provides the plotting capability.

    ## Create tree structure in nested list
    makePtree <- function(data, prev = 1) {
      # proportions at the current level, scaled by the parent's proportion
      tab <- (t <- table(data[, 1L]))[t > 0] / nrow(data) * prev
      # node names of the form factor.level=proportion
      ns <- sprintf("%s.%s=%.2f", names(data)[1L], names(tab), unname(c(tab)))
      if (NCOL(data) < 2L) return(ns)  # last factor: return names only
      # recurse into each subset, passing its proportion down as `prev`
      setNames(mapply(makePtree,
                      split(data[, -1L, drop = FALSE], data[, 1L], drop = TRUE),
                      tab, SIMPLIFY = FALSE), ns)
    }

    ## Create edge list from nested list for igraph::graph_from_data_frame
    lst2edge <- function(lst) {
      if (!is.list(lst)) return(data.frame(a = character(0), b = character(0)))
      do.call(rbind, c(lapply(names(lst), function(x) {
        if (!is.list(lst[[x]])) return(data.frame(a = x, b = lst[[x]]))
        data.frame(a = x, b = names(lst[[x]]))
      }), lapply(lst, lst2edge)))
    }

    ## Apply functions
    lst <- makePtree(df)                                      # nested list
    dat <- lst2edge(lst)                                      # edge list
    dat <- rbind(dat, data.frame(a = "root", b = names(lst))) # add a root node

    ## Make an igraph
    library(igraph)
    g <- graph_from_data_frame(dat)
    # layout_as_tree is the current name of layout.reingold.tilford
    plot(g, layout = layout_as_tree(g, root = "root"))

In this version, leaves that share the same label (for example, the same f3 level with the same proportion under two different f2 nodes) are drawn as a single node with several incoming arrows. If you want the final nodes to be represented separately, you can alter their names so that igraph treats them as distinct vertices. Here, the lst2edge function is modified to produce longer names for the final level by prefixing each leaf with its parent's name. A regular expression then shortens the names for the final figure.

    ## Create edge list from nested list for igraph::graph_from_data_frame
    lst2edge <- function(lst) {
      if (!is.list(lst)) return(data.frame(a = character(0), b = character(0)))
      do.call(rbind, c(lapply(names(lst), function(x) {
        # prefix each leaf with its parent's name so that leaves stay distinct
        if (!is.list(lst[[x]])) return(data.frame(a = x, b = paste0(x, lst[[x]])))
        data.frame(a = x, b = names(lst[[x]]))
      }), lapply(lst, lst2edge)))
    }

    ## Apply functions
    lst <- makePtree(df)                                      # nested list
    dat <- lst2edge(lst)                                      # edge list
    dat <- rbind(dat, data.frame(a = "root", b = names(lst))) # add a root node

    ## Make an igraph
    g <- graph_from_data_frame(dat)

    ## Shorten the vertex names for display (the last-level names were
    ## lengthened in lst2edge so igraph doesn't show multiple incoming
    ## arrows to single nodes)
    V(g)$name <- gsub(".*?([^\\.]+=[^=]+$)", "\\1", V(g)$name)

    plot(g, layout = layout_as_tree(g, root = "root"),
         vertex.label.dist = -0.1,
         vertex.label.degree = c(rep(pi/2, 7), rep(c(pi/2, 3*pi/2), 4)))

You can adjust the position of the vertex labels with the vertex.label.degree argument to the plotting function.
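As a sanity check on the node labels, the proportions can be recomputed directly with base R, without igraph. The sketch below assumes the example df from the question. The helper name level_props is hypothetical and is not part of the original answer. Each node's value is the share of all rows that fall in that level and in all of its ancestor levels, so the values at any one depth of the tree should sum to 1.

```r
# Example data from the question.
df <- data.frame(
  f1 = factor(rep(LETTERS[1:2], each = 50)),
  f2 = rep(letters[1:4], each = 25),
  f3 = rep(colors(1)[1:2], 25, each = 2))

# Hypothetical helper (not from the original answer): share of all rows
# for each observed combination of the first `depth` columns, i.e. one
# value per node at that depth of the tree.
level_props <- function(data, depth) {
  counts <- table(data[, seq_len(depth), drop = FALSE])
  props <- as.data.frame(counts / nrow(data), responseName = "prop")
  props[props$prop > 0, , drop = FALSE]  # keep only combinations that occur
}

p1 <- level_props(df, 1)  # nodes for f1
p2 <- level_props(df, 2)  # nodes for f1 within f2
p3 <- level_props(df, 3)  # leaf nodes (f1, f2 and f3 combined)

print(p2)
```

For the example data, p1 holds 0.5 for each level of f1 and p2 holds 0.25 for each of the four observed f1/f2 combinations. Multiplying the prop column by 100 gives the percentage format the question asks for.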
08-21 05:40