Convert a data frame to a treeNetwork-compatible list

Problem description

Consider the following data frame:

   Country     Provinces          City Zone
1   Canada   Newfondland      St Johns    A
2   Canada           PEI Charlottetown    B
3   Canada   Nova Scotia       Halifax    C
4   Canada New Brunswick   Fredericton    D
5   Canada        Quebec            NA   NA
6   Canada        Quebec   Quebec City   NA
7   Canada       Ontario       Toronto    A
8   Canada       Ontario        Ottawa    B
9   Canada      Manitoba      Winnipeg    C
10  Canada  Saskatchewan        Regina    D

Is there a clever way to convert it into a treeNetwork-compatible list (from the networkD3 package) of the following form:

CanadaPC <- list(name = "Canada",
                 children = list(
                   list(name = "Newfoundland",
                        children = list(list(name = "St. John's",
                                             children = list(list(name = "A"))))),
                   list(name = "PEI",
                        children = list(list(name = "Charlottetown",
                                             children = list(list(name = "B"))))),
                   list(name = "Nova Scotia",
                        children = list(list(name = "Halifax",
                                             children = list(list(name = "C"))))),
                   list(name = "New Brunswick",
                        children = list(list(name = "Fredericton",
                                             children = list(list(name = "D"))))),
                   list(name = "Quebec",
                        children = list(list(name = "Quebec City"))),
                   list(name = "Ontario",
                        children = list(list(name = "Toronto",
                                             children = list(list(name = "A"))),
                                        list(name = "Ottawa",
                                             children = list(list(name = "B"))))),
                   list(name = "Manitoba",
                        children = list(list(name = "Winnipeg",
                                             children = list(list(name = "C"))))),
                   list(name = "Saskatchewan",
                        children = list(list(name = "Regina",
                                             children = list(list(name = "D")))))))

The goal is to plot a Reingold-Tilford tree that can have an arbitrary number of levels.

I have tried several sub-optimal routines, including a messy combination of for loops, but I can't get the result into the desired format.

Ideally, the function would scale, treating the first column as the root (starting point) and each of the remaining columns as a further level of children.

Edit

A similar question was asked on the same topic, and @MrFlick provided an interesting recursive function. The original data frame had a fixed set of levels. I introduced NAs to add another level of complexity (an arbitrary set of levels) that is not addressed in @MrFlick's initial solution.

Data

structure(list(Country = c("Canada", "Canada", "Canada", "Canada",
"Canada", "Canada", "Canada", "Canada", "Canada", "Canada"),
    Provinces = c("Newfondland", "PEI", "Nova Scotia", "New Brunswick",
    "Quebec", "Quebec", "Ontario", "Ontario", "Manitoba", "Saskatchewan"
    ), City = c("St Johns", "Charlottetown", "Halifax", "Fredericton",
    NA, "Quebec City", "Toronto", "Ottawa", "Winnipeg", "Regina"
    ), Zone = c("A", "B", "C", "D", NA, NA, "A", "B", "C",
    "D")), class = "data.frame", row.names = c(NA, -10L), .Names = c("Country",
"Provinces", "City", "Zone"))

Recommended answer

A better strategy for this scenario may be a recursive split(). Here's such an implementation. First, here's the sample data:

dd<-structure(list(Country = c("Canada", "Canada", "Canada", "Canada",
"Canada", "Canada", "Canada", "Canada", "Canada", "Canada"),
    Provinces = c("Newfondland", "PEI", "Nova Scotia", "New Brunswick",
    "Quebec", "Quebec", "Ontario", "Ontario", "Manitoba", "Saskatchewan"
    ), City = c("St Johns", "Charlottetown", "Halifax", "Fredericton",
    NA, "Quebec City", "Toronto", "Ottawa", "Winnipeg", "Regina"
    ), Zone = c("A", "B", "C", "D", NA, NA, "A", "B", "C",
    "D")), class = "data.frame", row.names = c(NA, -10L), .Names = c("Country",
"Provinces", "City", "Zone"))

Note that I've replaced the "NA" strings with true NA values. Now, the function:

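rsplit <- function(x) {
    # drop rows whose value at the current level is missing
    x <- x[!is.na(x[,1]),,drop=FALSE]
    if(nrow(x)==0) return(NULL)
    # last remaining column: every value becomes a leaf node
    if(ncol(x)==1) return(lapply(x[,1], function(v) list(name=v)))
    # group the remaining columns by the value of the current column
    s <- split(x[,-1, drop=FALSE], x[,1])
    # recurse into each group; a group with nothing below it becomes a leaf
    unname(mapply(function(v, n) {
        if (!is.null(v)) list(name = n, children = v) else list(name = n)
    }, lapply(s, rsplit), names(s), SIMPLIFY = FALSE))
}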
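rsplit() only prunes a branch when is.na() is TRUE, so a literal "NA" string would be kept and show up as a node named "NA". If a copy of the data still contains such strings, they can be converted before calling the function. A minimal base R sketch, written with character columns in mind; the helper name na_strings_to_na() is hypothetical and not part of the original answer:

```r
# Hypothetical helper (not from the original answer): turn literal "NA"
# strings into real missing values so rsplit()'s is.na() check prunes them.
na_strings_to_na <- function(df) {
  # `%in%` returns FALSE (not NA) for values that are already missing,
  # so existing NAs are left untouched
  df[] <- lapply(df, function(col) replace(col, col %in% "NA", NA))
  df
}
```

Usage would be `dd <- na_strings_to_na(dd)` before `rsplit(dd)`; assigning into `df[]` keeps the data frame class and column names.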

Then we can run

rsplit(dd)

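It seems to work with the test data. The only difference is the order of the child nodes: split() turns the grouping column into a factor with sorted levels, so the children come out in alphabetical order rather than in the order they appear in the data frame.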
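If the children should follow the row order of the data frame instead, one option is to give split() a factor whose levels are the values in order of first appearance. The sketch below is a hypothetical variant, rsplit_ordered(), that is not part of the original answer; apart from that one factor line it mirrors rsplit():

```r
# Hypothetical variant of rsplit() (not from the original answer) that keeps
# children in order of first appearance instead of alphabetical order.
rsplit_ordered <- function(x) {
  # drop rows whose value at the current level is missing
  x <- x[!is.na(x[, 1]), , drop = FALSE]
  if (nrow(x) == 0) return(NULL)
  # last remaining column: every value becomes a leaf node
  if (ncol(x) == 1) return(lapply(x[, 1], function(v) list(name = v)))
  # levels in order of first appearance, so split() does not sort the groups
  f <- factor(x[, 1], levels = unique(x[, 1]))
  s <- split(x[, -1, drop = FALSE], f)
  # recurse into each group; a group with nothing below it becomes a leaf
  unname(mapply(function(v, n) {
    if (!is.null(v)) list(name = n, children = v) else list(name = n)
  }, lapply(s, rsplit_ordered), names(s), SIMPLIFY = FALSE))
}
```

The top level of the result is a list of root nodes, so `rsplit_ordered(dd)[[1]]` (likewise `rsplit(dd)[[1]]`) is the object that corresponds to CanadaPC. Recent networkD3 releases accept this kind of nested name/children list through the List argument of diagonalNetwork() and radialNetwork().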

09-03 17:01