Question
Consider the following data frame:
   Country     Provinces          City Zone
1   Canada   Newfondland      St Johns    A
2   Canada           PEI Charlottetown    B
3   Canada   Nova Scotia       Halifax    C
4   Canada New Brunswick   Fredericton    D
5   Canada        Quebec            NA   NA
6   Canada        Quebec   Quebec City   NA
7   Canada       Ontario       Toronto    A
8   Canada       Ontario        Ottawa    B
9   Canada      Manitoba      Winnipeg    C
10  Canada  Saskatchewan        Regina    D
Would there be a clever way to convert it to a treeNetwork-compatible list (from the networkD3 package) in the form of:
CanadaPC <- list(name = "Canada",
children = list(
list(name = "Newfoundland",
children = list(list(name = "St. John's",
children = list(list(name = "A"))))),
list(name = "PEI",
children = list(list(name = "Charlottetown",
children = list(list(name = "B"))))),
list(name = "Nova Scotia",
children = list(list(name = "Halifax",
children = list(list(name = "C"))))),
list(name = "New Brunswick",
children = list(list(name = "Fredericton",
children = list(list(name = "D"))))),
list(name = "Quebec",
children = list(list(name = "Quebec City"))),
list(name = "Ontario",
children = list(list(name = "Toronto",
children = list(list(name = "A"))),
list(name = "Ottawa",
children = list(list(name = "B"))))),
list(name = "Manitoba",
children = list(list(name = "Winnipeg",
children = list(list(name = "C"))))),
list(name = "Saskatchewan",
children = list(list(name = "Regina",
children = list(list(name = "D")))))))
The goal is to plot a Reingold-Tilford tree that can have an arbitrary set of levels.
I have tried several sub-optimal routines, including a messy combination of for loops, but I can't get the data into the desired format.
Ideally, the function would scale so that the first column is treated as the root (starting point) and the remaining columns as successive levels of children.
Edit
A similar question was asked on the same topic, and @MrFlick provided an interesting recursive function. The original data frame had a fixed set of levels. I introduced NAs to add another level of complexity (an arbitrary set of levels) that is not addressed in @MrFlick's initial solution.
Data
structure(list(Country = c("Canada", "Canada", "Canada", "Canada",
"Canada", "Canada", "Canada", "Canada", "Canada", "Canada"),
Provinces = c("Newfondland", "PEI", "Nova Scotia", "New Brunswick",
"Quebec", "Quebec", "Ontario", "Ontario", "Manitoba", "Saskatchewan"
), City = c("St Johns", "Charlottetown", "Halifax", "Fredericton",
NA, "Quebec City", "Toronto", "Ottawa", "Winnipeg", "Regina"
), Zone = c("A", "B", "C", "D", NA, NA, "A", "B", "C",
"D")), class = "data.frame", row.names = c(NA, -10L), .Names = c("Country",
"Provinces", "City", "Zone"))
Recommended answer
A better strategy for this scenario may be a recursive split(). Here's such an implementation. First, here's the sample data:
dd<-structure(list(Country = c("Canada", "Canada", "Canada", "Canada",
"Canada", "Canada", "Canada", "Canada", "Canada", "Canada"),
Provinces = c("Newfondland", "PEI", "Nova Scotia", "New Brunswick",
"Quebec", "Quebec", "Ontario", "Ontario", "Manitoba", "Saskatchewan"
), City = c("St Johns", "Charlottetown", "Halifax", "Fredericton",
NA, "Quebec City", "Toronto", "Ottawa", "Winnipeg", "Regina"
), Zone = c("A", "B", "C", "D", NA, NA, "A", "B", "C",
"D")), class = "data.frame", row.names = c(NA, -10L), .Names = c("Country",
"Provinces", "City", "Zone"))
Note that I've replaced the "NA" strings with true NA values. Now, the function:
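rsplit <- function(x) {
  # Drop rows whose value at the current level is NA; they add no node here
  x <- x[!is.na(x[,1]),,drop=FALSE]
  if(nrow(x)==0) return(NULL)
  # Last column: every remaining value becomes a leaf node
  if(ncol(x)==1) return(lapply(x[,1], function(v) list(name=v)))
  # Group the remaining columns by the current level, then recurse per group
  s <- split(x[,-1, drop=FALSE], x[,1])
  unname(mapply(function(v,n) {
    if(!is.null(v)) list(name=n, children=v) else list(name=n)
  }, lapply(s, rsplit), names(s), SIMPLIFY=FALSE))
}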
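To see what a single level of this recursion does, the split() step can be tried on its own. The sketch below uses a small made-up data frame (toy is a hypothetical name, not part of the original post) and shows that split() returns one sub-data-frame per distinct value of the first column, named after that value.

```r
# Hypothetical toy data, used only to illustrate one split() step
toy <- data.frame(
  Province = c("Ontario", "Ontario", "Quebec"),
  City     = c("Toronto", "Ottawa", "Quebec City"),
  stringsAsFactors = FALSE
)

# Split every column except the first, grouped by the first column
s <- split(toy[, -1, drop = FALSE], toy[, 1])

names(s)             # one group per distinct Province
s[["Ontario"]]$City  # the cities that fall under Ontario
```

Each element of s is again a data frame with one column fewer, which is what makes it possible to apply the same function to it recursively.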
Then we can run
rsplit(dd)
It seems to work with the test data. The only difference is the order in which the child nodes are arranged.
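Two details may be worth spelling out. First, the function returns a list of root nodes, so with a single country the tree itself is the first element. Second, the ordering difference comes from split(), which groups by factor levels, and those are sorted by default. The following is a minimal sketch of a variant that keeps the order of first appearance instead, assuming that is what is wanted; rsplit_ordered and tree are hypothetical names, not part of the original answer.

```r
# Same sample data as in the question, rebuilt with data.frame()
dd <- data.frame(
  Country   = rep("Canada", 10),
  Provinces = c("Newfondland", "PEI", "Nova Scotia", "New Brunswick",
                "Quebec", "Quebec", "Ontario", "Ontario", "Manitoba",
                "Saskatchewan"),
  City      = c("St Johns", "Charlottetown", "Halifax", "Fredericton",
                NA, "Quebec City", "Toronto", "Ottawa", "Winnipeg", "Regina"),
  Zone      = c("A", "B", "C", "D", NA, NA, "A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# Variant of rsplit() (hypothetical name) that preserves row order
rsplit_ordered <- function(x) {
  x <- x[!is.na(x[, 1]), , drop = FALSE]
  if (nrow(x) == 0) return(NULL)
  if (ncol(x) == 1) return(lapply(x[, 1], function(v) list(name = v)))
  # Levels in order of first appearance, so split() does not sort the groups
  f <- factor(x[, 1], levels = unique(x[, 1]))
  s <- split(x[, -1, drop = FALSE], f)
  unname(mapply(function(v, n) {
    if (!is.null(v)) list(name = n, children = v) else list(name = n)
  }, lapply(s, rsplit_ordered), names(s), SIMPLIFY = FALSE))
}

# The result is a list of roots; with one country, take the first element
tree <- rsplit_ordered(dd)[[1]]
tree$name
vapply(tree$children, function(ch) ch$name, character(1))
```

The object tree then has the same nested name/children shape as CanadaPC in the question, with the Quebec City node carrying no children because its Zone is NA.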