Question
I'm trying to find an R package that can identify clusters of values exceeding a given threshold in a dataset.
What I want to know is the cluster duration/size and the individual values of each cluster.
For example (a simple one):
I have a vector of data:
10 8 6 14 14 7 14 5 11 12 8 11 11 16 20 6 8 8 6 15
The clusters of values greater than 9 are marked with brackets:
[10] 8 6 [14 14] 7 [14] 5 [11 12] 8 [11 11 16 20] 6 8 8 6 [15]
So the cluster sizes here are
1, 2, 1, 2, 4, 1
What I want R to do is return the clusters as separate vectors, in order of appearance, e.g.
[1] 10
[2] 14 14
[3] 14
[4] 11 12
[5] 11 11 16 20
[6] 15
Is there such a package? A piece of code, for example one using if statements, would also help.
Cheers
Recommended answer
The data.table::rleid function works well for this:
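# The example vector from the question
vec <- c(10, 8, 6, 14, 14, 7, 14, 5, 11, 12, 8, 11, 11, 16, 20, 6, 8, 8, 6, 15)
# Split vec into runs of consecutive above/below-threshold values, keep the runs above 9
Filter(function(a) a[1] > 9, split(vec, data.table::rleid(vec > 9)))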
# $`1`
# [1] 10
# $`3`
# [1] 14 14
# $`5`
# [1] 14
# $`7`
# [1] 11 12
# $`9`
# [1] 11 11 16 20
# $`11`
# [1] 15
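The question also asks for the cluster sizes. As a minimal sketch in base R (the variable names `runs` and `cluster_sizes` are illustrative and not part of the original answer, and the vector is assumed to contain no missing values), `rle()` applied to the logical vector `vec > 9` gives the length of every run, and indexing by the run values keeps only the runs above the threshold:

```r
# The example vector from the question
vec <- c(10, 8, 6, 14, 14, 7, 14, 5, 11, 12, 8, 11, 11, 16, 20, 6, 8, 8, 6, 15)

# Run-length encode the above/below-threshold indicator
runs <- rle(vec > 9)

# Keep only the lengths of the runs whose value is TRUE (above the threshold)
cluster_sizes <- runs$lengths[runs$values]
cluster_sizes
# [1] 1 2 1 2 4 1
```

Calling `lengths()` on the list returned by the `Filter()` call above should give the same sizes.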
If you'd prefer not to load the data.table package just for that, you can use a base-R approach adapted from https://stackoverflow.com/a/33509966:
myrleid <- function(x) {
  rl <- rle(x)$lengths
  rep(seq_along(rl), times = rl)
}
Filter(function(a) a[1] > 9, split(vec, myrleid(vec > 9)))
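To reuse this with other thresholds, the same idea can be wrapped in a small function. The sketch below is one possible arrangement rather than the answer's own code: the helper name `clusters_above` and its arguments are hypothetical, and it assumes a numeric vector with no missing values.

```r
# Hypothetical helper: returns the runs of consecutive values above `threshold`
# as an unnamed list, in order of appearance.
clusters_above <- function(x, threshold) {
  above <- x > threshold
  runs <- rle(above)
  # Give every run (above or below the threshold) its own integer id
  run_id <- rep(seq_along(runs$lengths), times = runs$lengths)
  # Keep only the above-threshold values and group them by run id
  unname(split(x[above], run_id[above]))
}

vec <- c(10, 8, 6, 14, 14, 7, 14, 5, 11, 12, 8, 11, 11, 16, 20, 6, 8, 8, 6, 15)
clusters <- clusters_above(vec, 9)

# Cluster sizes
lengths(clusters)
# [1] 1 2 1 2 4 1

# Individual values of the fifth cluster
clusters[[5]]
# [1] 11 11 16 20
```

Filtering on `above` before splitting avoids the separate `Filter()` step, at the cost of computing the run ids inside the helper.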