I have a data frame, puzzle, of customers and the types of items they own. A customer who owns several items appears in the list several times.
name type
m1 A
m10 A
m2 A
m9 A
m9 B
m4 B
m5 B
m1 C
m2 C
m3 C
m4 C
m5 C
m6 C
m7 C
m8 C
m1 D
m5 D
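I want to calculate, for each pair of types, the percentage of people owning one type (say "A") who also own the other (say "B").
Based on the input above, how can I use R to get output like this: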
     A     B     C     D  TOTAL
A    1  0.25   0.5  0.25      4
B 0.33     1  0.67  0.33      3
C 0.25  0.25     1  0.25      8
D  0.5   0.5     1     1      2
Thank you very much for your help!
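Here is a long, manual approach without any loops or advanced functions (which of course wastes R's potential):
Subsets for items A and B:
puzzleA <- subset(puzzle, type == 'A')
puzzleB <- subset(puzzle, type == 'B')
Share of A's customers who also own B:
length(unique(merge(puzzleA, puzzleB, by = 'name')$name)) / length(unique(puzzleA$name))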
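The same pairwise calculation can be generalized to every pair of types without merging by hand. The following is only a minimal sketch: it rebuilds the example data so that it runs on its own, and the names owners, share and result are illustrative, not part of the original post.

```r
# Rebuild the example data so this snippet is self-contained
puzzle <- data.frame(
  name = c("m1", "m10", "m2", "m9", "m9", "m4", "m5", "m1", "m2",
           "m3", "m4", "m5", "m6", "m7", "m8", "m1", "m5"),
  type = c("A", "A", "A", "A", "B", "B", "B", "C", "C",
           "C", "C", "C", "C", "C", "C", "D", "D"),
  stringsAsFactors = FALSE
)

# One vector of distinct customer names per item type
owners <- lapply(split(puzzle$name, puzzle$type), unique)

# share[i, j]: fraction of the owners of type i who also own type j.
# The outer sapply() puts the "from" type in columns, hence the t().
share <- t(sapply(owners, function(a) {
  sapply(owners, function(b) length(intersect(a, b)) / length(a))
}))

# Append the number of distinct owners per type as a TOTAL column
result <- cbind(round(share, 2), TOTAL = lengths(owners))
result
```

Each row is divided by its own owner count, so the matrix is not symmetric: the A row uses the 4 owners of A as the denominator, while the B row uses the 3 owners of B.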
Data
puzzle <- structure(list(name = c("m1", "m10", "m2", "m9", "m9", "m4",
"m5", "m1", "m2", "m3", "m4", "m5", "m6", "m7", "m8", "m1", "m5"
), type = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C",
"C", "C", "C", "C", "C", "D", "D")), .Names = c("name", "type"
), class = "data.frame", row.names = c(NA, -17L))
Best answer
You could also build a set of association rules, for example:
library(arules)
trans <- as(lapply(split(puzzle[2], puzzle[1]), unlist, F, F), "transactions")
rules <- apriori(trans, parameter = list(support=0, minlen=2, maxlen=2, conf=0))
res <- data.frame(
lhs = labels(lhs(rules)),
rhs = labels(rhs(rules)),
value = round(rules@quality$confidence, 2)
)
res <- reshape2::dcast(res, lhs~rhs, fill = 1)
res$total <- rowSums(trans@data)
res
#   lhs  {A}  {B}  {C}  {D} total
# 1 {A} 1.00 0.25 0.50 0.25     4
# 2 {B} 0.33 1.00 0.67 0.33     3
# 3 {C} 0.25 0.25 1.00 0.25     8
# 4 {D} 0.50 0.50 1.00 1.00     2
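The confidence of a rule {A} => {B} is the number of customers owning both A and B divided by the number owning A, which is why it reproduces the requested table. For readers without arules, the result can be cross-checked in base R. This is a sketch of a different technique (a co-occurrence matrix via crossprod), not the answer's method; the names inc, co and check are illustrative.

```r
# Rebuild the example data so this snippet is self-contained
puzzle <- data.frame(
  name = c("m1", "m10", "m2", "m9", "m9", "m4", "m5", "m1", "m2",
           "m3", "m4", "m5", "m6", "m7", "m8", "m1", "m5"),
  type = c("A", "A", "A", "A", "B", "B", "B", "C", "C",
           "C", "C", "C", "C", "C", "C", "D", "D"),
  stringsAsFactors = FALSE
)

# Customer-by-type incidence matrix: 1 if the customer owns that type.
# The "> 0" guards against a customer/type pair listed more than once.
inc <- (table(puzzle$name, puzzle$type) > 0) * 1

# co[i, j]: number of customers owning both type i and type j;
# the diagonal holds the number of owners of each type
co <- crossprod(inc)

# Dividing by diag(co) recycles down the columns, so row i is
# divided by the owner count of type i
check <- cbind(round(co / diag(co), 2), total = diag(co))
check
```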
Source: "r - Get the percentage of people for each pair" on Stack Overflow: https://stackoverflow.com/questions/39556841/