I have a data frame, puzzle, listing customers and the types of items they own. A customer who owns several items appears once for each item.

name    type
m1       A
m10      A
m2       A
m9       A
m9       B
m4       B
m5       B
m1       C
m2       C
m3       C
m4       C
m5       C
m6       C
m7       C
m8       C
m1       D
m5       D

I want to calculate, for the people who own "A", the percentage who also own "B", and so on for every pair of types.

Given the input above, how can I use R to get output like this:
    A     B      C      D      TOTAL
A   1     0.25   0.5    0.25    4
B   0.33  1      0.67   0.33    3
C   0.25  0.25   1      0.25    8
D   0.5   0.5    1      1       2

Thank you very much for your help!

Here is a long, manual approach that uses no loops or advanced functions (which of course wastes R's potential):

Example for item A:-
puzzleA <- subset(puzzle, type == 'A')

Share of the customers owning A who also own B:-
puzzleB <- subset(puzzle, type == 'B')
length(unique(merge(puzzleA, puzzleB, by = 'name')$name)) / length(unique(puzzleA$name))
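The manual steps above could be wrapped in a small helper so that any pair of types can be checked without creating a subset for each one. This is only a sketch: `share_also_owning` is a hypothetical name, not something from base R or any package, and it uses `intersect()` in place of `merge()`.

```r
# Same data as the puzzle data frame in the question
puzzle <- data.frame(
  name = c("m1", "m10", "m2", "m9", "m9", "m4", "m5", "m1", "m2", "m3",
           "m4", "m5", "m6", "m7", "m8", "m1", "m5"),
  type = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C",
           "C", "C", "C", "C", "C", "D", "D"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: share of the customers owning `a` who also own `b`
share_also_owning <- function(df, a, b) {
  owners_a <- unique(df$name[df$type == a])
  owners_b <- unique(df$name[df$type == b])
  length(intersect(owners_a, owners_b)) / length(owners_a)
}

share_also_owning(puzzle, "A", "B")  # one of the four A owners (m9) also owns B
```

Calling it for every ordered pair of types would fill the whole table, but that still repeats the same lookup many times.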

Data
puzzle <- structure(list(name = c("m1", "m10", "m2", "m9", "m9", "m4",
          "m5", "m1", "m2", "m3", "m4", "m5", "m6", "m7", "m8", "m1", "m5"
          ), type = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C",
          "C", "C", "C", "C", "C", "D", "D")), .Names = c("name", "type"
          ), class = "data.frame", row.names = c(NA, -17L))

Best answer

You could also build a set of association rules, for example:

library(arules)
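# One transaction per customer: the set of item types that customer owns
trans <- as(lapply(split(puzzle[2], puzzle[1]), unlist, recursive = FALSE, use.names = FALSE), "transactions")
# No support or confidence threshold; the confidence of {X} => {Y} is the share of X owners who also own Y
rules <- apriori(trans, parameter = list(support = 0, minlen = 2, maxlen = 2, confidence = 0))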
res <- data.frame(
  lhs = labels(lhs(rules)),
  rhs = labels(rhs(rules)),
  value = round(rules@quality$confidence, 2)
)
res <- reshape2::dcast(res, lhs~rhs, fill = 1)
res$total <- rowSums(trans@data)
res
#   lhs  {A}  {B}  {C}  {D} total
# 1 {A} 1.00 0.25 0.50 0.25     4
# 2 {B} 0.33 1.00 0.67 0.33     3
# 3 {C} 0.25 0.25 1.00 0.25     8
# 4 {D} 0.50 0.50 1.00 1.00     2
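As a cross-check that needs no extra packages, the same table can be sketched in base R with a customer-by-type incidence matrix. This is an alternative to the association-rule approach, not part of the original answer; the names `owns`, `both`, `totals` and `shares` are chosen here purely for illustration.

```r
# Same data as the puzzle data frame in the question
puzzle <- data.frame(
  name = c("m1", "m10", "m2", "m9", "m9", "m4", "m5", "m1", "m2", "m3",
           "m4", "m5", "m6", "m7", "m8", "m1", "m5"),
  type = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C",
           "C", "C", "C", "C", "C", "D", "D"),
  stringsAsFactors = FALSE
)

# 0/1 incidence matrix: one row per customer, one column per type.
# The > 0 test guards against duplicated (name, type) rows.
counts <- unclass(table(puzzle$name, puzzle$type))
owns <- (counts > 0) * 1

# t(owns) %*% owns: entry [i, j] counts the customers owning both i and j,
# so the diagonal holds the number of owners of each type.
both <- crossprod(owns)
totals <- diag(both)

# Dividing by a vector recycles down the columns, so row i is divided by totals[i]
shares <- round(both / totals, 2)

cbind(shares, TOTAL = totals)
```

Each row is divided by the number of owners of that row's type, which is why the matrix is not symmetric: 0.25 of A owners also own B, while 0.33 of B owners also own A.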

For more on "r - getting the percentage of people for each pair", see the similar question on Stack Overflow: https://stackoverflow.com/questions/39556841/
