Problem description
I want to find patterns and "clusters" based on which items are bought together, and according to the wiki for eclat:
However, when I use eclat in R, I get "zero frequent items", and NULL when retrieving the results through tidLists. Can anyone see what I am doing wrong?
Full dataset: https://pastebin.com/8GbjnHK2
Each row is one transaction, with its items in separate columns. A quick snapshot of the data:
3060615;;;;;;;;;;;;;;;
3060612;3060616;;;;;;;;;;;;;;
3020703;;;;;;;;;;;;;;;
3002469;;;;;;;;;;;;;;;
3062800;;;;;;;;;;;;;;;
3061943;3061965;;;;;;;;;;;;;;
Code
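library(arules)
# Read one transaction per line; items are separated by ";"
trans <- read.transactions("Transactions.csv", format = "basket", sep = ";")
# Mine frequent itemsets and keep the transaction ID lists
f <- eclat(trans, parameter = list(supp = 0.1, maxlen = 17, tidLists = TRUE))
dim(tidLists(f))
as(tidLists(f), "list")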
Could it be due to the data structure? If so, how should I change it? Also, what do I need to do to get the suggested itemsets? I couldn't figure that out from the wiki.
Edit: I used 0.004 for supp, as suggested by @hpesoj626. However, the function seems to be grouping the orders/users rather than the items. I don't know how to export the data, so here is a picture of the tidLists:
Recommended answer
The problem is that you have set your support too high. Try lowering supp, say to supp = .001, for which we get
dim(tidLists(f))
# [1] 928 15840
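The two numbers follow the arules convention that tidLists has one row per itemset and one column per transaction, so here 928 frequent itemsets were found across 15840 transactions. Each row lists the IDs of the transactions that contain that itemset, which is why the output looks like it groups orders rather than items. The idea can be sketched in base R without arules. The baskets below are made up for illustration, and the sketch assumes no item is repeated within a basket:

```r
# Hypothetical baskets; each element is one transaction (not the real data).
baskets <- list(c("A", "B"), c("A", "C"), c("B"), c("A", "B", "C"))

# Vertical layout: for each item, the IDs of the transactions containing it.
tid <- split(rep(seq_along(baskets), lengths(baskets)), unlist(baskets))

# The tid-list of an itemset is the intersection of its items' tid-lists,
# and its support is the share of transactions in that intersection.
tid_AB  <- intersect(tid$A, tid$B)
supp_AB <- length(tid_AB) / length(baskets)

print(tid)
print(supp_AB)
```

The itemsets themselves are what inspect(f) prints; the tid-lists only say where each itemset occurs.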
For your data set, the highest support is 0.08239, which is below 0.1. That is why you are getting no results with supp = 0.1.
inspect(head(sort(f, by = "support"), 10))
# items support count
# [1] {3060620} 0.08239 1305
# [2] {3060619} 0.07260 1150
# [3] {3061124} 0.05688 901
# [4] {3060618} 0.05663 897
# [5] {4027039} 0.04975 788
# [6] {3060617} 0.04564 723
# [7] {3061697} 0.04306 682
# [8] {3060619,3060620} 0.03087 489
# [9] {3039715} 0.02727 432
# [10] {3045117} 0.02708 429
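The support column is simply count divided by the number of transactions, for example 1305 / 15840 = 0.08239 for the top item. A minimal base-R sketch of the same check on made-up baskets shows how a threshold above the largest item support leaves nothing to report. The baskets, the helper frequent_items, and the thresholds 0.5 and 0.3 are all hypothetical:

```r
# Hypothetical baskets; each element is one transaction (not the real data).
baskets <- list(
  c("A", "B"), c("A", "C"), c("B"), c("C", "D"),
  c("A", "B"), c("D"), c("E"), c("F")
)

# Relative support of an item = share of transactions that contain it.
items <- sort(unique(unlist(baskets)))
item_support <- vapply(
  items,
  function(i) mean(vapply(baskets, function(b) i %in% b, logical(1))),
  numeric(1)
)
max_support <- max(item_support)

# Hypothetical helper: items whose support reaches the threshold.
frequent_items <- function(min_supp) {
  names(item_support)[item_support >= min_supp]
}

print(max_support)
print(frequent_items(0.5))  # threshold above max_support: no items
print(frequent_items(0.3))  # threshold below max_support: some items
```

In arules, itemFrequency(trans) gives the per-item supports for a transactions object, so taking its maximum before calling eclat shows the largest supp value that can still return anything.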