Problem description
I'm trying to understand the fundamentals of the Apriori (Basket) algorithm for use in data mining.
It's best if I explain the complication I'm having with an example:
Here is a transactional dataset:
t1: Milk, Chicken, Beer
t2: Chicken, Cheese
t3: Cheese, Boots
t4: Cheese, Chicken, Beer
t5: Chicken, Beer, Clothes, Cheese, Milk
t6: Clothes, Beer, Milk
t7: Beer, Milk, Clothes
The minimum support for the above is 0.5, or 50%.
Taking from the above, my number of transactions is clearly 7, meaning that for an itemset to be "frequent" it must appear in at least 4 of the 7 transactions (0.5 × 7 = 3.5, rounded up). As such, this was my frequent itemset 1:
F1:
Milk = 4
Chicken = 4
Beer = 5
Cheese = 4
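The counting step behind F1 can be sketched in Python as follows. This is only an illustration of the arithmetic above, not a full Apriori implementation; the names `transactions`, `MIN_SUPPORT`, `min_count`, and `f1` are chosen here for the example.

```python
from collections import Counter
from math import ceil

# The seven transactions from the question, one set of items per transaction.
transactions = [
    {"Milk", "Chicken", "Beer"},
    {"Chicken", "Cheese"},
    {"Cheese", "Boots"},
    {"Cheese", "Chicken", "Beer"},
    {"Chicken", "Beer", "Clothes", "Cheese", "Milk"},
    {"Clothes", "Beer", "Milk"},
    {"Beer", "Milk", "Clothes"},
]
MIN_SUPPORT = 0.5  # 50%, as stated in the question

# Smallest whole number of transactions that satisfies the support threshold.
min_count = ceil(MIN_SUPPORT * len(transactions))

# Count how many transactions contain each individual item.
item_counts = Counter(item for t in transactions for item in t)

# F1: single items whose count reaches the threshold.
f1 = {item: n for item, n in item_counts.items() if n >= min_count}

print(min_count)
print(sorted(f1.items()))
```

Clothes (3 transactions) and Boots (1 transaction) fall below the threshold, which is why they do not appear in F1.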
I then created my candidates for the second refinement (C2) and narrowed them down to:
F2:
{Milk, Beer} = 4
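A minimal sketch of how C2 and F2 could be derived, assuming the candidates are simply all unordered pairs of the frequent single items (which is what the Apriori join step amounts to at size 2). The variable names are illustrative only.

```python
from itertools import combinations
from math import ceil

transactions = [
    {"Milk", "Chicken", "Beer"},
    {"Chicken", "Cheese"},
    {"Cheese", "Boots"},
    {"Cheese", "Chicken", "Beer"},
    {"Chicken", "Beer", "Clothes", "Cheese", "Milk"},
    {"Clothes", "Beer", "Milk"},
    {"Beer", "Milk", "Clothes"},
]
min_count = ceil(0.5 * len(transactions))  # minimum support of 50%

# Frequent single items, taken from F1 in the question.
f1_items = ["Beer", "Cheese", "Chicken", "Milk"]

# C2: every unordered pair built from the frequent single items.
c2 = [frozenset(pair) for pair in combinations(f1_items, 2)]

# Count the transactions that contain both items of each candidate pair.
pair_counts = {c: sum(1 for t in transactions if c <= t) for c in c2}

# F2: candidate pairs whose count reaches the threshold.
f2 = {c: n for c, n in pair_counts.items() if n >= min_count}

for itemset, n in f2.items():
    print(sorted(itemset), n)
```

Pairs such as {Chicken, Beer} and {Chicken, Cheese} occur in only 3 transactions each, so they are pruned.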
This is where I get confused: if I am asked to display all frequent itemsets, do I write down all of F1 and F2, or just F2? To me, the entries in F1 aren't "sets".
I am then asked to create association rules for the frequent itemsets I have just defined and calculate their "confidence" figures. I get this:
Milk -> Beer = 100% confidence
Beer -> Milk = 80% confidence
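These two figures can be checked with a short sketch, assuming the usual definition of confidence as the support count of the whole itemset divided by the support count of the rule's left-hand side. The helper names `support_count` and `confidence` are made up for this example.

```python
transactions = [
    {"Milk", "Chicken", "Beer"},
    {"Chicken", "Cheese"},
    {"Cheese", "Boots"},
    {"Cheese", "Chicken", "Beer"},
    {"Chicken", "Beer", "Clothes", "Cheese", "Milk"},
    {"Clothes", "Beer", "Milk"},
    {"Beer", "Milk", "Clothes"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent."""
    both = support_count(antecedent | consequent, transactions)
    return both / support_count(antecedent, transactions)

milk_to_beer = confidence({"Milk"}, {"Beer"}, transactions)
beer_to_milk = confidence({"Beer"}, {"Milk"}, transactions)

print(f"Milk -> Beer = {milk_to_beer:.0%}")
print(f"Beer -> Milk = {beer_to_milk:.0%}")
```

Milk appears in 4 transactions and always together with Beer (4/4), while Beer appears in 5 transactions, 4 of which also contain Milk (4/5).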
It seems superfluous to put F1's itemsets in here, as they would all have a confidence of 100% regardless and don't actually "associate" anything. That is why I am now questioning whether the itemsets in F1 are indeed "frequent".
Recommended answer
Itemsets of size 1 are considered frequent if their support meets the minimum support. But here you also have to consider the minimal threshold on itemset size: if the minimal threshold in your example is 2, then F1 will not be considered. But if the minimal threshold is 1, then you have to include it.
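As a rough illustration, assuming the "minimal threshold" in this answer refers to a minimum itemset size, the reported itemsets could be filtered like this. The function `itemsets_to_report` and its `min_size` parameter are hypothetical names introduced for this sketch, and the support counts are copied from the question.

```python
# Frequent itemsets from the question (F1 and F2) with their support counts.
frequent = {
    frozenset({"Milk"}): 4,
    frozenset({"Chicken"}): 4,
    frozenset({"Beer"}): 5,
    frozenset({"Cheese"}): 4,
    frozenset({"Milk", "Beer"}): 4,
}

def itemsets_to_report(frequent, min_size=1):
    """Keep only frequent itemsets that contain at least min_size items."""
    return {s: n for s, n in frequent.items() if len(s) >= min_size}

# Threshold of 1: F1 and F2 are both reported.
with_singletons = itemsets_to_report(frequent, min_size=1)

# Threshold of 2: only F2 remains.
pairs_or_larger = itemsets_to_report(frequent, min_size=2)

print(len(with_singletons), len(pairs_or_larger))
```

With a size threshold of 1 the result holds all five itemsets; with a threshold of 2 only {Milk, Beer} is left.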
You can take a look here and here for more ideas and examples.
Hope I helped.