Problem description
I want to extract association rules for a set of transactions with the following Spark (Scala) code:
val fpg = new FPGrowth().setMinSupport(minSupport).setNumPartitions(10)
val model = fpg.run(transactions)
model.generateAssociationRules(minConfidence).collect()
However, there are more than 10K products, so extracting the rules for all combinations is computationally expensive, and I do not need them all. So I want to extract only pairwise rules:
Product 1 ==> Product 2
Product 1 ==> Product 3
Product 3 ==> Product 1
and I do not care about other combinations, such as:
[Product 1] ==> [Product 2, Product 3]
[Product 3, Product 1] ==> Product 2
Is there any way to do that?
Thanks,
Amir
Assuming your transactions look more or less like this:
val transactions = sc.parallelize(Seq(
Array("a", "b", "e"),
Array("c", "b", "e", "f"),
Array("a", "b", "c"),
Array("c", "e", "f"),
Array("d", "e", "f")
))
you can try to generate frequent itemsets manually and apply AssociationRules directly:
import org.apache.spark.mllib.fpm.AssociationRules
import org.apache.spark.mllib.fpm.FPGrowth.FreqItemset
val freqItemsets = transactions
  .flatMap(xs =>
    // emit every 1-item and 2-item combination of the transaction with count 1
    (xs.combinations(1) ++ xs.combinations(2)).map(x => (x.toList, 1L))
  )
  .reduceByKey(_ + _) // total number of occurrences per itemset
  .map { case (xs, cnt) => new FreqItemset(xs.toArray, cnt) }

val ar = new AssociationRules()
  .setMinConfidence(0.8)

val results = ar.run(freqItemsets)
Notes:
- Unfortunately you'll have to handle filtering by support manually. It can be done by applying a filter on freqItemsets.
- You should consider increasing the number of partitions before flatMap.
- If freqItemsets is too large to be handled, you can split freqItemsets into a few steps to mimic actual FP-growth:
  - generate 1-patterns and filter by support
  - generate 2-patterns using only frequent patterns from step 1
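
The two-step variant described in the notes could look roughly like the sketch below. It is only an illustration under some assumptions that are not part of the original answer: support is expressed as an absolute count (the hypothetical minCount), the set of frequent single items is small enough to collect and broadcast, and a local SparkContext is created just to make the example standalone. Items within a transaction are de-duplicated and sorted so that each pair has a single canonical key.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.fpm.AssociationRules
import org.apache.spark.mllib.fpm.FPGrowth.FreqItemset

object PairwiseRules {
  def main(args: Array[String]): Unit = {
    // Local context for illustration only; in spark-shell reuse the existing `sc`.
    val sc = new SparkContext(
      new SparkConf().setAppName("pairwise-rules").setMaster("local[*]"))

    val transactions = sc.parallelize(Seq(
      Array("a", "b", "e"),
      Array("c", "b", "e", "f"),
      Array("a", "b", "c"),
      Array("c", "e", "f"),
      Array("d", "e", "f")
    ))

    // Assumed threshold: an itemset must occur in at least 2 transactions.
    val minCount = 2L

    // Step 1: count single items and keep only the frequent ones.
    val frequentItems = transactions
      .flatMap(xs => xs.distinct.iterator.map(x => (x, 1L)))
      .reduceByKey(_ + _)
      .filter { case (_, cnt) => cnt >= minCount }
      .cache()

    // Assumes the frequent single items fit in driver memory.
    val frequentSet = sc.broadcast(frequentItems.keys.collect().toSet)

    // Step 2: build pairs from frequent items only, then filter by support.
    val frequentPairs = transactions
      .flatMap { xs =>
        // Sorting gives each pair one canonical order, so (a, b) and (b, a) share a key.
        val kept = xs.distinct.filter(x => frequentSet.value.contains(x)).sorted
        kept.combinations(2).map(pair => (pair.toList, 1L))
      }
      .reduceByKey(_ + _)
      .filter { case (_, cnt) => cnt >= minCount }

    // AssociationRules needs both the 1-itemsets (antecedents) and the 2-itemsets.
    val freqItemsets = frequentItems
      .map { case (item, cnt) => new FreqItemset(Array(item), cnt) }
      .union(frequentPairs.map { case (pair, cnt) => new FreqItemset(pair.toArray, cnt) })

    val rules = new AssociationRules().setMinConfidence(0.8).run(freqItemsets)

    rules.collect().foreach { r =>
      println(s"${r.antecedent.mkString(",")} ==> ${r.consequent.mkString(",")} (${r.confidence})")
    }

    sc.stop()
  }
}
```

Because only itemsets of size 1 and 2 are passed in, every generated rule has a single item on each side. If the frequent items do not fit in memory, the broadcast in step 2 would have to be replaced by a join, which this sketch does not cover.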