Association rules with frequent pattern mining


Problem description


I want to extract association rules for a set of transactions with the following Spark Scala code:

val fpg = new FPGrowth().setMinSupport(minSupport).setNumPartitions(10)
val model = fpg.run(transactions)
model.generateAssociationRules(minConfidence).collect()

However, the number of products is more than 10K, so extracting the rules for all combinations is computationally expensive, and I do not need them all. So I want to extract only pairwise rules:

Product 1 ==> Product 2
Product 1 ==> Product 3
Product 3 ==> Product 1

and I do not care about other combinations such as:

[Product 1] ==> [Product 2, Product 3]
[Product 3, Product 1] ==> Product 2

Is there any way to do that?

Thanks,
Amir

Solution

Assuming your transactions look more or less like this:

val transactions = sc.parallelize(Seq(
  Array("a", "b", "e"),
  Array("c", "b", "e", "f"),
  Array("a", "b", "c"),
  Array("c", "e", "f"),
  Array("d", "e", "f")
))

You can try to generate the frequent itemsets manually and apply AssociationRules directly:

import org.apache.spark.mllib.fpm.AssociationRules
import org.apache.spark.mllib.fpm.FPGrowth.FreqItemset

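val freqItemsets = transactions
  .flatMap { xs =>
    // Deduplicate and sort so each itemset has a single canonical key;
    // otherwise ("b", "c") and ("c", "b") would be counted separately.
    val items = xs.distinct.sorted
    (items.combinations(1) ++ items.combinations(2)).map(x => (x.toList, 1L))
  }
  .reduceByKey(_ + _)
  .map { case (xs, cnt) => new FreqItemset(xs.toArray, cnt) }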

val ar = new AssociationRules()
  .setMinConfidence(0.8)

val results = ar.run(freqItemsets)
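
To make the numbers concrete, here is a minimal local sketch (plain Scala collections, no Spark) of the same counting logic on the sample transactions. It assumes the usual definition confidence(X ==> Y) = count({X, Y}) / count({X}); the names localTransactions, counts and pairRules are illustrative only and are not part of any Spark API.

```scala
// Local (non-Spark) sketch: the same counting logic on plain Scala collections.
val localTransactions = Seq(
  Array("a", "b", "e"),
  Array("c", "b", "e", "f"),
  Array("a", "b", "c"),
  Array("c", "e", "f"),
  Array("d", "e", "f")
)

// Count every 1- and 2-itemset; sorting gives each itemset one canonical key.
val counts: Map[List[String], Long] = localTransactions
  .flatMap { xs =>
    val items = xs.distinct.sorted
    (items.combinations(1) ++ items.combinations(2)).map(_.toList)
  }
  .groupBy(identity)
  .map { case (itemset, occurrences) => (itemset, occurrences.size.toLong) }

// confidence(x ==> y) = count({x, y}) / count({x})
val minConfidence = 0.8
val pairRules: Seq[(String, String, Double)] = (for {
  (pair, pairCount) <- counts.toSeq if pair.size == 2
  (x, y) <- Seq((pair(0), pair(1)), (pair(1), pair(0)))
  confidence = pairCount.toDouble / counts(List(x))
  if confidence >= minConfidence
} yield (x, y, confidence)).sortBy(r => (r._1, r._2))

pairRules.foreach { case (x, y, c) => println(s"$x ==> $y (confidence $c)") }
```

On this sample, with a minimum confidence of 0.8 and no support filter, the surviving pairwise rules are a ==> b, d ==> e, d ==> f and f ==> e, each with confidence 1.0. The rules starting from d rest on a single transaction, which is why filtering by support matters.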

Notes:

Unfortunately, you'll have to handle filtering by support manually. It can be done by applying a filter on freqItemsets.
You should consider increasing the number of partitions before flatMap.
If freqItemsets is too large to be handled, you can split its generation into a few steps to mimic actual FP-growth:
1. generate 1-patterns and filter by support
2. generate 2-patterns using only frequent patterns from step 1
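
The notes above could be combined roughly as follows. This is only a sketch under stated assumptions: it reuses the `transactions` RDD and the `sc` SparkContext from the answer, the values of minSupport and numPartitions are hypothetical, and it assumes the set of frequent single items is small enough to broadcast to the executors.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.fpm.AssociationRules
import org.apache.spark.mllib.fpm.FPGrowth.FreqItemset

val minSupport = 0.4       // hypothetical: fraction of transactions
val numPartitions = 100    // hypothetical: tune for your cluster

// Increase parallelism before the expensive flatMap steps and give every
// basket a canonical (deduplicated, sorted) item order.
val baskets = transactions
  .repartition(numPartitions)
  .map(_.distinct.sorted)
  .cache()

// Absolute count threshold derived from the relative support.
val minCount = math.ceil(minSupport * baskets.count()).toLong

// Step 1: generate 1-patterns and filter by support.
val singles = baskets
  .flatMap(xs => xs.map(item => (item, 1L)))
  .reduceByKey(_ + _)
  .filter { case (_, cnt) => cnt >= minCount }
  .cache()

// Assumption: the frequent items fit in memory on each executor.
val frequentItems = sc.broadcast(singles.keys.collect().toSet)

// Step 2: generate 2-patterns using only frequent items from step 1.
val pairs = baskets
  .flatMap { xs =>
    xs.filter(frequentItems.value.contains)
      .combinations(2)
      .map(p => (p.toList, 1L))
  }
  .reduceByKey(_ + _)
  .filter { case (_, cnt) => cnt >= minCount }

// Combine both levels and derive the pairwise rules.
val filteredItemsets: RDD[FreqItemset[String]] =
  singles.map { case (item, cnt) => new FreqItemset(Array(item), cnt) } ++
  pairs.map { case (xs, cnt) => new FreqItemset(xs.toArray, cnt) }

val pairwiseRules = new AssociationRules()
  .setMinConfidence(0.8)
  .run(filteredItemsets)
```

Dropping infrequent items before calling combinations(2) is what keeps the number of candidate pairs manageable when there are more than 10K products, since a pair can only be frequent if both of its items are.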
