Arules sequence mining in R

Question

I am looking to use the arulesSequences package in R. However, I have no idea as to how to coerce my data frame into an object that can leverage this package.

Here is a toy dataset that replicates my data structure:

ids <- c(rep("X", 5), rep("Y", 5), rep("Z", 5))  # sequence (group) identifier
seq <- rep(1:5, 3)                               # order of each event within a sequence
val <- sample(LETTERS, 15, replace = TRUE)       # random item labels
df <- data.frame(ids, seq, val)
df

   ids seq val
1    X   1   T
2    X   2   H
3    X   3   V
4    X   4   A
5    X   5   X
6    Y   1   D
7    Y   2   B
8    Y   3   A
9    Y   4   D
10   Y   5   P
11   Z   1   Q
12   Z   2   R
13   Z   3   W
14   Z   4   W
15   Z   5   P

Any help would be greatly appreciated.

Recommended answer

What worked for me was adding an "order" column that holds an order ranking rather than a time value. You just have to be very specific about naming: name the "group" or "ordered basket #" variable sequenceID, and name the ranking or ordering variable eventID.
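As a rough sketch of that renaming step, assuming a data frame shaped like the toy df from the question (columns ids, seq, val), base R is enough. The item values are hard-coded here only so the example is reproducible, and the names seq_df and item are illustrative, not something the package requires.

```r
# Toy data in the same shape as the question's df (values fixed for reproducibility)
df <- data.frame(
  ids = rep(c("X", "Y", "Z"), each = 5),
  seq = rep(1:5, times = 3),
  val = c("T", "H", "V", "A", "X",
          "D", "B", "A", "D", "P",
          "Q", "R", "W", "W", "P"),
  stringsAsFactors = FALSE
)

# Rename the grouping column to sequenceID and the order column to eventID,
# as the answer recommends; "item" is just an illustrative name
seq_df <- data.frame(
  sequenceID = df$ids,
  eventID    = df$seq,
  item       = df$val,
  stringsAsFactors = FALSE
)

# Keep rows sorted by sequence, then by event order within each sequence
seq_df <- seq_df[order(seq_df$sequenceID, seq_df$eventID), ]
head(seq_df, 3)
```

Here eventID is a plain 1..5 rank within each sequence rather than a timestamp, which matches the "order ranking" the answer describes.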

Another thing that helped me (and had me scratching my head for a long time) was that read_baskets() seemed to need me to specify the info argument explicitly:

read_baskets(con = "filePath.txt", sep = " ", info = c("sequenceID", "eventID", "SIZE"))

Even though the help page makes the info = c(...) argument look like an optional header, it is not. I seemed to need to remove the header line from my file and supply the column names in the read_baskets() call instead, or I'd run into problems.
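One way to put the two pieces together is sketched below: write the data frame to a space-separated text file with no header, then read it back with the info names given explicitly. This assumes the same toy layout as the question, with values hard-coded for reproducibility. The SIZE column is assumed to be the number of items in each event (always 1 here), and baskets and path are illustrative names. The read_baskets() call only runs if arulesSequences is installed.

```r
# Toy data in the same shape as the question's df (values fixed for reproducibility)
df <- data.frame(
  ids = rep(c("X", "Y", "Z"), each = 5),
  seq = rep(1:5, times = 3),
  val = c("T", "H", "V", "A", "X",
          "D", "B", "A", "D", "P",
          "Q", "R", "W", "W", "P"),
  stringsAsFactors = FALSE
)

# One row per event: sequenceID, eventID, SIZE, then the item itself.
# Assumption: SIZE is the item count per event, which is 1 for every row here.
baskets <- data.frame(
  sequenceID = df$ids,
  eventID    = df$seq,
  SIZE       = 1L,
  item       = df$val,
  stringsAsFactors = FALSE
)

# Write WITHOUT a header line, row names, or quotes, as the answer suggests
path <- tempfile(fileext = ".txt")
write.table(baskets, file = path, sep = " ",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

readLines(path, n = 1)  # "X 1 1 T"

# Read it back only if the package is available
if (requireNamespace("arulesSequences", quietly = TRUE)) {
  trans <- arulesSequences::read_baskets(
    con  = path,
    sep  = " ",
    info = c("sequenceID", "eventID", "SIZE")
  )
  print(trans)
}
```

The object returned by read_baskets() is the transactions-style input that the package's cspade() function works on, so this file round trip is one way to get from a plain data frame to something the package can mine.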
