I need to develop a tool for web log data mining.
I have many sequences of URLs, each requested within a particular user session (retrieved from web application logs), and I need to figure out the usage patterns and the groups (clusters) of the website's users.
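A minimal sketch of getting the input described above. It assumes the log has already been parsed into hypothetical `(session_id, timestamp, url)` tuples, because the real log layout is not given. It groups hits by session and orders them by time, which yields one URL sequence per session.

```python
from collections import defaultdict


def sessions_from_records(records):
    """Group hypothetical (session_id, timestamp, url) tuples into
    {session_id: [url, ...]} with each list ordered by timestamp."""
    by_session = defaultdict(list)
    for session_id, timestamp, url in records:
        by_session[session_id].append((timestamp, url))
    # Sorting the (timestamp, url) pairs puts each session's hits in time order.
    return {sid: [url for _, url in sorted(hits)] for sid, hits in by_session.items()}


records = [("s1", 2, "/cart"), ("s1", 1, "/home"), ("s2", 1, "/home"), ("s1", 3, "/checkout")]
print(sessions_from_records(records))  # {'s1': ['/home', '/cart', '/checkout'], 's2': ['/home']}
```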
I am new to data mining and am now searching Google a lot. I found some useful info; for example, querying "Frequent Pattern Mining in Web Log Data" points to studies that are very close to what I need.
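Because the data are ordered URL sequences, one simple reading of "frequent patterns" is frequent contiguous paths. The sketch below uses a hypothetical `frequent_paths` helper. It counts how many sessions contain each path of a given length and keeps the paths at or above a minimum support, given as an absolute session count. This is a plain n-gram count. It is not a full sequential pattern mining algorithm, which would also allow gaps between pages.

```python
from collections import Counter


def frequent_paths(sessions, length, min_support):
    """Return {path_tuple: n_sessions} for contiguous URL paths of the given
    length that occur in at least min_support sessions."""
    counts = Counter()
    for urls in sessions:
        # A set, so a path repeated inside one session counts once for that session.
        paths = {tuple(urls[i:i + length]) for i in range(len(urls) - length + 1)}
        counts.update(paths)
    return {path: n for path, n in counts.items() if n >= min_support}


sessions = [["/home", "/search", "/item"], ["/home", "/search", "/cart"], ["/about"]]
print(frequent_paths(sessions, 2, 2))  # {('/home', '/search'): 2}
```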
So my questions are:
- Are there any Python-based tools that do what I need, or at least something similar?
- Can the Orange toolkit be of any help?
- Can reading the book Programming Collective Intelligence be of any help?
- What should I google for, what should I read, and which relatively simple algorithms are best to use?
I am very limited in time (around a week), so any help would be extremely valuable. What I need is a pointer in the right direction and advice on how to accomplish the task in the shortest time.
Thanks in advance!
1&2: Orange has a frequent pattern mining module. It also supports clustering.
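To illustrate what a frequent pattern mining module computes, here is a from-scratch Apriori sketch in plain Python. It is not Orange's API. Each session is treated as a set of visited URLs, and the sketch finds every URL set visited together in at least `min_support` sessions. The `apriori` name and the absolute-count support threshold are choices made for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): count} for every itemset contained in at
    least min_support transactions (absolute count, not a fraction)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(current)

    k = 2
    while current:
        prev = list(current)
        candidates = set()
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Prune step: every (k-1)-subset must itself be frequent.
                if len(union) == k and all(
                    frozenset(sub) in current for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Count each surviving candidate with one pass over the transactions.
        current = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_support:
                current[c] = n
        result.update(current)
        k += 1
    return result


sessions = [
    {"/home", "/search", "/item"},
    {"/home", "/search", "/cart"},
    {"/home", "/cart"},
    {"/about"},
]
patterns = apriori(sessions, min_support=2)
print(patterns[frozenset({"/home", "/search"})])  # 2
print(frozenset({"/search", "/cart"}) in patterns)  # False
```

Candidate generation grows quickly on real logs, so beyond toy data a library implementation such as the Orange module mentioned above is the better choice.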
3. I have just checked the contents of the book. There is no chapter on frequent pattern mining. Still, it is generally a good book for beginners in data mining, and you will find it very useful for defining your problem precisely.
4. You need to understand the inputs and outputs of clustering and of frequent pattern mining / association rule mining. So google these algorithms, or find a good data mining textbook to read.
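For clustering, the input is one feature vector per session and the output is one cluster label per session. The following is a minimal sketch that assumes binary bag-of-URL features and plain k-means with squared Euclidean distance. `to_vectors` and `kmeans` are hypothetical helpers. Seeding the centroids with the first k sessions is a simplification for reproducibility; typical k-means implementations use random or k-means++ seeding.

```python
def to_vectors(sessions):
    """Turn each session (list of URLs) into a 0/1 vector over the URL vocabulary."""
    vocab = sorted({url for urls in sessions for url in urls})
    index = {url: i for i, url in enumerate(vocab)}
    vectors = []
    for urls in sessions:
        v = [0.0] * len(vocab)
        for url in urls:
            v[index[url]] = 1.0
        vectors.append(v)
    return vocab, vectors


def kmeans(vectors, k, iterations=20):
    """Plain k-means returning one label per vector; centroids are seeded
    with the first k vectors (a simplification, see lead-in)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: each non-empty cluster's centroid becomes its members' mean.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


sessions = [["/home", "/search", "/item"], ["/home", "/search"], ["/blog", "/about"], ["/blog"]]
vocab, vectors = to_vectors(sessions)
print(kmeans(vectors, 2))  # [0, 0, 1, 1]
```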