Weka简介
Weka的全名是怀卡托智能分析环境(Waikato Environment for Knowledge Analysis),是一款免费的,非商业化(与之对应的是SPSS公司商业数据挖掘产品--Clementine )的,基于JAVA环境下开源的机器学习(machine learning)以及数据挖掘(data minining)软件。
Weka数据格式
Mysql简介
MySQL是一个关系型数据库管理系统,由瑞典MySQL AB公司开发,目前属于Oracle公司。MySQL是一种关联数据库管理系统,关联数据库将数据保存在不同的表中,而不是将所有数据放在一个大仓库内,这样就增加了速度并提高了灵活性。MySQL所使用的SQL语言是用于访问数据库的最常用标准化语言。MySQL软件采用了双授权政策(本词条“授权政策”),它分为社区版和商业版,由于其体积小、速度快、总体拥有成本低,尤其是开放源码这一特点,一般中小型网站的开发都选择MySQL作为网站数据库。由于其社区版的性能卓越,搭配PHP和Apache可组成良好的开发环境。
Weka直接连接Mysql
由于Weka数据格式的特殊性,如果想在Weka中处理数据,必须首先将数据的格式转化成ARFF格式,所以需要经历SQL->ARFF的转化,比较麻烦,但是Weka已经为此做了充分的准备,只需简单配置就可在Weka GUI上直接连接操作Mysql数据库。
准备工作:
Java运行环境
Weka安装
mysql-connector-java-5.1.26-bin.jar
详细配置步骤:
在weka的安装目录下新建lib文件夹,将mysql-connector-java-5.1.26-bin.jar包复制到此lib文件夹下,并且在%JAVA_HOME%\jre\lib\ext"下也复制一份mysql-connector-java-5.1.6-bin.jar。
在weka的安装目录下找到weka.jar,将其解压到当前目录,你会看到多出来一个名为weka的文件夹,进到此文件夹目录下,找到experiment文件夹下的DatabaseUtils.props.mysql,将其改名为DatabaseUtils.props,替换原有的DatabaseUtils.props文件,并将其修改文件里的以下内容:
# Database settings for MySQL 3.23.x, 4.x
#
# General information on database access can be found here:
# http://weka.wikispaces.com/Databases
#
# url: http://www.mysql.com/
# jdbc: http://www.mysql.com/products/connector/j/
# author: Fracpete (fracpete at waikato dot ac dot nz)
# version: $Revision: 5836 $ # JDBC driver (comma-separated list)
#jdbcDriver=org.gjt.mm.mysql.Driver
jdbcDriver=com.mysql.jdbc.Driver # database URL
#jdbcURL=jdbc:mysql://server_name:3306/database_name
jdbcURL=jdbc:mysql://localhost:3306/rtest
# specific data types
# string, getString() = 0; --> nominal
# boolean, getBoolean() = 1; --> nominal
# double, getDouble() = 2; --> numeric
# byte, getByte() = 3; --> numeric
# short, getByte()= 4; --> numeric
# int, getInteger() = 5; --> numeric
# long, getLong() = 6; --> numeric
# float, getFloat() = 7; --> numeric
# date, getDate() = 8; --> date
# text, getString() = 9; --> string
# time, getTime() = 10; --> date # specific data types
string, getString() = 0; --> nominal
boolean, getBoolean() = 1; --> nominal
double, getDouble() = 2; --> numeric
byte, getByte() = 3; --> numeric
short, getByte()= 4; --> numeric
int, getInteger() = 5; --> numeric
long, getLong() = 6; --> numeric
float, getFloat() = 7; --> numeric
date, getDate() = 8; --> date
text, getString() = 9; --> string
time, getTime() = 10; --> date
TINYINT=3
SMALLINT=4
#SHORT=4
SHORT=5
INTEGER=5
INT=5
INT_UNSIGNED=6
BIGINT=6
LONG=6
REAL=7
NUMERIC=2
DECIMAL=2
FLOAT=2
DOUBLE=2
CHAR=0
TEXT=0
VARCHAR=0
LONGVARCHAR=9
BINARY=0
VARBINARY=0
LONGVARBINARY=9
BIT=1
BLOB=9
DATE=8
TIME=8
DATETIME=8
TIMESTAMP=8 # other options
CREATE_DOUBLE=DOUBLE
CREATE_STRING=TEXT
CREATE_INT=INT
CREATE_DATE=DATETIME
DateFormat=yyyy-MM-dd HH:mm:ss
checkUpperCaseNames=false
checkLowerCaseNames=false
checkForTable=true # All the reserved keywords for this database
# Based on the keywords listed at the following URL (2009-04-13):
# http://dev.mysql.com/doc/mysqld-version-reference/en/mysqld-version-reference-reservedwords-5-0.html
Keywords=\
ADD,\
ALL,\
ALTER,\
ANALYZE,\
AND,\
AS,\
ASC,\
ASENSITIVE,\
BEFORE,\
BETWEEN,\
BIGINT,\
BINARY,\
BLOB,\
BOTH,\
BY,\
CALL,\
CASCADE,\
CASE,\
CHANGE,\
CHAR,\
CHARACTER,\
CHECK,\
COLLATE,\
COLUMN,\
COLUMNS,\
CONDITION,\
CONNECTION,\
CONSTRAINT,\
CONTINUE,\
CONVERT,\
CREATE,\
CROSS,\
CURRENT_DATE,\
CURRENT_TIME,\
CURRENT_TIMESTAMP,\
CURRENT_USER,\
CURSOR,\
DATABASE,\
DATABASES,\
DAY_HOUR,\
DAY_MICROSECOND,\
DAY_MINUTE,\
DAY_SECOND,\
DEC,\
DECIMAL,\
DECLARE,\
DEFAULT,\
DELAYED,\
DELETE,\
DESC,\
DESCRIBE,\
DETERMINISTIC,\
DISTINCT,\
DISTINCTROW,\
DIV,\
DOUBLE,\
DROP,\
DUAL,\
EACH,\
ELSE,\
ELSEIF,\
ENCLOSED,\
ESCAPED,\
EXISTS,\
EXIT,\
EXPLAIN,\
FALSE,\
FETCH,\
FIELDS,\
FLOAT,\
FLOAT4,\
FLOAT8,\
FOR,\
FORCE,\
FOREIGN,\
FROM,\
FULLTEXT,\
GOTO,\
GRANT,\
GROUP,\
HAVING,\
HIGH_PRIORITY,\
HOUR_MICROSECOND,\
HOUR_MINUTE,\
HOUR_SECOND,\
IF,\
IGNORE,\
IN,\
INDEX,\
INFILE,\
INNER,\
INOUT,\
INSENSITIVE,\
INSERT,\
INT,\
INT1,\
INT2,\
INT3,\
INT4,\
INT8,\
INTEGER,\
INTERVAL,\
INTO,\
IS,\
ITERATE,\
JOIN,\
KEY,\
KEYS,\
KILL,\
LABEL,\
LEADING,\
LEAVE,\
LEFT,\
LIKE,\
LIMIT,\
LINES,\
LOAD,\
LOCALTIME,\
LOCALTIMESTAMP,\
LOCK,\
LONG,\
LONGBLOB,\
LONGTEXT,\
LOOP,\
LOW_PRIORITY,\
MATCH,\
MEDIUMBLOB,\
MEDIUMINT,\
MEDIUMTEXT,\
MIDDLEINT,\
MINUTE_MICROSECOND,\
MINUTE_SECOND,\
MOD,\
MODIFIES,\
NATURAL,\
NOT,\
NO_WRITE_TO_BINLOG,\
NULL,\
NUMERIC,\
ON,\
OPTIMIZE,\
OPTION,\
OPTIONALLY,\
OR,\
ORDER,\
OUT,\
OUTER,\
OUTFILE,\
PRECISION,\
PRIMARY,\
PRIVILEGES,\
PROCEDURE,\
PURGE,\
READ,\
READS,\
REAL,\
REFERENCES,\
REGEXP,\
RELEASE,\
RENAME,\
REPEAT,\
REPLACE,\
REQUIRE,\
RESTRICT,\
RETURN,\
REVOKE,\
RIGHT,\
RLIKE,\
SCHEMA,\
SCHEMAS,\
SECOND_MICROSECOND,\
SELECT,\
SENSITIVE,\
SEPARATOR,\
SET,\
SHOW,\
SMALLINT,\
SONAME,\
SPATIAL,\
SPECIFIC,\
SQL,\
SQLEXCEPTION,\
SQLSTATE,\
SQLWARNING,\
SQL_BIG_RESULT,\
SQL_CALC_FOUND_ROWS,\
SQL_SMALL_RESULT,\
SSL,\
STARTING,\
STRAIGHT_JOIN,\
TABLE,\
TABLES,\
TERMINATED,\
THEN,\
TINYBLOB,\
TINYINT,\
TINYTEXT,\
TO,\
TRAILING,\
TRIGGER,\
TRUE,\
UNDO,\
UNION,\
UNIQUE,\
UNLOCK,\
UNSIGNED,\
UPDATE,\
UPGRADE,\
USAGE,\
USE,\
USING,\
UTC_DATE,\
UTC_TIME,\
UTC_TIMESTAMP,\
VALUES,\
VARBINARY,\
VARCHAR,\
VARCHARACTER,\
VARYING,\
WHEN,\
WHERE,\
WHILE,\
WITH,\
WRITE,\
XOR,\
YEAR_MONTH,\
ZEROFILL # The character to append to attribute names to avoid exceptions due to
# clashes between keywords and attribute names
KeywordsMaskChar=_ #flags for loading and saving instances using DatabaseLoader/Saver
nominalToStringLimit=50
idColumn=auto_generated_id
然后将weka文件夹打包成weka.jar,替换原来的weka.jar。运行weka,选择open DB,选择user,输入用户名和密码,点击connect,info显示connecting to:jdbc:mysql://localhost:3306/myweka = true,代表连接成功。Explorer就从数据库中载入数据集了。