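1 - The Iris dataset
The Iris dataset is a widely used dataset for machine-learning classification experiments. It is very small, so experiments run quickly.
It contains 150 samples in 3 classes of 50 samples each, and every sample has 4 attributes.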
- Sepal.Length: sepal length, in cm
- Sepal.Width: sepal width, in cm
- Petal.Length: petal length, in cm
- Petal.Width: petal width, in cm
These 4 attributes are used to predict which of the following three species an iris flower belongs to:
- Iris Setosa
- Iris Versicolour
- Iris Virginica
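As a rough illustration of this layout, the sketch below parses a few rows of the shape described above (four measurements plus a class label) using only the Python standard library. The column names other than `class`, the helper name `load_rows`, and the three sample rows are assumptions chosen for the example; they are not read from the data file used later.

```python
import csv
import io
from collections import Counter

# Illustrative rows only: four measurements in cm, then the class label.
# Column names other than "class" are assumed for this sketch.
SAMPLE_CSV = """sepal_len,sepal_wid,petal_len,petal_wid,class
5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""


def load_rows(text):
    """Return (features, label) pairs, converting the measurements to float."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for record in reader:
        label = record.pop("class")
        features = {name: float(value) for name, value in record.items()}
        rows.append((features, label))
    return rows


rows = load_rows(SAMPLE_CSV)
class_counts = Counter(label for _, label in rows)
print(len(rows[0][0]), dict(class_counts))
```

On the full dataset the same counting step would be expected to report 50 rows for each of the three classes.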
2 - Deep learning on the Iris dataset in Python
2.1 - Code
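# coding=utf-8
import h2o
h2o.init()  # By default the H2O instance may use all CPU cores and typically takes about 25% of system memory
# Prepare the data
datasets = "https://raw.githubusercontent.com/DarrenCook/h2o/bk/datasets/"
data = h2o.import_file(datasets + "iris_wheader.csv")  # Load the data
y = "class"  # Name of the column to predict (not needed for unsupervised learning)
x = data.names  # Names of the columns to learn from; here, every column except y
x.remove(y)
train, test = data.split_frame([0.8])  # Split into training and test data: roughly 80% for training, the rest for testing
# Train the model
m = h2o.estimators.deeplearning.H2ODeepLearningEstimator()  # Create the algorithm object with default settings
m.train(x, y, train)  # Train on the training frame only
print("# MSE:", m.mse())  # Mean squared error
print("# Confusion Matrix: \n", m.confusion_matrix(train))  # Confusion matrix on the training data: correct counts per class and, for errors, which class was chosen instead
# Use the model to make predictions
p = m.predict(test)
print("# Predict: \n", p)  # Only the first 10 rows are shown by default
print("# as_data_frame : \n", p.as_data_frame())  # Show all rows
print("# mean: ", (p["predict"] == test["class"]).mean())  # Fraction of correct predictions
print("# cbind: \n", p["predict"].cbind(test["class"]).as_data_frame())  # Predicted and actual class side by side
# Some naming conventions
# - y: the column to predict (not set for unsupervised learning)
# - x: the columns to learn from, either some or all of the remaining columns
# - data: the complete data set
# - train: the subset used for training
# - valid: the subset used for validation
# - test: the subset used for testing
# Clear, meaningful short names are recommended.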
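The run shown in section 2.2 ends up with 117 training rows and 33 test rows rather than exactly 120 and 30. That is consistent with `split_frame` assigning rows probabilistically, so the 0.8 ratio is approximate. The sketch below imitates that behaviour in plain Python with one random draw per row. It is not H2O's implementation, and `probabilistic_split` is a hypothetical helper written for this illustration.

```python
import random


def probabilistic_split(n_rows, train_ratio, seed):
    """Assign each row index to train or test with an independent random draw."""
    rng = random.Random(seed)
    train, test = [], []
    for index in range(n_rows):
        # Each row lands in train with probability train_ratio.
        (train if rng.random() < train_ratio else test).append(index)
    return train, test


train_idx, test_idx = probabilistic_split(150, 0.8, seed=42)
# The sizes are close to 120 and 30 but are not guaranteed to match them exactly.
print(len(train_idx), len(test_idx))
```

When a repeatable split is wanted in H2O itself, `split_frame` also accepts a `seed` argument, for example `data.split_frame([0.8], seed=42)`.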
2.2 - Output
D:\Temp\Anaconda3\envs\h2o\python.exe D:/Anliven/Anliven-Code/PycharmProjects/TempTest/TempTest_1.py
Checking whether there is an H2O instance running at http://localhost:54321 ..... not found.
Attempting to start a local H2O server...
; Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
Starting server from D:\Temp\Anaconda3\envs\h2o\lib\site-packages\h2o\backend\bin\h2o.jar
Ice root: C:\Users\anliven\AppData\Local\Temp\tmptafn6xd_
JVM stdout: C:\Users\anliven\AppData\Local\Temp\tmptafn6xd_\h2o_anliven_started_from_python.out
JVM stderr: C:\Users\anliven\AppData\Local\Temp\tmptafn6xd_\h2o_anliven_started_from_python.err
Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321 ... successful.
-------------------------- ------------------------------------------
H2O cluster uptime: 02 secs
H2O cluster timezone: +08:00
H2O data parsing timezone: UTC
H2O cluster version: 3.24.0.5
H2O cluster version age: 6 days
H2O cluster name: H2O_from_python_anliven_be1ik6
H2O cluster total nodes: 1
H2O cluster free memory: 10.64 Gb
H2O cluster total cores: 8
H2O cluster allowed cores: 8
H2O cluster status: accepting new members, healthy
H2O connection url: http://127.0.0.1:54321
H2O connection proxy:
H2O internal security: False
H2O API Extensions: Amazon S3, Algos, AutoML, Core V3, Core V4
Python version: 3.6.2 final
-------------------------- ------------------------------------------
Parse progress: |█████████████████████████████████████████████████████████| 100%
deeplearning Model Build progress: |██████████████████████████████████████| 100%
# MSE: 0.039118900961189924
# Confusion Matrix:
Confusion Matrix: Row labels: Actual class; Column labels: Predicted class
Iris-setosa Iris-versicolor Iris-virginica Error Rate
------------- ----------------- ---------------- -------- -------
40 0 0 0 0 / 40
0 34 5 0.128205 5 / 39
0 0 38 0 0 / 38
40 34 43 0.042735 5 / 117
deeplearning prediction progress: |███████████████████████████████████████| 100%
# Predict:
predict Iris-setosa Iris-versicolor Iris-virginica
----------- ------------- ----------------- ----------------
Iris-setosa 0.999995 5.26512e-06 1.22522e-23
Iris-setosa 0.999998 2.10502e-06 2.36894e-24
Iris-setosa 0.999996 4.30403e-06 1.68815e-23
Iris-setosa 0.99995 5.0415e-05 4.90541e-23
Iris-setosa 0.999999 1.23285e-06 4.16845e-24
Iris-setosa 0.999997 3.05992e-06 4.10819e-23
Iris-setosa 0.999946 5.44824e-05 5.15226e-22
Iris-setosa 0.999999 8.97722e-07 2.31546e-23
Iris-setosa 0.99999 9.56155e-06 1.59912e-23
Iris-setosa 1 3.44765e-07 4.95222e-24
[33 rows x 4 columns]
# as_data_frame :
predict Iris-setosa Iris-versicolor Iris-virginica
0 Iris-setosa 9.999947e-01 5.265116e-06 1.225220e-23
1 Iris-setosa 9.999979e-01 2.105018e-06 2.368935e-24
2 Iris-setosa 9.999957e-01 4.304033e-06 1.688151e-23
3 Iris-setosa 9.999496e-01 5.041504e-05 4.905406e-23
4 Iris-setosa 9.999988e-01 1.232852e-06 4.168452e-24
5 Iris-setosa 9.999969e-01 3.059924e-06 4.108188e-23
6 Iris-setosa 9.999455e-01 5.448235e-05 5.152261e-22
7 Iris-setosa 9.999991e-01 8.977222e-07 2.315463e-23
8 Iris-setosa 9.999904e-01 9.561553e-06 1.599121e-23
9 Iris-setosa 9.999997e-01 3.447651e-07 4.952222e-24
10 Iris-versicolor 1.285173e-07 9.774696e-01 2.253031e-02
11 Iris-versicolor 8.456613e-05 9.979772e-01 1.938266e-03
12 Iris-versicolor 4.829308e-02 9.517061e-01 8.497348e-07
13 Iris-versicolor 4.169988e-07 9.999681e-01 3.150848e-05
14 Iris-versicolor 1.805217e-06 9.998308e-01 1.673994e-04
15 Iris-versicolor 8.759536e-05 9.999115e-01 8.606799e-07
16 Iris-versicolor 2.206746e-05 9.999167e-01 6.120105e-05
17 Iris-versicolor 3.302204e-06 9.998997e-01 9.695184e-05
18 Iris-versicolor 3.622209e-08 9.389008e-01 6.109913e-02
19 Iris-versicolor 9.407188e-03 9.905912e-01 1.631313e-06
20 Iris-versicolor 1.332645e-03 9.986596e-01 7.739634e-06
21 Iris-virginica 5.299107e-16 7.827116e-07 9.999992e-01
22 Iris-virginica 9.149237e-16 4.476949e-09 1.000000e+00
23 Iris-virginica 4.123180e-13 1.779434e-07 9.999998e-01
24 Iris-virginica 7.280032e-08 6.898109e-03 9.931018e-01
25 Iris-virginica 5.853220e-17 9.229382e-07 9.999991e-01
26 Iris-virginica 1.171212e-12 2.643036e-04 9.997357e-01
27 Iris-virginica 2.345086e-16 2.944686e-09 1.000000e+00
28 Iris-virginica 8.742579e-08 2.479772e-01 7.520227e-01
29 Iris-virginica 1.258946e-09 1.586186e-02 9.841381e-01
30 Iris-virginica 2.918212e-07 1.127815e-02 9.887216e-01
31 Iris-virginica 1.635366e-13 3.913354e-06 9.999961e-01
32 Iris-virginica 1.160129e-11 2.099658e-07 9.999998e-01
# mean: [1.0]
# cbind:
predict class
0 Iris-setosa Iris-setosa
1 Iris-setosa Iris-setosa
2 Iris-setosa Iris-setosa
3 Iris-setosa Iris-setosa
4 Iris-setosa Iris-setosa
5 Iris-setosa Iris-setosa
6 Iris-setosa Iris-setosa
7 Iris-setosa Iris-setosa
8 Iris-setosa Iris-setosa
9 Iris-setosa Iris-setosa
10 Iris-versicolor Iris-versicolor
11 Iris-versicolor Iris-versicolor
12 Iris-versicolor Iris-versicolor
13 Iris-versicolor Iris-versicolor
14 Iris-versicolor Iris-versicolor
15 Iris-versicolor Iris-versicolor
16 Iris-versicolor Iris-versicolor
17 Iris-versicolor Iris-versicolor
18 Iris-versicolor Iris-versicolor
19 Iris-versicolor Iris-versicolor
20 Iris-versicolor Iris-versicolor
21 Iris-virginica Iris-virginica
22 Iris-virginica Iris-virginica
23 Iris-virginica Iris-virginica
24 Iris-virginica Iris-virginica
25 Iris-virginica Iris-virginica
26 Iris-virginica Iris-virginica
27 Iris-virginica Iris-virginica
28 Iris-virginica Iris-virginica
29 Iris-virginica Iris-virginica
30 Iris-virginica Iris-virginica
31 Iris-virginica Iris-virginica
32 Iris-virginica Iris-virginica
H2O session _sid_aa65 closed.
Process finished with exit code 0
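The Error and Rate columns of the confusion matrix in the output above can be recomputed by hand from the counts. The sketch below does that arithmetic in plain Python. The counts are copied from this run, while `error_rates` is a hypothetical helper rather than part of H2O.

```python
LABELS = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
# Rows are actual classes, columns are predicted classes (counts from the run above).
MATRIX = [
    [40, 0, 0],
    [0, 34, 5],
    [0, 0, 38],
]


def error_rates(matrix):
    """Return per-class (errors, total) pairs plus the overall (errors, total) pair."""
    per_class = []
    for i, row in enumerate(matrix):
        total = sum(row)
        # Everything off the diagonal in this row is a misclassification.
        per_class.append((total - row[i], total))
    overall = (sum(e for e, _ in per_class), sum(t for _, t in per_class))
    return per_class, overall


per_class, overall = error_rates(MATRIX)
for label, (errors, total) in zip(LABELS, per_class):
    print(f"{label}: {errors} / {total} = {errors / total:.6f}")
print(f"Total: {overall[0]} / {overall[1]} = {overall[0] / overall[1]:.6f}")
```

All five errors are Iris-versicolor rows predicted as Iris-virginica. This matrix is computed on the training frame, so it is a separate measurement from the accuracy of 1.0 reported on the 33 test rows.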
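3 - Deep learning on the Iris dataset in Flow
Flow: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/flow.html#
Flow is the web interface that ships as part of H2O (no extra installation step is needed). It can be used to:
- View data uploaded through a client
- Upload data directly
- View models created through a client (including models still being built)
- Create models directly
- View predictions generated through a client
- Make predictions directly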
3.1 - Startup
Run the jar file directly to start H2O Flow:
[Anliven@localhost Downloads]$ pwd
/home/Anliven/Downloads
[Anliven@localhost Downloads]$ ls -l
total 402984
drwxr-xr-x 5 Anliven Anliven 60 Jun 19 08:19 h2o-3.24.0.5
-rw-rw-r-- 1 Anliven Anliven 368257676 Jun 19 21:57 h2o-3.24.0.5.zip
drwxr-xr-x 5 Anliven Anliven 84 Dec 22 2017 h2o-bk
-rw-rw-rw- 1 Anliven Anliven 44392957 Jun 23 22:25 基于H2O的机器学习实用方法.zip
[Anliven@localhost Downloads]$
[Anliven@localhost Downloads]$ cd h2o-3.24.0.5/
[Anliven@localhost h2o-3.24.0.5]$ java -jar h2o.jar -ip 192.168.16.101 -port 54321
06-27 22:32:49.845 192.168.16.101:54321 3486 main INFO: ----- H2O started -----
06-27 22:32:49.864 192.168.16.101:54321 3486 main INFO: Build git branch: rel-yates
06-27 22:32:49.864 192.168.16.101:54321 3486 main INFO: Build git hash: b9cd4d5bcd44a4949ca8c677c5e54c10ee72c968
06-27 22:32:49.864 192.168.16.101:54321 3486 main INFO: Build git describe: jenkins-3.24.0.4-66-gb9cd4d5
06-27 22:32:49.864 192.168.16.101:54321 3486 main INFO: Build project version: 3.24.0.5
06-27 22:32:49.864 192.168.16.101:54321 3486 main INFO: Build age: 8 days
06-27 22:32:49.865 192.168.16.101:54321 3486 main INFO: Built by: 'jenkins'
06-27 22:32:49.865 192.168.16.101:54321 3486 main INFO: Built on: '2019-06-18 23:52:14'
06-27 22:32:49.865 192.168.16.101:54321 3486 main INFO: Found H2O Core extensions: [Watchdog, XGBoost, KrbStandalone]
06-27 22:32:49.865 192.168.16.101:54321 3486 main INFO: Processed H2O arguments: [-ip, 192.168.16.101, -port, 54321]
06-27 22:32:49.865 192.168.16.101:54321 3486 main INFO: Java availableProcessors: 2
06-27 22:32:49.865 192.168.16.101:54321 3486 main INFO: Java heap totalMemory: 240.0 MB
06-27 22:32:49.865 192.168.16.101:54321 3486 main INFO: Java heap maxMemory: 3.45 GB
06-27 22:32:49.866 192.168.16.101:54321 3486 main INFO: Java version: Java 1.8.0_161 (from Oracle Corporation)
06-27 22:32:49.866 192.168.16.101:54321 3486 main INFO: JVM launch parameters: []
06-27 22:32:49.866 192.168.16.101:54321 3486 main INFO: OS version: Linux 3.10.0-957.el7.x86_64 (amd64)
06-27 22:32:49.866 192.168.16.101:54321 3486 main INFO: Machine physical memory: 15.51 GB
06-27 22:32:49.866 192.168.16.101:54321 3486 main INFO: Machine locale: en_US
06-27 22:32:49.866 192.168.16.101:54321 3486 main INFO: X-h2o-cluster-id: 1561645969069
06-27 22:32:49.866 192.168.16.101:54321 3486 main INFO: User name: 'Anliven'
06-27 22:32:49.866 192.168.16.101:54321 3486 main INFO: IPv6 stack selected: false
06-27 22:32:49.867 192.168.16.101:54321 3486 main INFO: Network interface is down: name:virbr0 (virbr0)
06-27 22:32:49.867 192.168.16.101:54321 3486 main INFO: Possible IP Address: enp0s8 (enp0s8), fe80:0:0:0:cfdd:6281:f738:fba%enp0s8
06-27 22:32:49.867 192.168.16.101:54321 3486 main INFO: Possible IP Address: enp0s8 (enp0s8), 192.168.16.101
06-27 22:32:49.867 192.168.16.101:54321 3486 main INFO: Possible IP Address: enp0s3 (enp0s3), fe80:0:0:0:c48f:c289:276:2308%enp0s3
06-27 22:32:49.867 192.168.16.101:54321 3486 main INFO: Possible IP Address: enp0s3 (enp0s3), 10.0.2.15
06-27 22:32:49.867 192.168.16.101:54321 3486 main INFO: Possible IP Address: lo (lo), 0:0:0:0:0:0:0:1%lo
06-27 22:32:49.868 192.168.16.101:54321 3486 main INFO: Possible IP Address: lo (lo), 127.0.0.1
06-27 22:32:49.868 192.168.16.101:54321 3486 main INFO: H2O node running in unencrypted mode.
06-27 22:32:49.869 192.168.16.101:54321 3486 main INFO: Internal communication uses port: 54322
06-27 22:32:49.869 192.168.16.101:54321 3486 main INFO: Listening for HTTP and REST traffic on http://192.168.16.101:54321/
06-27 22:32:49.870 192.168.16.101:54321 3486 main INFO: H2O cloud name: 'Anliven' on /192.168.16.101:54321, static configuration based on -flatfile null
06-27 22:32:49.870 192.168.16.101:54321 3486 main INFO: If you have trouble connecting, try SSH tunneling from your local machine (e.g., via port 55555):
06-27 22:32:49.870 192.168.16.101:54321 3486 main INFO: 1. Open a terminal and run 'ssh -L 55555:localhost:54321 [email protected]'
06-27 22:32:49.870 192.168.16.101:54321 3486 main INFO: 2. Point your browser to http://localhost:55555
06-27 22:32:50.627 192.168.16.101:54321 3486 main INFO: Log dir: '/tmp/h2o-Anliven/h2ologs'
06-27 22:32:50.627 192.168.16.101:54321 3486 main INFO: Cur dir: '/home/Anliven/Downloads/h2o-3.24.0.5'
06-27 22:32:50.641 192.168.16.101:54321 3486 main INFO: Subsystem for distributed import from HTTP/HTTPS successfully initialized
06-27 22:32:50.641 192.168.16.101:54321 3486 main INFO: HDFS subsystem successfully initialized
06-27 22:32:50.645 192.168.16.101:54321 3486 main INFO: S3 subsystem successfully initialized
06-27 22:32:50.663 192.168.16.101:54321 3486 main INFO: GCS subsystem successfully initialized
06-27 22:32:50.663 192.168.16.101:54321 3486 main INFO: Flow dir: '/home/Anliven/h2oflows'
06-27 22:32:50.681 192.168.16.101:54321 3486 main INFO: Cloud of size 1 formed [/192.168.16.101:54321]
06-27 22:32:50.690 192.168.16.101:54321 3486 main INFO: Registered parsers: [GUESS, ARFF, XLS, SVMLight, AVRO, PARQUET, CSV]
06-27 22:32:50.691 192.168.16.101:54321 3486 main INFO: Watchdog extension initialized
06-27 22:32:50.692 192.168.16.101:54321 3486 main INFO: XGBoost extension initialized
06-27 22:32:50.692 192.168.16.101:54321 3486 main INFO: KrbStandalone extension initialized
06-27 22:32:50.692 192.168.16.101:54321 3486 main INFO: Registered 3 core extensions in: 318ms
06-27 22:32:50.692 192.168.16.101:54321 3486 main INFO: Registered H2O core extensions: [Watchdog, XGBoost, KrbStandalone]
06-27 22:32:51.041 192.168.16.101:54321 3486 main INFO: Found XGBoost backend with library: xgboost4j_gpu
06-27 22:32:51.041 192.168.16.101:54321 3486 main INFO: XGBoost supported backends: [WITH_GPU, WITH_OMP]
06-27 22:32:51.229 192.168.16.101:54321 3486 main INFO: Registered: 174 REST APIs in: 537ms
06-27 22:32:51.229 192.168.16.101:54321 3486 main INFO: Registered REST API extensions: [Amazon S3, XGBoost, Algos, AutoML, Core V3, Core V4]
06-27 22:32:51.492 192.168.16.101:54321 3486 main INFO: Registered: 249 schemas in 263ms
06-27 22:32:51.493 192.168.16.101:54321 3486 main INFO: H2O started in 2407ms
06-27 22:32:51.493 192.168.16.101:54321 3486 main INFO:
06-27 22:32:51.493 192.168.16.101:54321 3486 main INFO: Open H2O Flow in your web browser: http://192.168.16.101:54321
06-27 22:32:51.493 192.168.16.101:54321 3486 main INFO:
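Once the log reports that the server is listening for HTTP and REST traffic, the same address can also be probed from a script instead of a browser. The sketch below uses only the standard library. It assumes the cluster status is served at `/3/Cloud` and that the JSON reply carries `cloud_size` and `cloud_healthy` fields; `cloud_url` and `cloud_status` are hypothetical helper names.

```python
import json
import urllib.error
import urllib.request


def cloud_url(ip, port):
    """Build the URL of the cluster-status endpoint (assumed to be /3/Cloud)."""
    return f"http://{ip}:{port}/3/Cloud"


def cloud_status(ip, port, timeout=5.0):
    """Return the parsed status JSON, or None if the server cannot be reached."""
    try:
        with urllib.request.urlopen(cloud_url(ip, port), timeout=timeout) as response:
            return json.load(response)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a reply that is not valid JSON.
        return None


# Example call (needs the server started above to be running):
#   status = cloud_status("192.168.16.101", 54321)
#   if status is not None:
#       print(status.get("cloud_size"), status.get("cloud_healthy"))
print(cloud_url("192.168.16.101", 54321))
```

Returning `None` on failure keeps the caller simple: a script can wait and retry until the server is up before opening Flow or connecting a client.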
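3.2 - Data
On the start page, click importFiles, or choose Data --> Import Files from the top menu.
In the Import Files dialog that appears, enter a path in Search and click the search (magnifying glass) icon, then pick the data file from Search Results. Selected Files shows what has been selected.
Note: the Search path can be the absolute path of the data file or a path relative to the h2o.jar file, for example ../h2o-bk/datasets.
Click the Import button to display the result of the file import.
Click Parse these files to customize how the data file is parsed. In most cases it is best to keep the defaults and simply click "Parse".
Click View or iris_wheader1.hex to see the details.
Under Actions, click the Split... button and set how the data is divided into the train and test sets.
Click the Create button.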
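3.3 - Model
Click "train", then click "Build Model..." to open the algorithm selection screen.
Select Deep learning and set the response_column parameter to class; leave every other parameter at its default value.
Then click the "Build Model" button at the bottom of the dialog to start training.
When training finishes, click the View button to inspect the model's parameters and build process.
If a model has already been built, choose Model ---> List All Models from the start page and click the model to see its parameters and build process.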
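3.4 - Prediction
In the model view, click Predict..., then specify a name and the dataset.
Alternatively, choose Score ---> Predict from the start page, then specify a name and select the model and the dataset.
After confirming the parameters, click Predict to see the prediction results.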
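4 - Other notes
- Flow can perform most of the operations that are possible in Python, but some data manipulation operations are not available.
- Data loaded in Python can be inspected in Flow, and data loaded in Flow can be inspected in Python.
- Water Meter under the Admin menu shows how busy each CPU core in the cluster is.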
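Python and Flow can see each other's data because both are clients of the same H2O server, which is where the frames live. As a small sketch of that idea, the helper below pulls frame names out of a parsed reply from the server's frame listing. It assumes the listing is served at `/3/Frames` and has the shape `{"frames": [{"frame_id": {"name": ...}}]}`. The payload here is a hand-written stand-in and not a captured response, and `frame_names` is a hypothetical helper.

```python
def frame_names(payload):
    """Collect frame names from a parsed /3/Frames reply (assumed shape)."""
    names = []
    for frame in payload.get("frames", []):
        frame_id = frame.get("frame_id") or {}
        name = frame_id.get("name")
        if name is not None:
            names.append(name)
    return names


# Hand-written stand-in for a server reply, used only to exercise the helper.
example_payload = {"frames": [{"frame_id": {"name": "iris_wheader1.hex"}}]}
print(frame_names(example_payload))
```

With the `h2o` Python package connected to the same server, `h2o.ls()` and `h2o.get_frame("iris_wheader1.hex")` serve the same purpose without touching the REST layer directly.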