1. Download Maui 3.3.1 from http://www.adaptivecomputing.com/support/download-center/maui-cluster-scheduler/ (you need to register first).
2. Unpack, build, and install Maui
- tar zxvf maui-3.3.1.tar.gz
- cd maui-3.3.1
- ./configure
- make
- make install
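3. Configure Maui
a) Edit the configuration file /usr/local/maui/maui.cfg. The two main settings to change are SERVERHOST and ADMIN1, i.e. the server host name and the administrator account. The host name usually has the form XXX.localdomain, so in /etc/hosts the name XXX.localdomain must also be listed before XXX. The administrator account is usually root.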
- vi /usr/local/maui/maui.cfg
- vi /etc/hosts
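The ordering requirement in /etc/hosts is easy to get wrong, so here is a minimal sketch of a check for it. The function name check_hosts_order is hypothetical, and it assumes the fully qualified name is the short name plus .localdomain, as described above. A correctly ordered line would look like "192.168.1.10 XXX.localdomain XXX" (the IP address is made up for illustration).

```shell
# check_hosts_order FILE SHORTNAME   (hypothetical helper, not part of Maui)
# Succeeds if some non-comment line in FILE lists SHORTNAME.localdomain
# before the bare SHORTNAME, which is the order step 3a asks for.
check_hosts_order() {
    awk -v short="$2" '
        NF == 0 || $1 ~ /^#/ { next }        # skip blank and comment lines
        {
            fq = 0                           # position of the FQDN on this line
            for (i = 2; i <= NF; i++) {      # field 1 is the IP address
                if ($i == short ".localdomain") fq = i
                if ($i == short && fq && fq < i) ok = 1
            }
        }
        END { exit (ok ? 0 : 1) }
    ' "$1"
}

# Example: check_hosts_order /etc/hosts XXX && echo "hosts order looks fine"
```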
b) In the contrib/service-scripts directory of the maui-3.3.1 source tree, edit the file redhat.maui.d. Make two changes: set MAUI_PREFIX=/usr/local/maui, and change daemon --user maui to daemon --user root (alternatively, just delete --user maui).
- cd maui-3.3.1/contrib/service-scripts
- vi redhat.maui.d
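The two edits to redhat.maui.d can also be scripted instead of made by hand in vi. The following is only a sketch: the function name patch_maui_init is hypothetical, and it assumes the script contains a line beginning with MAUI_PREFIX= and the literal text "daemon --user maui", as the step above describes.

```shell
# patch_maui_init FILE   (hypothetical helper)
# Applies the two edits from step 3b in place and keeps the original
# as FILE.bak so the change can be reviewed or undone.
patch_maui_init() {
    sed -i.bak \
        -e 's|^MAUI_PREFIX=.*|MAUI_PREFIX=/usr/local/maui|' \
        -e 's|daemon --user maui|daemon --user root|' \
        "$1"
}

# Example: patch_maui_init redhat.maui.d && diff redhat.maui.d.bak redhat.maui.d
```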
c) Make redhat.maui.d executable and copy it to /etc/init.d/, renaming it to maui.
- chmod +x redhat.maui.d
- cp redhat.maui.d /etc/init.d/maui
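d) Register maui as a system service, disable pbs_sched at boot and enable maui at boot (alternatively, stop pbs_sched by hand and then start maui), and add Maui's environment variables to the system path.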
- chkconfig --add maui
- chkconfig pbs_sched off
- chkconfig maui on
- service pbs_sched stop
- service maui start
- cp maui-3.3.1/etc/maui.sh /etc/profile.d
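4. Management commands shipped with Maui
Once Maui is installed, /usr/local/maui/bin provides a set of query and management commands, including:
checkjob checknode showbf showconfig showgrid showhold
showq showres showstart showstate showstats runjob canceljob ...
Only some of these can be run by ordinary users. Several overlap with commands provided by Torque, for example showq with qstat and canceljob with qdel.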
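5. Adjusting the Torque settings after installing Maui
a) Edit the nodes file and add property labels to each node. The entries below show the file after editing: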
- vi /var/spool/torque/server_priv/nodes
node1 np=20 nodes limits
node2 np=20 nodes limits
node3 np=20 nodes limits
node4 np=20 nodes limits
node5 np=20 nodes limits
node6 np=48 nodes
node7 np=44 nodes
node8 np=40 nodes
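Each line of the nodes file gives a node name, its processor count (np), and then any property labels, here nodes and limits. As a quick sanity check after editing, the following sketch lists the nodes that carry a given label. The function name nodes_with_property is hypothetical, and the parsing assumes the simple whitespace-separated layout shown above.

```shell
# nodes_with_property FILE PROPERTY   (hypothetical helper)
# Prints the name of every node in a Torque nodes file that carries
# the given property label.
nodes_with_property() {
    awk -v prop="$2" '
        NF == 0 || $1 ~ /^#/ { next }        # skip blank and comment lines
        {
            for (i = 2; i <= NF; i++)        # field 1 is the node name
                if ($i == prop) { print $1; break }
        }
    ' "$1"
}

# Example: nodes_with_property /var/spool/torque/server_priv/nodes limits
```

With the file shown above, asking for limits should list node1 through node5, while asking for nodes should list all eight.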
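b) Dump the current Torque server configuration to a file, then edit it to add the per-user resource limits. The listing after the commands shows the edited torque.conf.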
- cd
- qmgr -c "print server" > torque.conf
- vi torque.conf
#
# Create queues and set their attributes.
#
#
# Create and define queue cal
#
create queue cal
set queue cal queue_type = Execution
set queue cal Priority = 10
set queue cal max_running = 50
set queue cal resources_max.cput = 9999:00:00
set queue cal resources_min.cput = 00:00:01
set queue cal resources_default.neednodes = limits
set queue cal enabled = True
set queue cal started = True
#
# Create and define queue work
#
create queue work
set queue work queue_type = Execution
set queue work max_running = 50
set queue work acl_user_enable = True
set queue work acl_users = user1
set queue work acl_users += user2
set queue work resources_max.cput = 9999:00:00
set queue work resources_default.neednodes = nodes
set queue work max_user_run = 99
set queue work enabled = True
set queue work started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = XXX.localdomain
set server acl_hosts += XXX
set server managers = user1@*.localdomain
set server operators = user1@*.localdomain
set server operators += user2@*.localdomain
set server default_queue = cal
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server mom_job_sync = True
set server keep_completed = 300
set server submit_hosts = XXX
set server next_job_number = 1001
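In the listing above, each queue's resources_default.neednodes names one of the property labels from the nodes file, which appears to be how jobs in cal end up on the nodes labelled limits and jobs in work on any node labelled nodes. The sketch below pulls that queue-to-label mapping out of a qmgr dump so it can be compared against the nodes file. The function name queue_neednodes is hypothetical, and it assumes the "set queue NAME attribute = value" line format shown above.

```shell
# queue_neednodes FILE   (hypothetical helper)
# Prints "QUEUE LABEL" for every resources_default.neednodes setting
# found in a qmgr "print server" dump.
queue_neednodes() {
    awk '$1 == "set" && $2 == "queue" &&
         $4 == "resources_default.neednodes" { print $3, $6 }' "$1"
}

# Example: queue_neednodes torque.conf
```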
c) Import the edited torque.conf, then restart pbs_server and maui.
- qmgr < torque.conf
- service pbs_server restart
- service maui restart
d) To add a new user to the work queue, run:
- qmgr -c "set queue work acl_users += user3"
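When several users need to be added at once, the same directive can be generated in a loop and fed to qmgr on standard input, the same way torque.conf is imported in step c). This is a sketch only: the function name gen_acl_adds and the user names user3 and user4 are made up for illustration.

```shell
# gen_acl_adds QUEUE USER...   (hypothetical helper)
# Prints one "set queue ... acl_users += ..." directive per user.
# It only writes text, so the output can be reviewed before it is applied.
gen_acl_adds() {
    queue=$1
    shift
    for user in "$@"; do
        printf 'set queue %s acl_users += %s\n' "$queue" "$user"
    done
}

# Review first:  gen_acl_adds work user3 user4
# Then apply:    gen_acl_adds work user3 user4 | qmgr
```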
References:
[1] http://docs.adaptivecomputing.com/maui/pbsintegration.php
[2] http://goodluck1982.blog.sohu.com/245057803.html
[3] http://blog.csdn.net/zokie/article/details/5848566