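Maui is a cluster job scheduler. It offers more features than pbs_sched, the scheduler bundled with Torque, and is well suited to small and medium clusters. Maui only needs to be installed on the management node; the compute nodes do not need it.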

1. Download Maui 3.3.1 from http://www.adaptivecomputing.com/support/download-center/maui-cluster-scheduler/ (you need to register first).

2. Unpack, build, and install Maui

  tar zxvf maui-3.3.1.tar.gz
  cd maui-3.3.1
  ./configure
  make
  make install
 Maui's default prefix is /usr/local/maui.
 
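3. Configure Maui
a) Edit the configuration file /usr/local/maui/maui.cfg. The two settings that matter are SERVERHOST and ADMIN1, that is, the server host name and the administrator account. The host name usually has the form XXX.localdomain, so in /etc/hosts XXX.localdomain must also be listed before XXX. The administrator account is usually root.

  vi /usr/local/maui/maui.cfg
  vi /etc/hosts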
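
The two edits in step a) can also be made without an editor. The following is a minimal sketch that works on scratch files only: the helper names set_maui_cfg and fqdn_listed_first, the sample file contents, and the address 192.168.0.1 are assumptions for illustration, not part of Maui. It also assumes SERVERHOST and ADMIN1 each sit at the start of their own line in maui.cfg.

```shell
#!/bin/sh
# Sketch only: edits scratch copies, not the real /usr/local/maui/maui.cfg or /etc/hosts.

# set_maui_cfg FILE HOST ADMIN: rewrite the SERVERHOST and ADMIN1 lines of FILE.
set_maui_cfg() {
    sed -e "s/^SERVERHOST[[:space:]].*/SERVERHOST            $2/" \
        -e "s/^ADMIN1[[:space:]].*/ADMIN1                $3/" \
        "$1" > "$1.new" && mv "$1.new" "$1"
}

# fqdn_listed_first FILE FQDN: succeed if some hosts entry names FQDN right after the IP.
fqdn_listed_first() {
    awk -v fqdn="$2" '$1 !~ /^#/ && $2 == fqdn { found = 1 } END { exit !found }' "$1"
}

demo_dir=$(mktemp -d)

# Assumed sample contents standing in for the real files.
printf 'SERVERHOST            localhost\nADMIN1                maui\n' > "$demo_dir/maui.cfg"
printf '127.0.0.1 localhost\n192.168.0.1 XXX.localdomain XXX\n' > "$demo_dir/hosts"

set_maui_cfg "$demo_dir/maui.cfg" XXX.localdomain root
cat "$demo_dir/maui.cfg"

if fqdn_listed_first "$demo_dir/hosts" XXX.localdomain; then
    echo "hosts order OK"
else
    echo "XXX.localdomain must come before XXX in hosts"
fi
```

On a real management node you would pass /usr/local/maui/maui.cfg and /etc/hosts instead of the scratch files, after making backups.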

b) In contrib/service-scripts under the maui-3.3.1 directory, edit the file redhat.maui.d. Make two changes: set MAUI_PREFIX=/usr/local/maui, and change daemon --user maui to daemon --user root (alternatively, delete --user maui altogether).

  cd maui-3.3.1/contrib/service-scripts
  vi redhat.maui.d
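
The same two changes can be scripted with sed. This is a hedged sketch against a scratch stand-in for redhat.maui.d: the helper name patch_init_script is hypothetical, and the two sample lines (including the /opt/maui starting value) are assumed for illustration; the shipped script contains more than this.

```shell
#!/bin/sh
# Sketch only: works on a scratch stand-in for redhat.maui.d.

# patch_init_script FILE PREFIX: point MAUI_PREFIX at PREFIX and run the daemon as root.
patch_init_script() {
    sed -e "s|^MAUI_PREFIX=.*|MAUI_PREFIX=$2|" \
        -e 's|daemon --user maui|daemon --user root|' \
        "$1" > "$1.new" && mv "$1.new" "$1"
}

demo_script=$(mktemp)
# Assumed excerpt of the init script, not its full contents.
cat > "$demo_script" <<'EOF'
MAUI_PREFIX=/opt/maui
daemon --user maui $MAUI_PREFIX/sbin/maui
EOF

patch_init_script "$demo_script" /usr/local/maui
cat "$demo_script"
```

Because the helper writes a new file and moves it into place, the executable bit still has to be set afterwards, as the next step does with chmod +x.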

c) Make redhat.maui.d executable and copy it to /etc/init.d/, renaming it to maui.

  chmod +x redhat.maui.d
  cp redhat.maui.d /etc/init.d/maui

d) Register maui as a system service, disable pbs_sched at boot, and enable maui at boot (you can also stop pbs_sched by hand and then start maui). Finally, add Maui's environment settings to the system path.

  chkconfig --add maui
  chkconfig pbs_sched off
  chkconfig maui on
  service pbs_sched stop
  service maui start
  cp maui-3.3.1/etc/maui.sh /etc/profile.d

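4. Maui's built-in management commands
Once Maui is installed, /usr/local/maui/bin provides a set of query and management commands, including:
checkjob   checknode   showbf      showconfig  showgrid    showhold
showq       showres     showstart   showstate   showstats   runjob   canceljob ...
Only some of these commands can be run by ordinary users. Several of them overlap with commands provided by Torque, for example showq with qstat and canceljob with qdel.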
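
To confirm that the commands listed above were actually installed, a quick check like the following can help. It is a sketch under assumptions: missing_cmds is a hypothetical helper, MAUI_BIN is an illustrative variable that defaults to the path given in the text, and the command list is just a subset of the names above.

```shell
#!/bin/sh
# missing_cmds DIR CMD...: print each CMD that is not an executable file in DIR.
missing_cmds() {
    dir=$1
    shift
    for cmd in "$@"; do
        [ -x "$dir/$cmd" ] || echo "$cmd"
    done
}

# Defaults to the install location from the text; override MAUI_BIN for another prefix.
MAUI_BIN=${MAUI_BIN:-/usr/local/maui/bin}
missing_cmds "$MAUI_BIN" checkjob checknode showbf showq showres showstart canceljob
```

Empty output means every listed command was found; on a machine without Maui the script simply prints all the names.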

5. Adjusting the Torque settings after installing Maui
a) Edit the nodes file and add property tags to each node.
  vi /var/spool/torque/server_priv/nodes
Example after editing:
node1 np=20 nodes limits
node2 np=20 nodes limits
node3 np=20 nodes limits
node4 np=20 nodes limits
node5 np=20 nodes limits
node6 np=48 nodes
node7 np=44 nodes
node8 np=40 nodes

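b) Export the current Torque server settings, edit them, and add the per-user resource limits.
  cd
  qmgr -c "print server" > torque.conf
  vi torque.conf
The edited torque.conf is shown below; adapt the site-specific values (host names, users, limits) to your own cluster: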
#
# Create queues and set their attributes.
#
#
# Create and define queue cal
#
create queue cal
set queue cal queue_type = Execution
set queue cal Priority = 10
set queue cal max_running = 50
set queue cal resources_max.cput = 9999:00:00
set queue cal resources_min.cput = 00:00:01
set queue cal resources_default.neednodes = limits
set queue cal enabled = True
set queue cal started = True
#
# Create and define queue work
#
create queue work
set queue work queue_type = Execution
set queue work max_running = 50
set queue work acl_user_enable = True
set queue work acl_users = user1
set queue work acl_users += user2
set queue work resources_max.cput = 9999:00:00
set queue work resources_default.neednodes = nodes
set queue work max_user_run = 99
set queue work enabled = True
set queue work started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = XXX.localdomain
set server acl_hosts += XXX
set server managers = user1@*.localdomain
set server operators = user1@*.localdomain
set server operators += user2@*.localdomain
set server default_queue = cal
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server mom_job_sync = True
set server keep_completed = 300
set server submit_hosts = XXX
set server next_job_number = 1001
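
Each queue's resources_default.neednodes value in the listing above names one of the property tags from the nodes file, which is how the cal and work queues end up on different sets of machines. The sketch below pulls that mapping out of a torque.conf so it can be compared against the nodes file before importing. queue_properties is a hypothetical helper, and the sample is a three-line excerpt of the listing.

```shell
#!/bin/sh
# queue_properties FILE: print "queue property" for each resources_default.neednodes line.
queue_properties() {
    awk '$1 == "set" && $2 == "queue" && $4 == "resources_default.neednodes" { print $3, $6 }' "$1"
}

# Scratch excerpt of the torque.conf shown above.
demo_conf=$(mktemp)
cat > "$demo_conf" <<'EOF'
set queue cal resources_default.neednodes = limits
set queue work resources_default.neednodes = nodes
set server default_queue = cal
EOF

queue_properties "$demo_conf"
```

For this excerpt the output pairs cal with limits and work with nodes.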

c) Import the edited torque.conf, then restart pbs_server and maui.

  qmgr < torque.conf
  service pbs_server restart
  service maui restart

d) To add a new user to the work queue, run:

  qmgr -c "set queue work acl_users += user3"
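
When several users need to be added at once, the command can be wrapped in a loop. The following is a sketch, not a Torque feature: add_queue_users and the DRY_RUN switch are hypothetical, and user4 is an invented name. By default it only prints the qmgr commands so they can be reviewed first.

```shell
#!/bin/sh
# add_queue_users QUEUE USER...: append each USER to QUEUE's acl_users.
# With DRY_RUN=1 (the default here) the qmgr commands are printed, not run.
add_queue_users() {
    queue=$1
    shift
    for user in "$@"; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "qmgr -c \"set queue $queue acl_users += $user\""
        else
            qmgr -c "set queue $queue acl_users += $user"
        fi
    done
}

add_queue_users work user3 user4
```

Setting DRY_RUN=0 on the management node runs the commands for real.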

References:
[1] http://docs.adaptivecomputing.com/maui/pbsintegration.php
[2] http://goodluck1982.blog.sohu.com/245057803.html
[3] http://blog.csdn.net/zokie/article/details/5848566



09-07 11:25