How to optimize a multithreaded program for use in LSF?

Question

I am working on a multithreaded number-crunching app; let's call it myprogram. I plan to run myprogram on IBM's LSF grid. LSF allows a job to be scheduled on CPUs from different machines. For example, bsub -n 3 ... myprogram ... can allocate two CPUs from node1 and one CPU from node2.

I know that I can ask LSF to allocate all 3 cores on the same node, but I am interested in the case where my job is scheduled onto different nodes.

  1. How does LSF manage this? Will myprogram be run in two different processes on node1 and node2?
  2. Does LSF automatically manage data transfer between node1 and node2?
  3. Is there anything I can do in myprogram to make this easy for LSF to manage? Should I be making use of any LSF libraries?

Recommended answer

Answer to Q1

When you submit a job like bsub -n 3 myprogram, all LSF does is allocate 3 slots across 1-3 hosts. One of these hosts is designated the "first execution host", and LSF dispatches and runs a single instance of myprogram on that host.
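Since only one instance runs, and only on the first execution host, a multithreaded program may want to size its thread pool to the slots it actually holds on that host rather than to the full -n value. The sketch below is one way to do that in Python. It reads the LSB_MCPU_HOSTS variable that LSF sets in the job environment (a string of alternating host names and slot counts, such as "node1 2 node2 1") and assumes the first host listed is the first execution host. The function names are hypothetical and not part of any LSF library.

```python
import os


def parse_mcpu_hosts(value):
    """Parse an LSB_MCPU_HOSTS-style string such as "node1 2 node2 1"
    into a list of (host, slots) pairs, preserving order."""
    fields = value.split()
    if len(fields) % 2 != 0:
        raise ValueError("expected alternating host/slot fields: %r" % value)
    return [(fields[i], int(fields[i + 1])) for i in range(0, len(fields), 2)]


def local_thread_count(value, default=1):
    """Slots on the first listed host (assumed here to be the first
    execution host); returns `default` when the string is empty."""
    hosts = parse_mcpu_hosts(value)
    return hosts[0][1] if hosts else default


if __name__ == "__main__":
    # When LSB_MCPU_HOSTS is unset (e.g. outside an LSF job) this prints 1.
    print(local_thread_count(os.environ.get("LSB_MCPU_HOSTS", "")))
```

With the allocation from the question (two CPUs on node1, one on node2), this would suggest running 2 threads on node1; the slot on node2 stays idle unless something launches work there.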

If you want to run myprogram in parallel, LSF has a command called blaunch, which essentially launches one instance of a program per allocated core. For example, submitting your job as bsub -n 3 blaunch myprogram will run 3 instances of myprogram.

Answer to Q2

By "manage data transfer" I assume you mean communication between the instances of myprogram. The answer is no: LSF is a scheduling and dispatching tool. All it does is allocation and dispatch; it has no knowledge of what the dispatched program is doing. blaunch, in turn, is simply a task launcher that starts multiple instances of a task.

What you're after here is a parallel programming framework such as MPI (see, for example, www.openmpi.org). MPI provides a set of APIs and commands that allow you to write myprogram in a parallel fashion.

Once you've done that and turned your program into mympiprogram, you can submit it to LSF with bsub -n 3 mpirun mympiprogram. The mpirun tool, at least in the case of OpenMPI (and some others), integrates with LSF and uses the blaunch interface under the hood to launch your tasks for you.
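To give a feel for what "writing the program in a parallel fashion" involves: in MPI, each launched instance can ask for its own rank and the total number of instances (MPI_Comm_rank and MPI_Comm_size) and use them to decide which share of the work it owns. The Python sketch below shows only that partitioning step. It does not call MPI itself; rank and size are passed as plain arguments, and the function name chunk_bounds is hypothetical.

```python
def chunk_bounds(n_items, rank, size):
    """Return the half-open range [start, stop) of items handled by `rank`
    when n_items are split as evenly as possible across `size` instances."""
    if not 0 <= rank < size:
        raise ValueError("rank must be in [0, size)")
    base, extra = divmod(n_items, size)
    # The first `extra` ranks each take one additional item.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


if __name__ == "__main__":
    # Show how 10 items would be divided among the 3 instances of a -n 3 job.
    for rank in range(3):
        print(rank, chunk_bounds(10, rank, 3))
```

In a real MPI program the instances would also need to exchange or combine their partial results, and that communication is what the MPI library provides; LSF plays no part in it.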

Answer to Q3

You don't need to use LSF libraries in your program to make anything easier for LSF. As I said, what goes on inside the program is invisible to the system. The LSF libraries just enable your program to become a client of the LSF system (submit jobs, query, etc.).


07-11 16:01