
How to run a multi-node (cross-node) parallel job with Open MPI

Posted: 2010-07-08 19:58
renxinzhi
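PBS on our cluster is broken, so I can only submit jobs from the command line. However, I don't know how to use a command to distribute one job across several different nodes (as opposed to running it as multiple processes on a single node).
The book "Introduction to Parallel Computing" (《并行计算导论》) says:
5. Running MPICH programs
Running an MPICH program in a multi-machine environment is similar to the single-machine case and can be done with mpirun. Before running the program, create a machinefile that lists the names of the nodes to use, then run the program on the specified nodes with the command "mpirun -machinefile filename ...". For example, suppose the user is logged in on node2 and the file mfile contains the following: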
node3
node4
Then the command:
mpirun -machinefile mfile -np 3 cpi
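runs the program cpi on node2, node3, and node4, one process per node, because by default mpirun always adds the current node to the program's node list. If you do not want to use the current node (node2), add the -nolocal option: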
mpirun -nolocal -machinefile mfile -np 3 cpi
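The number of processes given by -np does not have to equal the number of nodes listed in the file passed to -machinefile. If there are fewer processes than nodes, the program uses only some of the nodes. If there are more processes than nodes, some nodes run more than one process.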
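To make the book's example concrete, here is a minimal sketch that writes the two-line mfile and prints the two command lines quoted above. It assumes a writable current directory; node3, node4, and cpi are simply the names from the book's example, and the commands are printed rather than run, since they only make sense on a cluster where MPICH is installed.

```shell
#!/bin/sh
# Sketch only: recreate the machinefile from the book's example.
# node3 and node4 are the example host names, one per line.
cat > mfile <<'EOF'
node3
node4
EOF

# Count how many nodes the file lists.
nodes=$(wc -l < mfile | tr -d ' ')
echo "mfile lists $nodes nodes"

# Print (do not execute) the two command lines from the book.
echo "mpirun -machinefile mfile -np 3 cpi"
echo "mpirun -nolocal -machinefile mfile -np 3 cpi"
```

The file format here is nothing more than one host name per line, which is all the book's description calls for.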
The official Open MPI guide says:
SYNOPSIS
Single Process Multiple Data (SPMD) Model:

mpirun [ options ] <program> [ <args> ]

Multiple Instruction Multiple Data (MIMD) Model:

mpirun [ global_options ]
[ local_options1 ] <program1> [ <args1> ] :
[ local_options2 ] <program2> [ <args2> ] :
... :
[ local_optionsN ] <programN> [ <argsN> ]

Note that in both models, invoking mpirun via an absolute path name is equivalent to specifying the --prefix option with a <dir> value equivalent to the directory where mpirun resides, minus its last subdirectory. For example:

% /usr/local/bin/mpirun ...

is equivalent to

% mpirun --prefix /usr/local

QUICK SUMMARY
If you are simply looking for how to run an MPI application, you probably want to use a command line of the following form:

% mpirun [ -np X ] [ --hostfile <filename> ] <program>

This will run X copies of <program> in your current run-time environment (if running under a supported resource manager, Open MPI's mpirun will usually automatically use the corresponding resource manager process starter, as opposed to, for example, rsh or ssh, which require the use of a hostfile, or will default to running all X copies on the localhost), scheduling (by default) in a round-robin fashion by CPU slot. See the rest of this page for more details.
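But I have tried many times to write a machinefile or mfile modeled on these examples, and every attempt fails with an error.
This is the command I use to run my program: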

nohup /home/software/openmpi-1.2.2-intel9/bin/mpirun -np 8 /home/bin/vasp.openmpi >out &
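Could someone please modify this command so that it distributes these processes across multiple nodes, and also post the file format for the machinefile and mfile? Thanks.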
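For reference, one possible shape of the request, following the "mpirun [ -np X ] [ --hostfile <filename> ] <program>" form quoted above, is sketched here. The host names node1 and node2, the file name hosts, and the count of 4 slots per node are all assumptions made for illustration; the "hostname slots=N" line format is the Open MPI hostfile syntax. The command is printed rather than executed, because it can only work on a cluster where the nodes are reachable (for example over ssh or rsh) and the program exists at the same path on each node.

```shell
#!/bin/sh
# Sketch only. node1 and node2 are hypothetical host names, and 4 slots
# per node is an assumption; replace both with the real cluster values.
cat > hosts <<'EOF'
node1 slots=4
node2 slots=4
EOF

# Sum the slots= values to see how many processes the file has room for.
total=$(awk -F'slots=' 'NF > 1 { sum += $2 } END { print sum + 0 }' hosts)
echo "hostfile provides $total slots"

# Build the command from the post with --hostfile added.
# It is only printed here, not run.
cmd="nohup /home/software/openmpi-1.2.2-intel9/bin/mpirun -np $total --hostfile hosts /home/bin/vasp.openmpi >out &"
echo "$cmd"
```

With the assumed two nodes of 4 slots each, the 8 processes from the original command fit exactly into the listed slots, so they would be spread over both nodes instead of all landing on the local host.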