Difference between revisions of "Limitations"

From ZeptoOS
m (Known Bugs moved to Limitations)
Line 1: Line 1:
*With Tree Helper Thread enabled, DCMF might fail. Helper thread is disabled for now
+
[[ZeptoOS_Documentation|Top]]
*Compute Nodes don't terminate gracefully for certain compute jobs. i.e. cnip mpi jobs have no problem
+
----
 +
 
 +
==Known Bugs / Current Limitations==
 +
 
 +
===No VN/DUAL mode in MPI===
 +
 
 +
Blue Gene/P supports three job modes:
 +
 
 +
* SMP (one application process per node)
 +
* DUAL (two application processes per node)
 +
* VN (four application processes per node)
 +
 
 +
In Cobalt, the job mode can be specified using <tt>cqsub -m</tt> or <tt>qsub --mode</tt>.
 +
 
 +
ZeptoOS will launch the appropriate number of application processes per node as determined by the mode; however, MPI jobs currently work only in SMP mode. We plan to fix this problem in the near future.
 +
 
 +
===No MPI-IO support===
 +
 
 +
MPI-IO currently does not work because of the limitations of FUSE (the compute-node infrastructure we use for I/O forwarding of POSIX calls).
 +
 
 +
Within the DOE FastOS [http://www.iofsl.org/ I/O forwarding project] we are working on a new, high-performance I/O forwarding infrastructure for parallel applications; as this work matures, we will integrate it with ZeptoOS.
 +
 
 +
===Some MPI jobs hang when they are killed===
 +
 
 +
We see this often with <tt>cnip</tt>, the IP-over-torus program. This program runs "forever", so it eventually needs to be killed. When that happens, it frequently hangs one or more compute nodes, preventing the partition from shutting down cleanly.
 +
 
 +
However, the service node will force a shutdown after a timeout of five minutes, so in practice this is not a significant problem. Also, we have not seen this problem with ordinary MPI applications (<tt>cnip</tt> communicates a lot with the kernel and is multithreaded, which
 +
 
 +
==Features Coming Soon==
 +
 
 +
===Multiple MPI jobs one after another===
 +
 
 +
----
 +
[[ZeptoOS_Documentation|Top]]

Revision as of 14:52, 29 April 2009



Known Bugs / Current Limitations

No VN/DUAL mode in MPI

Blue Gene/P supports three job modes:

  • SMP (one application process per node)
  • DUAL (two application processes per node)
  • VN (four application processes per node)

In Cobalt, the job mode can be specified using cqsub -m or qsub --mode.

ZeptoOS will launch the appropriate number of application processes per node as determined by the mode; however, MPI jobs currently work only in SMP mode. We plan to fix this problem in the near future.
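
As a rough illustration of the mode arithmetic described above, the following sketch computes how many application processes a job would start for a given node count and mode, and flags modes that MPI jobs cannot currently use under ZeptoOS. The function and constant names are hypothetical and are not part of ZeptoOS or Cobalt; only the per-node process counts and the SMP-only restriction come from the text above.

```python
# Application processes per node for each Blue Gene/P job mode (from the list above).
PROCS_PER_NODE = {"smp": 1, "dual": 2, "vn": 4}

# Assumption taken from this page: MPI jobs under ZeptoOS currently work only in SMP mode.
MPI_SUPPORTED_MODES = {"smp"}


def total_processes(nodes, mode):
    """Return the number of application processes launched on `nodes` nodes."""
    key = mode.lower()
    if key not in PROCS_PER_NODE:
        raise ValueError("unknown job mode: %r" % mode)
    if nodes < 1:
        raise ValueError("node count must be positive")
    return nodes * PROCS_PER_NODE[key]


def mpi_mode_supported(mode):
    """Return True if an MPI job is expected to work in this mode under ZeptoOS."""
    return mode.lower() in MPI_SUPPORTED_MODES


if __name__ == "__main__":
    for mode in ("smp", "dual", "vn"):
        note = "MPI ok" if mpi_mode_supported(mode) else "MPI not yet supported"
        print("%-4s on 64 nodes: %d processes (%s)" % (mode, total_processes(64, mode), note))
```

A submission wrapper could call such a check before passing the mode to cqsub -m or qsub --mode, so that an MPI job requested in DUAL or VN mode is rejected early rather than failing on the compute nodes.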

No MPI-IO support

MPI-IO currently does not work because of the limitations of FUSE (the compute-node infrastructure we use for I/O forwarding of POSIX calls).

Within the DOE FastOS I/O forwarding project we are working on a new, high-performance I/O forwarding infrastructure for parallel applications; as this work matures, we will integrate it with ZeptoOS.

Some MPI jobs hang when they are killed

We see this often with cnip, the IP-over-torus program. This program runs "forever", so it eventually needs to be killed. When that happens, it frequently hangs one or more compute nodes, preventing the partition from shutting down cleanly.

However, the service node will force a shutdown after a timeout of five minutes, so in practice this is not a significant problem. Also, we have not seen this problem with ordinary MPI applications (cnip communicates a lot with the kernel and is multithreaded, which

Features Coming Soon

Multiple MPI jobs one after another

