Known Bugs / Current Limitations

I/O helper thread bug (fails with a DCMF assertion)

Your MPI program might exit abnormally with a DCMF assertion due to a race condition in the I/O helper thread. If this happens, you will see an error message like the one below in your .error file.

?????????: /gpfs/home/kazutomo/BGP/comm/DCMF/sys/messaging/devices/prod/tree/Device.cc:673: void DCMF::Queueing::Tree::Device::post(DCMF::Queueing::Tree::TreeSendMessage&): Assertion `currentSend() != &smsg' failed.

A workaround is to disable the I/O helper thread. You can do so by setting the DCMF_ZEPTO_TREE_THREAD environment variable to 0 when you submit the job. Here is an example:

$ cqsub -n 64 -t 20 -k zeptoos -e DCMF_ZEPTO_TREE_THREAD=0 .....

This problem is triggered by MPI collective primitives such as MPI_Allreduce() or MPI_Bcast() when the BG/P tree device is used.
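To illustrate, here is a minimal sketch in C of the kind of collective-heavy loop that exercises the tree device (the buffer names and iteration count are made up, not taken from any particular application):

/* Illustrative sketch: a loop of collectives of the kind that can hit the
 * assertion above when the I/O helper thread is enabled.  Compile with the
 * usual BG/P MPI compiler wrapper (e.g. mpicc). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, i;
    long local, global;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (i = 0; i < 1000; i++) {
        local = rank + i;
        /* MPI_Allreduce and MPI_Bcast map onto the BG/P tree network */
        MPI_Allreduce(&local, &global, 1, MPI_LONG, MPI_SUM, MPI_COMM_WORLD);
        MPI_Bcast(&global, 1, MPI_LONG, 0, MPI_COMM_WORLD);
    }

    if (rank == 0)
        printf("done: %ld\n", global);

    MPI_Finalize();
    return 0;
}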

No VN/DUAL mode in MPI

Blue Gene/P supports three job modes:

  • SMP (one application process per node)
  • DUAL (two application processes per node)
  • VN (four application processes per node)

In Cobalt, the job mode can be specified using cqsub -m or qsub --mode.

ZeptoOS will launch the appropriate number of application processes per node as determined by the mode; however, MPI jobs currently only work in the SMP mode. We plan to fix this problem in the near future.

No Universal Performance Counter (UPC)

UPC is not available in this release, so PAPI will not work, since it depends on UPC. We are currently working to enable UPC support in our Linux environment.

MPI-IO support

Due to the limitations of FUSE (the compute-node infrastructure we use for I/O forwarding of POSIX calls), if using the standard glibc, pathnames passed to MPI-IO routines need to be prefixed with bglockless: or bgl: (the latter will not work with PVFS; the former should work with all filesystems).
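For example, with the standard glibc an MPI-IO open call would look something like the sketch below; the file path is made up, and only the bglockless: prefix is the relevant part.

/* Illustrative sketch: prefixing an MPI-IO pathname when using the
 * standard glibc.  The path below is invented for the example. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_File fh;

    MPI_Init(&argc, &argv);

    /* "bglockless:" should work with all filesystems;
     * "bgl:" also works, but not with PVFS. */
    MPI_File_open(MPI_COMM_WORLD, "bglockless:/pvfs/scratch/output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* ... MPI_File_write_all(), etc. ... */

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}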

This should not be necessary when using the version of glibc modified for ZOID. That version should also give better performance, so please give it a try if the performance with the standard glibc is unsatisfactory.

Also, within the DOE FastOS I/O forwarding project, we are working on a new, high-performance I/O forwarding infrastructure for parallel applications; as this work matures, we will integrate it into ZeptoOS.

Some MPI jobs hang when they are killed

We have been seeing this a lot with cn-ipfwd, the IP-over-torus program. This program runs "forever", so it eventually needs to be killed. When that happens, it will frequently hang one or more compute nodes, preventing the partition from shutting down cleanly.

However, the service node will force a shutdown after a timeout of five minutes, so in practice this is not a significant problem. Also, we have not seen this problem with ordinary MPI applications (unlike most MPI applications, cn-ipfwd is multithreaded and communicates a lot with the kernel).

mpirun -nofree does not work

mpirun -nofree (submitting multiple jobs without rebooting the nodes) does not work in the current release; partitions must currently be rebooted between jobs. We intend to fix this in the next version.

Features Coming Soon

Multiple MPI jobs one after another

Since ZeptoOS supports submitting a shell script as a compute node "application", it is possible to run multiple "real" applications from within one job:

#!/bin/sh

# Run the actual application ten times within a single job.
for i in 1 2 3 4 5 6 7 8 9 10; do
    /path/to/real/application
done

This does work for sequential applications, but not for those that are linked with MPI; with MPI, an application can only be run once. However, we have experimental code that lifts this limitation, and we plan to include it in the next release.

