Cmaq


CMAQ Plain (v 5.0.1) on stomp

cmaq website

Installation followed directions on CMAQ wiki

on stomp (using intel compilers version 13.1.3; cmaq_adj compiled with gfortran has runtime problems, see below under 'Latest Problems in Build Variants'):

cd /sandbox/utke/Argonne/Apps/CMAQ_Prereqs

The CMAQ and cmaq_adj builds and runs rely mostly on csh scripts, but the prerequisites do not. So, depending on your preference, do either

source setModulePath.bash

or

source setModulePath.csh

and then

module add cmaq
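
The later steps rely on environment variables that the module files are expected to set (this page uses CMAQSRC, IOAPISRC, IOAPIROOT, MPIROOT and NETCDFROOT). A quick check after loading the module can catch a missing setting early. The following is a minimal sketch for the bash side; the function name check_vars is made up here and is not part of CMAQ or the module files:

```shell
# check_vars NAME...: report every listed environment variable that is
# unset or empty; returns non-zero if any is missing.
# (check_vars is a hypothetical helper, not part of CMAQ or the modules.)
check_vars() {
    missing=0
    for name in "$@"; do
        # indirect lookup of the variable whose name is stored in $name
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "not set: $name"
            missing=1
        fi
    done
    return $missing
}
```

For example: check_vars CMAQSRC IOAPISRC IOAPIROOT MPIROOT NETCDFROOT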

Prerequisites are built in CMAQ_Prereqs (see the config.log files in the respective ..._build subdirs). I used the respective module files to build the prerequisite libs. Note, however, that ioapi, for instance, wants INSTALL and BIN to be defined in the environment, and these settings confuse the autotools-based builds of the libraries ioapi depends on.

The installed CMAQ version is 5.0.1; the following assumes the cmaq module is loaded.

  • Note: the build procedures for CMAQ and cmaq_adj are mostly non-standard and fragile
  • the following reflects paths of the install on stomp consistent with the shell module files.
  • dependencies on prerequisites are resolved by adding the following symbolic links under the CMAQ directory:
cd $CMAQSRC
mkdir -p lib/x86_64/ifort/ioapi_3.1
cd lib/x86_64/ifort/ioapi_3.1
ln -s $IOAPISRC/ioapi 
ln -s $IOAPIROOT 
cd $CMAQSRC/lib/x86_64/ifort
ln -s $MPIROOT mpich
ln -s $NETCDFROOT netcdf
cd $IOAPIROOT
ln -s $IOAPISRC/Linux2_x86_64ifort/*.mod .
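
Repeating the ln -s commands above is not harmless: they either fail with "File exists" or, where the link already points to a directory, drop a stray link inside that directory. A sketch of a helper that only creates a link when it is absent could look as follows (link_once is a made-up name; readlink is assumed to be available, as it is on typical Linux systems):

```shell
# link_once TARGET LINK: create symlink LINK -> TARGET unless it already
# exists; complain if LINK exists but points somewhere else.
# (link_once is a hypothetical helper name.)
link_once() {
    target=$1
    link=$2
    if [ -L "$link" ]; then
        current=$(readlink "$link")
        if [ "$current" = "$target" ]; then
            return 0            # already in place, nothing to do
        fi
        echo "$link points to $current, expected $target" >&2
        return 1
    elif [ -e "$link" ]; then
        echo "$link exists and is not a symlink" >&2
        return 1
    fi
    ln -s "$target" "$link"
}
```

For example: link_once "$MPIROOT" "$CMAQSRC/lib/x86_64/ifort/mpich"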
  • edit scripts/config.cmaq, which is written for csh; from here on we have to be in csh
utke(stomp)/sandbox/utke/Argonne/Apps/CMAQ/CMAQv5.0.1/scripts> diff config.cmaq ../../CMAQv5.0.1_pristine/scripts/config.cmaq 
15c15
<  setenv M3HOME  $CMAQSRC
---
>  setenv M3HOME  /Add your directory path here/CMAQv5.0.1
26c26
<  setenv extra_lib "-lopa -lmpl -lrt -lpthread -ldl"
---
>  setenv extra_lib "-lrdmacm -libumad -lopa -lmpl -lrt -lpthread -libverbs -ldl"
29,35c29,35
<    setenv compiler ifort
<    setenv myFC ifort 
<    setenv myCC icc
<    setenv myLINK_FLAG "-static-intel"
<    setenv myFFLAGS "-fixed -132 -O3 -override-limits -fno-alias -mp1 -fp-model precise"
<    setenv myFRFLAGS "-free -O3 -fno-alias -fp-model precise -mp1"
<    setenv myCFLAGS "-O2"
---
> #   setenv compiler ifort
> #   setenv myFC ifort 
> #   setenv myCC icc
> #   setenv myLINK_FLAG "-static-intel"
> #   setenv myFFLAGS "-fixed -132 -O3 -override-limits -fno-alias -mp1 -fp-model precise"
> #   setenv myFRFLAGS "-free -O3 -fno-alias -fp-model precise -mp1"
> #   setenv myCFLAGS "-O2"
  • make sure to switch to csh and then do
source config.cmaq

Make sure that all the environment variables are retained or reload the modules in csh, i.e.

module add cmaq 
  • also edit:
    • scripts/bcon/bldit.bcon
    • scripts/icon/bldit.icon
    • scripts/jproc/bldit.jproc
    • scripts/cctm/bldit.cctm

illustrating the changes using bldit.bcon as an example:

99c99
<  set IOAPI  = "${M3LIB}/ioapi_3.1/Linux2_${system}${compiler} -lioapi -openmp"
---
>  set IOAPI  = "${M3LIB}/ioapi_3.1/Linux2_${system}${compiler} -lioapi"
102c102
<  set NETCDF = "${M3LIB}/netcdf/lib -lnetcdff -lnetcdf"
---
>  set NETCDF = "${M3LIB}/netcdf/lib -lnetcdf"
  • followed the instructions on the wiki under "Compiling CMAQ for the Benchmark Test Case Simulation" to build the components using the bldit scripts in particular under
    • scripts/build
    • scripts/stenex (both se and se_noop just to be sure ...)
    • scripts/pario
    • scripts/bcon
    • scripts/icon
    • scripts/jproc
    • scripts/cctm

Note that in some cases the above runs the compiles directly from a script (i.e. without a Makefile), which means it may simply continue despite a compile error caused by a missing include or missing libs, e.g. if the steps are done out of order or one of the script files hasn't been properly changed. This may result in incomplete libraries.
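
Since a failed compile may go unnoticed, it can help to capture the output of each bldit script in a file and scan it afterwards. The following sketch assumes such a log file exists (e.g. bldit.bcon.log, a made-up name); the search patterns are a guess at typical compiler and linker messages, not an exhaustive list, and may also flag harmless lines:

```shell
# scan_build_log LOG: print lines that look like compile or link failures
# (with their line numbers) and return non-zero if any are found.
# The patterns are assumptions, not taken from the CMAQ build scripts.
scan_build_log() {
    log=$1
    if grep -n -i -E 'error|undefined reference|cannot find|no such file' "$log"; then
        return 1
    fi
    return 0
}
```

For example, from a bash shell: scan_build_log bldit.bcon.log && echo "no obvious errors"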

  • unpack the data files for the benchmark as in
cd $CMAQSRC/..
tar -zxvf ~utke/Downloads/CMAQ/DATA.CMAQv5.0.1.tar.gz
  • followed "Running the CMAQ Benchmark Simulation" to execute a benchmark; for <x> in icon,bcon,jproc do cd scripts/<x> and execute the script run.<x>
  • if e.g. run.icon exits with an error message, it may be necessary to remove a leftover profile, either from a previous run or from the data untarred from the archive (DATA.CMAQv5.0.1.tar.gz) into $CMAQSRC/data. Look in the output for a message like
 Value for INIT_CONC_1:  '/sandbox/Argonne/Apps/CMAQ/CMAQv5.0.1_mpich-3.0.2_intel-12.1.1/data/icon/ICON_V5g_CMAQ-BENCHMARK_profile -v'

remove that profile (here named $CMAQSRC/data/icon/ICON_V5g_CMAQ-BENCHMARK_profile ) and restart
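
If the output of run.icon was saved to a file, the path of the offending profile can be pulled out of that message instead of being copied by hand. This is only a sketch: it assumes the message format shown above, a path without spaces, and a hypothetical output file name (icon.out):

```shell
# init_conc_path LOG: print the file path reported for INIT_CONC_1 in LOG.
# Assumes the message format shown above and a path without spaces;
# the trailing " -v" inside the quotes is dropped.
init_conc_path() {
    sed -n "s/.*Value for INIT_CONC_1:[[:space:]]*'\([^' ]*\).*/\1/p" "$1"
}
```

For example: rm -i "$(init_conc_path icon.out)"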

  • setup for single processor execution:
setenv NPROCS 1
setenv NPCOL_NPROW "1 1"
cd scripts/cctm
./run.cctm >&!  cctm.out &
  • can look at output, e.g. CMAQv5.0.1/data/cctm/CCTM_V5g_Linux3_x86_64gfort.CONC.CMAQ-BENCHMARK_20060801 with ncview

CMAQ-Adjoint (on stomp)

website

git URL:

ssh://git@adjoint.colorado.edu:2222/cmaq_adj.git

temporary changes not for pushing to the main repo are kept here:

ssh://utke@login.mcs.anl.gov/home/utke/GitMain/CodeReps/cmaq_adj

in branch (for serial run compiled with ifort):

JU_DontMergeIntelSerialBuildChanges

The problems with the other build setups as tracked by the other branches are outlined below.

Installation on stomp:

/sandbox/utke/Argonne/Apps/CMAQ/cmaq_adj

The path is set in the environment variable CMAQ_ADJ_SRC by

module add cmaq_adj

First build the home-grown build support:

cd $CMAQ_ADJ_SRC
cd BLDMAKE_git
make

for the forward sweep I adapted the build script:

scripts/bldit.adjoint.fwd.sample

as illustrated by changesets up to and including e7ff7e285e367a7e822b1f42d0e0975ed7aa161f in branch JU_DontMergeIntelSerialBuildChanges

  • in cmaq_adj link the scripts like this
cd $CMAQ_ADJ_SRC
ln -s scripts/bldit.adjoint.fwd.sample bldit.adjoint.fwd
ln -s scripts/bldit.adjoint.bwd.sample bldit.adjoint.bwd
ln -s scripts/run.adj.fwd.bnmk.template run.adj.fwd.bnmk
ln -s scripts/run.adj.bwd.bnmk.template run.adj.bwd.bnmk
  • setup to build the forward sweep binary
cd $CMAQ_ADJ_SRC
./bldit.adjoint.fwd
  • edit the file:
BLD_fwd/KPP_Integrator.F90

like this:

336c336
<    Roundoff = WLAMCH('E')
---
> !   Roundoff = WLAMCH('E')
339c339
< !   Roundoff = 1.0E-14
---
>    Roundoff = 1.0E-14

Note that the edit may be overwritten if the bldit.adjoint.fwd is rerun. Then build with:

cd BLD_fwd
make
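
Because a rerun of bldit.adjoint.fwd may regenerate the file, it can be worth checking which Roundoff assignment is currently active before running make. A small sketch (active_roundoff is a made-up helper name) that lists the assignments that are not commented out:

```shell
# active_roundoff FILE: print, with line numbers, the Roundoff assignments
# in FILE that are not commented out (a leading '!' marks a comment in
# free-form Fortran, so such lines do not match).
active_roundoff() {
    grep -n -i -E '^[[:space:]]*Roundoff[[:space:]]*=' "$1"
}
```

For example: active_roundoff BLD_fwd/KPP_Integrator.F90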
  • build the backward sweep binary
cd $CMAQ_ADJ_SRC
./bldit.adjoint.bwd
cd BLD_bwd
make
  • unpack the test data like this:
cd $CMAQ_ADJ_SRC
tar -zxvf ~utke/Downloads/CMAQ/cmaq_adj_test_data.tar.gz
  • run the forward sweep binary
cd $CMAQ_ADJ_SRC
./run.adj.fwd.bnmk 

wait a few hours and then inspect the output to find a successful completion message like this:

     >>---->  Program completed successfully  <----<<
  • Then do the backward sweep run like this:
./run.adj.bwd.bnmk 

and inspect the output again for the successful completion line as above
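
Given the run time of a few hours, the check for the completion line can be scripted. The sketch below assumes the run output was captured in a file (fwd.out and bwd.out would be made-up names); run_succeeded is likewise a hypothetical helper:

```shell
# run_succeeded LOG: succeed (exit status 0) if LOG contains the
# completion message printed at the end of a successful run.
run_succeeded() {
    grep -q 'Program completed successfully' "$1"
}
```

For example: run_succeeded fwd.out && echo "forward sweep finished"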

Latest Problems in Build Variants

serial build (without MPI) under gfortran (gcc v 4.7.1) (see git branch JU_DontMergeLocalBuildChanges): SEGV in ioapi - no indication so far about any further details:

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) where
#0  0x0000000000000000 in ?? ()
#1  0x00000000006100ba in __gthread_mutex_destroy (__mutex=0x1431970) at ../libgcc/gthr-default.h:760
#2  destroy_unit_mutex (u=0x14318a0) at ../../../gcc-4.7.1/libgfortran/io/unit.c:214
#3  0x00000000005947f5 in init3_ ()
#4  0x000000000044c647 in setup_logdev () at setup_logdev.F:71
#5  0x000000000040612c in grid_conf::grid_init (nprocs=1, myid=<optimized out>) at GRID_CONF.F:66
#6  0x0000000000441745 in par_init (colrow='CR', nspcs=<optimized out>, clock=0, ierror=0, _colrow=_colrow@entry=2) at par_init_noop.F:93
#7  0x0000000000430215 in driver_fwd () at driver_fwd.F:158
#8  0x00000000004017ad in main (argc=argc@entry=1, argv=0x7fffffffdff1) at driver_fwd.F:44
#9  0x0000000000655a41 in __libc_start_main (main=0x401790 <main>, argc=1, ubp_av=0x7fffffffdb18, init=0x655f00 <__libc_csu_init>,
   fini=0x655f90 <__libc_csu_fini>, rtld_fini=0x0, stack_end=0x7fffffffdb08) at libc-start.c:226
#10 0x0000000000402951 in _start () at ../sysdeps/x86_64/elf/start.S:113


parallel build (with MPICH 3.0.2) built under intel 12.1.0 (see git branch JU_DontMergeIntelBuildChanges)

  • execution on 4 ranks: NULL pointer for buffer in Irecv:
Fatal error in MPI_Irecv: Invalid buffer pointer, error stack:
MPI_Irecv(145): MPI_Irecv(buf=(nil), count=47736, MPI_REAL, src=2, tag=0, MPI_COMM_WORLD, request=0x7fff77aa0f10) failed
MPI_Irecv(115): Null buffer pointer
Fatal error in MPI_Irecv: Invalid buffer pointer, error stack:
MPI_Irecv(145): MPI_Irecv(buf=(nil), count=44064, MPI_REAL, src=3, tag=0, MPI_COMM_WORLD, request=0x7fff2ca638f0) failed
MPI_Irecv(115): Null buffer pointer

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 1
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=================================================================================== 

callstack for error 2 of 4 ranks:

#4  0x00000000006db1ff in pmpi_irecv__ ()
#5  0x00000000006cfa36 in swap4d_ ()
#6  0x00000000004fb1bf in hdiff (cgrid=0x5f494d50000a7325, jdate=1635017030,
    jtime=<error reading variable: Cannot access memory at address 0x1>,
    tstep=...) at hdiff.F:297
#7  0x00000000004993ad in sciproc (cgrid=0x5f494d50000a7325, jdate=1635017030,
    jtime=<error reading variable: Cannot access memory at address 0x1>,
    tstep=..., astep=..., xfirst=...) at sciproc.F:240
#8  0x000000000048e43f in driver_fwd () at driver_fwd.F:270
#9  0x000000000040b2cc in main ()
  • execution on 1 rank: bad JVALUE (?)
..... MPI etc internals ...
#4  0x00000000006d2164 in pm3exit_ ()
#5  0x000000000057d808 in phot (mdate=684837, mtime=1819308129,
    jdate=<error reading variable: Cannot access memory at address 0x0>,
    jtime=1819308129, ndark=771751936, rj=...) at phot.F:248
#6  0x00000000005b3e35 in chem (cgrid=0x5f494d50000a7325, jdate=1819308129,
    jtime=<error reading variable: Cannot access memory at address 0x0>,
    tstep=...) at kppdriver.F:256
#7  0x0000000000499482 in sciproc (cgrid=0x5f494d50000a7325, jdate=1819308129,
    jtime=<error reading variable: Cannot access memory at address 0x0>,
    tstep=..., astep=..., xfirst=...) at sciproc.F:257
#8  0x000000000048e43f in driver_fwd () at driver_fwd.F:270
#9  0x000000000040b2cc in main ()

The lines around 248 in phot.F are:

247         XMSG = 'Error reading number of LEVELS from JVALUE file'
248         IF ( IOST .NE. 0 )
249      &    CALL M3EXIT ( PNAME, JDATE, JTIME, XMSG, XSTAT1 )