(K)TAU

From ZeptoOS
Revision as of 18:25, 28 April 2009 by Anataraj (talk | contribs)


Both TAU (Tuning and Analysis Utilities) and its Linux kernel counterpart, KTAU, have been ported to ZeptoOS. In addition, the build system of the ZeptoOS kernel has been modified to simplify the integration of KTAU patches. The following sections describe how to acquire, configure, build and use TAU and KTAU under ZeptoOS, with Surveyor at ANL as the test platform.


TAU

Acquiring TAU

The currently supported version of TAU (as of 2009-04-28) is 2.18.1p2-zepto, which is available for download. All later versions (greater than 2.18.1) should support ZeptoOS and will be available from the usual TAU downloads page.

Configuration under ZeptoOS on BG/P

Before proceeding, we assume that ZeptoOS has been downloaded, configured and installed. For the purpose of this how-to, the ZeptoOS install location is assumed to be "/home/anataraj/ZeptoReleases/install-BGP". You may have a different location or may opt to use the default ZeptoOS install location on your machine. Installing your own ZeptoOS can be done by issuing 'python install.py /path/to/install' at the root directory of the ZeptoOS release (but please refer to the detailed ZeptoOS documentation).

TAU needs to be told where the ZeptoOS install location is using the '-zeptodir' configure option.

~/tau-2.18.1p2> pwd
/home/anataraj/tau-2.18.1p2

~/tau-2.18.1p2> ./configure -fullhelp | grep Zepto
-zeptodir=<dir> ............................. ZeptoOS install location.

A simple example configuration is shown below that includes the -mpi option. It is important to note that the TAU configuration under ZeptoOS on BGP is different from the standard TAU configuration on BGP when using the default IBM OS suite.

In this case the architecture needs to be specified as 'ppc' in addition to the ZeptoOS install location.

~/tau-2.18.1p2> ./configure -arch=ppc \
                            -zeptodir=/home/anataraj/ZeptoReleases/install-BGP \
                            -mpi


Configuring with PDT under ZeptoOS on BGP

The Program Database Toolkit (PDT) provides TAU with automatic source instrumentation capability. This allows routines, particularly in large projects, to be instrumented without tedious manual intervention.

First download and untar PDT. Then configure and build it:

~/pdtoolkit-3.14> pwd
/home/anataraj/pdtoolkit-3.14

~/pdtoolkit-3.14> ./configure

~/pdtoolkit-3.14> make

~/pdtoolkit-3.14> make install

Next configure TAU with the above PDT:

~/tau-2.18.1p2> ./configure -arch=ppc \
                            -zeptodir=/home/anataraj/ZeptoReleases/install-BGP \
                            -mpi \
                            -pdt=/home/anataraj/pdtoolkit-3.14 \
                            -pdtarchdir=ppc64 -pdt_c++=xlC

Please pay close attention to the '-pdtarchdir=' and '-pdt_c++=' options, as they are required for PDT to work on the ZeptoOS/BGP platform.
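The two configure lines above differ only in the PDT options, so they can be generated from one place. The following is a minimal sketch, not part of TAU: the function name tau_configure_args is hypothetical, and it assumes that the paths contain no spaces.

```shell
# Sketch: assemble the TAU configure arguments for ZeptoOS on BG/P.
# tau_configure_args is a hypothetical helper, not something TAU provides.
# Usage: tau_configure_args <zepto-install-dir> [<pdt-dir>]
tau_configure_args() {
    # Options common to every ZeptoOS configuration described above.
    args="-arch=ppc -zeptodir=$1 -mpi"
    # The PDT options are added only when a PDT directory is given.
    if [ -n "${2:-}" ]; then
        args="$args -pdt=$2 -pdtarchdir=ppc64 -pdt_c++=xlC"
    fi
    printf '%s\n' "$args"
}
```

It could then be used from the TAU source directory as: ./configure $(tau_configure_args /home/anataraj/ZeptoReleases/install-BGP /home/anataraj/pdtoolkit-3.14)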

Please refer to the TAU documentation for various other TAU configuration options that may be used.


Building TAU

Lastly, build TAU in-place by issuing a 'make install'.

~/tau-2.18.1p2> make install

This should provide a TAU built for ZeptoOS with MPI profiling and automatic instrumentation capabilities.







KTAU

Acquiring KTAU

The latest version of KTAU (1.7.15-bgpzepto-p1) is available as a gzipped tarball from the KTAU downloads page.

Integrating KTAU into ZeptoOS

The examples below, which show the configuration and build steps, assume that the KTAU tarball has been expanded into /home/anataraj/ktau-1.7.15-bgpzepto and the ZeptoOS tarball into /home/anataraj/BGP.

Configure ZeptoOS to point to KTAU patch and path

Begin by configuring ZeptoOS with the correct KTAU path and the name of the patch to be used.

~/BGP> ./configure --edit

[Screen-shot: KTAU configure-edit.jpg]

Once this is done, the changes will be saved to the Make.rules file.

~/BGP> cat Make.rules
#
# Automatically generated make config: don't edit
#

#
# BG/P DIST_DIR
#
DRV_DIR="/bgsys/drivers/ppcfloor/"
BGP_CROSS="$(DRV_DIR)/gnu-linux/bin/powerpc-bgp-linux-"
BGCNS_H_PATH="$(DRV_DIR)/mcp-2.6.16.46/include/"
BGCNS_H="$(BGCNS_H_PATH)/bgcns.h"
OS_DIR="/bgsys/linux/1.3.020081029"

#
# KTAU
#
KTAU_DIR="/home/anataraj/ktau-1.7.15-bgpzepto"
KTAU_PATCH="patch-2.6.19.2-ktau-1.7.15-bgpzepto"
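Before patching, it can be useful to confirm that the two KTAU settings were actually saved. The sketch below reads them back from Make.rules. The helper names make_rules_get and check_ktau_settings are hypothetical, and the sketch assumes the KEY="value" layout shown above.

```shell
# Sketch: read a quoted KEY="value" setting from the ZeptoOS Make.rules file.
# make_rules_get is a hypothetical helper name.
# Usage: make_rules_get <KEY> <path-to-Make.rules>
make_rules_get() {
    # Print the text between the double quotes of the first matching line.
    sed -n "s/^$1=\"\(.*\)\"\$/\1/p" "$2" | head -n 1
}

# Report the KTAU settings, or explain what is missing.
# Usage: check_ktau_settings <path-to-Make.rules>
check_ktau_settings() {
    ktau_dir=$(make_rules_get KTAU_DIR "$1")
    ktau_patch=$(make_rules_get KTAU_PATCH "$1")
    if [ -z "$ktau_dir" ] || [ -z "$ktau_patch" ]; then
        echo "KTAU_DIR or KTAU_PATCH is not set in $1"
        return 1
    fi
    if [ ! -d "$ktau_dir" ]; then
        echo "KTAU_DIR does not exist: $ktau_dir"
        return 1
    fi
    echo "KTAU_DIR=$ktau_dir KTAU_PATCH=$ktau_patch"
}
```

Running 'check_ktau_settings Make.rules' in the ZeptoOS root directory prints both values when they are set and the KTAU directory exists.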

Patching ZeptoOS with KTAU

Next, change into the "kernel" directory of ZeptoOS. Issuing a 'make help' will show all the targets, including the KTAU-specific ones (i.e. the last two targets in the list).

~/BGP> cd kernel/

~/BGP/kernel> make help

[menu]

bgp-ion-linux                 : Simply use recent built or prebuilt ION kernel (no kernel rebuild)
bgp-ion-linux-prebuilt        : Use prebuilt ION kernel
bgp-ion-linux-menuconfig      : Invoke ION kernel menuconfig
bgp-ion-linux-build           : Rebuild ION kernel and copy
bgp-ion-linux-try-patching    : Try to apply new patch to ION kernel
bgp-ion-linux-update-prebuilt : Copy ION kernel from tmp dir to prebuilt dir (no rebuild)

bgp-cn-linux                  : Rebuild CN ramdisk and merge with recent built or prebuilt CN kernel objs
bgp-cn-linux-prebuilt         : Use prebuilt CN kernel
bgp-cn-linux-menuconfig       : Invoke CN kernel menuconfig
bgp-cn-linux-build            : Rebuild CN kernel objs and CN ramdisk, then merge them into CN Kernel
bgp-cn-linux-try-patching     : Try to apply new patch to CN kernel
bgp-cn-linux-update-prebuilt  : Copy CN kernel objs from tmp dir to prebuilt dir (no build)
bgp-cn-linux-add-ktau         : Patch CN kernel with KTAU and link in KTAU headers & sources
bgp-cn-linux-del-ktau         : Remove KTAU patch and links from CN kernel

NOTE:
- The following targets copy prebuilt or newly built kernel image to the top dir
  bgp-ion-linux, bgp-ion-linux-build, bgp-cn-linux, bgp-cn-linux-build and bgp-cn-linux-ramdisk

- 'make GIT=1 ...' clone repo from public git repo via http and use them as target source tree
- By default, kernel tarballs are extracted and use them as target source tree

Issuing 'make bgp-cn-linux-add-ktau' patches the ZeptoOS compute node Linux kernel with the configured KTAU patch and then links to headers and sources from the KTAU path.

~/BGP/kernel> make bgp-cn-linux-add-ktau
[ -d work ] || mkdir work
tar xfj tarball/linux-2.6.19.2-BGP-V1R3.tar.bz2 -C work
if [ -f tarball/linux-2.6.19.2-BGP-V1R3.patch ] ; then \
	( cd work/linux-2.6.19.2-BGP-V1R3 ; patch -p1 < ../../tarball/linux-2.6.19.2-BGP-V1R3.patch ) ; \
	cp tarball/linux-2.6.19.2-BGP-V1R3.patch work/linux-2.6.19.2-BGP-V1R3.patch.applied ; \
fi
patching file Makefile
patching file arch/ppc/Kconfig
patching file arch/ppc/kernel/head_44x.S
patching file arch/ppc/kernel/vmlinux.lds.S
patching file arch/ppc/mm/zepto_flatmem.c
patching file arch/ppc/syslib/bgdd/bluegene_console.c
patching file arch/ppc/syslib/bgdd/bluegene_dma.c
patching file bgp-cn-2.6.19.2-dot-config
patching file bgp-ion-flatmem-2.6.19.2-dot-config
patching file fs/Kconfig
patching file include/asm-ppc/page.h
patching file include/asm-ppc/pgtable.h
patching file include/linux/mmzone.h
patching file kernel/exit.c
touch work/.cn_kernel_prepared
Linking to KTAU headers...
Linking to KTAU sources...
Linking to KTAU make...
Applying KTAU patch...
patching file arch/powerpc/kernel/irq.c
patching file arch/powerpc/oprofile/Kconfig
patching file arch/ppc/kernel/entry.S
Hunk #4 succeeded at 306 (offset -6 lines).
patching file arch/ppc/kernel/smp.c
patching file arch/ppc/kernel/time.c
patching file arch/ppc/kernel/traps.c
Hunk #1 succeeded at 69 (offset -7 lines).
patching file arch/ppc/syslib/bgdd/bluegene_network.c
Hunk #2 succeeded at 496 (offset -34 lines).
Hunk #3 succeeded at 513 (offset -34 lines).
patching file include/linux/sched.h
patching file init/main.c
patching file kernel/fork.c
patching file kernel/irq/handle.c
patching file kernel/Makefile
patching file kernel/sched.c
patching file kernel/softirq.c
patching file kernel/timer.c
patching file kernel/workqueue.c
patching file Makefile
NOTE: Integration of KTAU into ZeptoOS is complete.
 Next, please configure KTAU kernel options using "make bgp-cn-linux-menuconfig".
 After that, please follow the standard ZeptoOS build steps (e.g. "make bgp-cn-linux-build").
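In the output above, several hunks are applied at an offset, which is harmless. A hunk that cannot be applied is reported by GNU patch with the word FAILED. The following sketch scans a saved copy of the output for such lines. The function name patch_log_ok and the log file name are hypothetical.

```shell
# Sketch: scan saved output of the KTAU patch step for failed hunks.
# patch_log_ok is a hypothetical helper name.
# Usage: patch_log_ok <logfile>
patch_log_ok() {
    # GNU patch marks hunks that could not be applied with the word FAILED.
    if grep -q 'FAILED' "$1"; then
        grep 'FAILED' "$1"
        return 1
    fi
    # Hunks applied at an offset are normal, but worth counting.
    grep -c 'succeeded at .* (offset' "$1" | sed 's/$/ hunk(s) applied with an offset/'
    return 0
}
```

The log can be captured with 'make bgp-cn-linux-add-ktau 2>&1 | tee ktau-patch.log' and then checked with 'patch_log_ok ktau-patch.log'.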

Configuring KTAU parameters of the patched ZeptoOS CN Linux

Next, the KTAU parameters of the kernel configuration need to be set. This can be done by issuing a 'make bgp-cn-linux-menuconfig'.

~/BGP/kernel> make bgp-cn-linux-menuconfig

[Screen-shot: KTAU menuconfig.jpg]


Building ZeptoOS with KTAU

After configuration is performed and saved, the kernel needs to be built.

~/BGP/kernel> make bgp-cn-linux-build

Do not forget to update the prebuilt binaries with the kernel image you just built.

~/BGP/kernel> make bgp-cn-linux-update-prebuilt
Copying kernel objs from tmp/cnkernelobjs to prebuilt/2.6.19/objs/ ...
done
CN kernel prebuilt objs and kernel image has been updated

Lastly, go back up to the root-level ZeptoOS directory and perform a 'make' to set / update / build any other targets.

~/BGP/kernel> cd ..

~/BGP> pwd
/home/anataraj/BGP

~/BGP> make
Create root password for I/O Node
   Leave the password field empty if you want to disable root login
   New password:
   Retype new password:
<snipped out>
Zepto Kernel and ramdisk images are ready:
BGP-CN-zImage-with-initrd.elf
BGP-ION-ramdisk.elf
BGP-ION-ramdisk-for-CNL.elf
BGP-ION-zImage.elf

You should now have a KTAU-patched and configured ZeptoOS Compute Node Linux kernel ready for use.
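As a quick check, the four image files named at the end of the 'make' output can be tested for existence. This is a minimal sketch: the function name check_zepto_images is hypothetical, and it assumes the images are written to the directory given as the argument.

```shell
# Sketch: confirm that the images listed at the end of the top-level 'make'
# exist and are non-empty. check_zepto_images is a hypothetical helper name.
# Usage: check_zepto_images <zeptoos-top-dir>
check_zepto_images() {
    status=0
    for image in BGP-CN-zImage-with-initrd.elf BGP-ION-ramdisk.elf \
                 BGP-ION-ramdisk-for-CNL.elf BGP-ION-zImage.elf; do
        if [ ! -s "$1/$image" ]; then
            echo "missing or empty: $image"
            status=1
        fi
    done
    return $status
}
```

For the layout used in this how-to, the call would be 'check_zepto_images /home/anataraj/BGP'.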


Building KTAU User-level Utilities

Next please build the user-level utils of KTAU, as follows:

~/ktau-1.7.15-bgpzepto> pwd
/home/anataraj/ktau-1.7.15-bgpzepto

~/ktau-1.7.15-bgpzepto> cd user-src/
~/ktau-1.7.15-bgpzepto/user-src> cd src/

~/ktau-1.7.15-bgpzepto/user-src/src> make \
                          ARCH=ppc \
                          BGP_KERNEL_INC=/home/anataraj/BGP/kernel/work/linux-2.6.19.2-BGP-V1R3/include/ \
                          clean all

~/ktau-1.7.15-bgpzepto/user-src/src> cd runktau/
~/ktau-1.7.15-bgpzepto/user-src/src/runktau> ls -alhtr timeKtau
-rwxr-xr-x 1 anataraj users 6.0M 2009-04-28 13:09 timeKtau

~/ktau-1.7.15-bgpzepto/user-src/src/runktau> cd ../..
~/ktau-1.7.15-bgpzepto/user-src> ls lib/
libktau.a

Removing KTAU from ZeptoOS

If you wish to revert to the original version of ZeptoOS without KTAU integration, please perform the steps below.

~/BGP> pwd
/home/anataraj/BGP

~/BGP> cd kernel/

~/BGP/kernel> make bgp-cn-linux-del-ktau
Reversing the KTAU patch (using a local copy of the applied patch)...
patching file arch/powerpc/kernel/irq.c
patching file arch/powerpc/oprofile/Kconfig
patching file arch/ppc/kernel/entry.S
Hunk #4 succeeded at 287 (offset -6 lines).
patching file arch/ppc/kernel/smp.c
patching file arch/ppc/kernel/time.c
patching file arch/ppc/kernel/traps.c
Hunk #1 succeeded at 69 (offset -7 lines).
patching file arch/ppc/syslib/bgdd/bluegene_network.c
Hunk #2 succeeded at 493 (offset -34 lines).
Hunk #3 succeeded at 505 (offset -34 lines).
patching file include/linux/sched.h
patching file init/main.c
patching file kernel/fork.c
patching file kernel/irq/handle.c
patching file kernel/Makefile
patching file kernel/sched.c
patching file kernel/softirq.c
patching file kernel/timer.c
patching file kernel/workqueue.c
patching file Makefile
NOTE: KTAU integration has been undone (patch reversed and all links removed).
 Next, please reconfigure and build the ZeptoOS CN Linux following standard steps.
 You can "make bgp-cn-linux-menuconfig" and "make bgp-cn-linux-build".

Please remember to reconfigure and rebuild ZeptoOS after undoing the KTAU integration.







Using TAU with KTAU : Integrated Profiles

TAU can be used in a tightly coupled fashion with KTAU to provide integrated profiles that show both user and kernel performance data under certain configurations. The following subsection describes such a configuration.


Configuring under ZeptoOS

Use all the previously described configuration options and add the -MULTIPLECOUNTERS and -ktau* options to the configure line, as follows:

./configure  -arch=ppc -zeptodir=/home/anataraj/ZeptoReleases/install-BGP \
             -mpi \
             -pdt=/home/anataraj/pdtoolkit-3.14 -pdtarchdir=ppc64 -pdt_c++=xlC \
             -MULTIPLECOUNTERS \
             -ktau -ktau_shctr \
             -ktauinc=/home/anataraj/BGP/kernel/work/linux-2.6.19.2-BGP-V1R3/include \
             -ktauincuser=/home/anataraj/ktau-1.7.15-bgpzepto/user-src/include \
             -ktaulib=/home/anataraj/ktau-1.7.15-bgpzepto/user-src/lib/ \
             -ktausym=/proc/kallsyms

The -ktauinc, -ktauincuser, -ktaulib and -ktausym options tell TAU where to find, respectively: the patched kernel includes, the user-level KTAU utils includes, the user-level KTAU libs and the kernel symbol table (at application runtime).

Then build as usual:

make clean install

This will provide a TAU that is capable of profiling MPI applications on ZeptoOS/BGP while also providing kernel-level performance information. It is assumed, of course, that ZeptoOS has also been configured and built with KTAU.

Example Run

TAU has several example codes listed in the "tau-<version>/examples" directory. Depending on the configuration of TAU, not all of those examples can be run.

Building the 'ring' Example Code

If PDT and MPI have been included in the configuration, then the sample codes in "tau-<version>/examples/pdt_mpi/" directory can be run as follows:

~/tau-2.18.1p2> cd examples/pdt_mpi/c

~/tau-2.18.1p2/examples/pdt_mpi/c> ls
Makefile  ring.c

~/tau-2.18.1p2/examples/pdt_mpi/c> make

~/tau-2.18.1p2/examples/pdt_mpi/c> ls
Makefile  ring  ring.c  ring.inst.c  ring.o

Enabling ZCB-flag in 'ring'

Once 'ring' is built, it needs to be turned into a zepto-compute-binary (zcb) by turning on a flag in the ELF header of the binary. This is done using 'zelftool', which is part of the ZeptoOS installation.

~/tau-2.18.1p2/examples/pdt_mpi/c> /home/anataraj/ZeptoReleases/install-BGP/bin/zelftool ./ring
zcb is off
~/tau-2.18.1p2/examples/pdt_mpi/c> /home/anataraj/ZeptoReleases/install-BGP/bin/zelftool -e ./ring
~/tau-2.18.1p2/examples/pdt_mpi/c> /home/anataraj/ZeptoReleases/install-BGP/bin/zelftool ./ring
zcb is on
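The query, enable and re-query sequence above can be wrapped in a small function for use in build scripts. This is a sketch only: ensure_zcb is a hypothetical name, and it relies solely on the two zelftool invocations and the two messages shown in the transcript above.

```shell
# Sketch: turn the zcb flag on only when it is not already on.
# ensure_zcb is a hypothetical wrapper; it uses zelftool exactly as shown
# above (no option to query the flag, -e to enable it).
# Usage: ensure_zcb <path-to-zelftool> <binary>
ensure_zcb() {
    if [ "$("$1" "$2")" = "zcb is on" ]; then
        return 0
    fi
    "$1" -e "$2" || return 1
    # Query again to confirm that the flag was actually set.
    [ "$("$1" "$2")" = "zcb is on" ]
}
```

For the example above, the call would be 'ensure_zcb /home/anataraj/ZeptoReleases/install-BGP/bin/zelftool ./ring'.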

Running 'ring' under a ZeptoOS kernel profile

This binary can now be run on ZeptoOS/BGP by specifying the correct kernel profile to cqsub. In the example below, the '-k anataraj' profile points to a ZeptoOS kernel patched with KTAU.

~/tau-2.18.1p2/examples/pdt_mpi/c> cqsub -p ZeptoOS -k anataraj -t 10 -n 64 ./ring

Resulting Profile Data

After the run, user and kernel profiles will be created for each MPI rank. These human-readable ASCII profiles can be read with ParaProf, the analysis and visualization tool in TAU.

User-level Profile

The user profiles are named "profile.<rank>.0.0". In this example run there are 64 ranks.

~/tau-2.18.1p2/examples/pdt_mpi/c> ls profile.*
profile.0.0.0   profile.16.0.0  profile.23.0.0  profile.30.0.0  profile.38.0.0  profile.45.0.0  profile.52.0.0  profile.6.0.0
profile.1.0.0   profile.17.0.0  profile.24.0.0  profile.31.0.0  profile.39.0.0  profile.46.0.0  profile.53.0.0  profile.60.0.0
profile.10.0.0  profile.18.0.0  profile.25.0.0  profile.32.0.0  profile.4.0.0   profile.47.0.0  profile.54.0.0  profile.61.0.0
profile.11.0.0  profile.19.0.0  profile.26.0.0  profile.33.0.0  profile.40.0.0  profile.48.0.0  profile.55.0.0  profile.62.0.0
profile.12.0.0  profile.2.0.0   profile.27.0.0  profile.34.0.0  profile.41.0.0  profile.49.0.0  profile.56.0.0  profile.63.0.0
profile.13.0.0  profile.20.0.0  profile.28.0.0  profile.35.0.0  profile.42.0.0  profile.5.0.0   profile.57.0.0  profile.7.0.0
profile.14.0.0  profile.21.0.0  profile.29.0.0  profile.36.0.0  profile.43.0.0  profile.50.0.0  profile.58.0.0  profile.8.0.0
profile.15.0.0  profile.22.0.0  profile.3.0.0   profile.37.0.0  profile.44.0.0  profile.51.0.0  profile.59.0.0  profile.9.0.0

ParaProf screen-shot of user-level performance data from all 'ring' ranks.
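With many ranks, it is easy to overlook a missing profile in the 'ls' output. The sketch below lists the ranks that have no "profile.<rank>.0.0" file. The function name missing_ranks is hypothetical.

```shell
# Sketch: list the MPI ranks that have no user-level profile in a directory.
# missing_ranks is a hypothetical helper name.
# Usage: missing_ranks <profile-dir> <number-of-ranks>
missing_ranks() {
    rank=0
    while [ "$rank" -lt "$2" ]; do
        # User-level profiles are named profile.<rank>.0.0.
        [ -f "$1/profile.$rank.0.0" ] || echo "$rank"
        rank=$((rank + 1))
    done
}
```

For the run above, 'missing_ranks . 64' prints nothing when all 64 profiles are present.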

Kernel-level Profile

The kernel profiles for all processes corresponding to each node are under the respective "Kprofile.<rank>.0.perprocess" directories and there are 64 of those as well in this example. The files in these directories are numbered by their process-ids.

~/tau-2.18.1p2/examples/pdt_mpi/c> ls Kprofile.0.0.perprocess/
profile.1.0.0    profile.12.0.0  profile.17.0.0  profile.21.0.0  profile.26.0.0  profile.30.0.0  profile.44.0.0  profile.55.0.0  profile.9.0.0
profile.10.0.0   profile.13.0.0  profile.18.0.0  profile.22.0.0  profile.27.0.0  profile.31.0.0  profile.48.0.0  profile.6.0.0
profile.11.0.0   profile.14.0.0  profile.19.0.0  profile.23.0.0  profile.28.0.0  profile.32.0.0  profile.5.0.0   profile.62.0.0
profile.115.0.0  profile.15.0.0  profile.2.0.0   profile.24.0.0  profile.29.0.0  profile.34.0.0  profile.53.0.0  profile.7.0.0
profile.116.0.1  profile.16.0.0  profile.20.0.0  profile.25.0.0  profile.3.0.0   profile.4.0.0   profile.54.0.0  profile.8.0.0

ParaProf screen-shot of kernel-level performance data from 'ring' rank 0.
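Because the files in a per-process directory are numbered by process id, the ids can be recovered from the file names. The following sketch does this for one directory. The function name kprofile_pids is hypothetical, and it assumes the process id is the second dot-separated field of each file name.

```shell
# Sketch: list the process ids that have a kernel profile for one rank.
# kprofile_pids is a hypothetical helper name.
# Usage: kprofile_pids <per-process-profile-dir>
kprofile_pids() {
    for f in "$1"/profile.*; do
        # Skip the unexpanded pattern when the directory has no profiles.
        [ -e "$f" ] || continue
        # File names look like profile.<pid>.0.0; keep the <pid> field.
        basename "$f" | cut -d. -f2
    done | sort -n -u
}
```

For example, 'kprofile_pids Kprofile.0.0.perprocess | wc -l' counts the processes profiled on the node of rank 0.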

Integrated Profile

One method to obtain an integrated profile is to specify which kernel activities need to be closely tracked within the context of user-level application routines. The integration of TAU and KTAU allows specification of those activities through TAU's multiple-counter interface.

For instance, if close tracking of the "timer_interrupt" and "schedule" kernel operations is required, then the following cqsub command may be issued:

~/tau-2.18.1p2/examples/pdt_mpi/c> cqsub -p ZeptoOS -k anataraj -t 10 -n 64 \
                -e COUNTER1=GET_TIME_OF_DAY:COUNTER2=KTAU_INCL_timer_interrupt:COUNTER3=KTAU_schedule \
                ./ring

The above command uses environment variables to tell TAU to track the two specific kernel operations in addition to standard elapsed time (gettimeofday). Any instrumented kernel symbol can be used by prefixing it with 'KTAU_'.
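The counter string follows a regular pattern, so it can be generated from a list of kernel symbols. The sketch below does so. The function name ktau_counters is hypothetical, and the names given to it are passed through unchanged apart from the 'KTAU_' prefix (so "INCL_timer_interrupt" is written exactly as in the command above).

```shell
# Sketch: build the COUNTER<n> settings passed to 'cqsub -e'.
# ktau_counters is a hypothetical helper name.
# Usage: ktau_counters <kernel-symbol>...
ktau_counters() {
    # COUNTER1 is the standard elapsed-time metric.
    spec="COUNTER1=GET_TIME_OF_DAY"
    n=2
    for sym in "$@"; do
        # Each kernel symbol gets the KTAU_ prefix and the next counter slot.
        spec="$spec:COUNTER$n=KTAU_$sym"
        n=$((n + 1))
    done
    printf '%s\n' "$spec"
}
```

The cqsub option from the example above could then be written as: -e "$(ktau_counters INCL_timer_interrupt schedule)"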

The resulting profile data is placed in "MULTI__*" directories, one for each specified counter.

~/tau-2.18.1p2/examples/pdt_mpi/c> ls | grep MULTI__
MULTI__GET_TIME_OF_DAY
MULTI__INCL_timer_interrupt
MULTI__schedule

~/tau-2.18.1p2/examples/pdt_mpi/c> ls MULTI__INCL_timer_interrupt/
profile.0.0.0   profile.16.0.0  profile.23.0.0  profile.30.0.0  profile.38.0.0  profile.45.0.0  profile.52.0.0  profile.6.0.0
profile.1.0.0   profile.17.0.0  profile.24.0.0  profile.31.0.0  profile.39.0.0  profile.46.0.0  profile.53.0.0  profile.60.0.0
profile.10.0.0  profile.18.0.0  profile.25.0.0  profile.32.0.0  profile.4.0.0   profile.47.0.0  profile.54.0.0  profile.61.0.0
profile.11.0.0  profile.19.0.0  profile.26.0.0  profile.33.0.0  profile.40.0.0  profile.48.0.0  profile.55.0.0  profile.62.0.0
profile.12.0.0  profile.2.0.0   profile.27.0.0  profile.34.0.0  profile.41.0.0  profile.49.0.0  profile.56.0.0  profile.63.0.0
profile.13.0.0  profile.20.0.0  profile.28.0.0  profile.35.0.0  profile.42.0.0  profile.5.0.0   profile.57.0.0  profile.7.0.0
profile.14.0.0  profile.21.0.0  profile.29.0.0  profile.36.0.0  profile.43.0.0  profile.50.0.0  profile.58.0.0  profile.8.0.0
profile.15.0.0  profile.22.0.0  profile.3.0.0   profile.37.0.0  profile.44.0.0  profile.51.0.0  profile.59.0.0  profile.9.0.0

ParaProf screen-shot of timer_interrupt activity observed from within the user-level routines of the 64 ranks of the 'ring' application.

ParaProf screen-shot of timer_interrupt activity observed from within the user-level routines of Rank-0 of the 'ring' application.

