EZTrace
 

Analyzing an MPI application with EZTrace

Installing EZTrace

First, configure EZTrace as described in the support page.
$ ./configure --with-mpi-include=<PATH_TO_MPI.H> --prefix=<INSTALLATION_DIR>

Check that MPI was detected by configure:
$ grep -r ac_cv_header_mpi_h config.log
ac_cv_header_mpi_h=yes

If everything is OK, compile and install EZTrace:
$ make &&make install

Check the installation of EZTrace:
$ eztrace_avail
3       stdio   Module for stdio functions (read, write, select, poll, etc.)
2       pthread Module for PThread synchronization functions (mutex, semaphore, spinlock, etc.)
1       omp     Module for OpenMP parallel regions
4       mpi     Module for MPI functions
5       memory  Module for memory functions (malloc, free, etc.)

Running an application with EZTrace

Let's analyze this MPI stencil application. First, compile it:
$ tar xzvf stencil_mpi.tgz
[...]
$ mpicc -o stencil_mpi stencil_mpi.c

You can run the application with:
$ mpirun -np 4 ./stencil_mpi 1024

To analyze it with EZTrace, simply add eztrace -t mpi to the command line:
$ mpirun -np 4 eztrace -t mpi ./stencil_mpi 1024

Each MPI process will generate a trace file in
/tmp/<username>_eztrace_log_rank_<MPI_rank>.
If you run the program on several nodes, you can retrieve the trace files using scp.

In order to avoid retrieving the trace files after the execution, you can set EZTRACE_TRACE_DIR to a directory that is shared among the nodes of your cluster. For instance:

$ export EZTRACE_TRACE_DIR=$HOME/traces
$ mpirun -np 4 eztrace ./stencil_mpi 1024
[...]
Stopping EZTrace... saving trace  /home/${USER}/traces/${USER}_eztrace_log_rank_0
Stopping EZTrace... saving trace  /home/${USER}/traces/${USER}_eztrace_log_rank_2
Stopping EZTrace... saving trace  /home/${USER}/traces/${USER}_eztrace_log_rank_1
Stopping EZTrace... saving trace  /home/${USER}/traces/${USER}_eztrace_log_rank_3

Analyzing the execution of the application

Once the execution traces are generated, we need to analyze them. EZTrace allows to:
Visualizing an execution trace

In order to generate a PAJE/OTF trace file that can be visualized with ViTE or Vampir, you need to run eztrace_convert:

$ eztrace_convert -t PAJE $HOME/traces/${USER}_eztrace_log_rank_*
module stdio loaded
module pthread loaded
module omp loaded
module mpi loaded
module memory loaded
5 modules loaded
no more block for trace #2
no more block for trace #3
no more block for trace #1
no more block for trace #0
1570 events handled

This generates a file named paje.trace that you can visualize with ViTE:
$ vite paje.trace


Computing statistics
EZTrace can compute statistics on MPI messages:
$ eztrace_stats $HOME/traces/${USER}_eztrace_log_rank_*

MPI:
---
        MPI_RECV             :300 calls
        MPI_ISEND            :300 calls
        MPI_WAIT             :300 calls
        MPI_BARRIER          :8 calls
CT_Process #0 --        50 messages sent
        Size of messages (byte):        min: 8192       max: 8192       average: 8192   total: 409600
        Time spent sending messages (ms):       min: 25.654522  max: 37.878700  average: 33.762951      total: 1688.147534
        Time spent computing while sending messages (ms):       min: 25.651473  max: 37.876794  average: 33.760435      total: 1688.021762
        Time spent in MPI_Send or waiting for a Isend to complete (ms): min: 0.001450   max: 0.005251   average: 0.002515       total: 0.125772
        Time spent receiving messages (ms):     min: 0.010901   max: 11.508515  average: 3.747645       total: 187.382259
        Time spent computing while receiving messages (ms):     min: 0.000000   max: 0.000000   average: 0.000000       total: 0.000000
        Time spent in MPI_Recv or waiting for a Irecv to complete (ms): min: 0.010901   max: 11.508515  average: 3.747645       total: 187.382259
CT_Process #1 --        100 messages sent
        Size of messages (byte):        min: 8192       max: 8192       average: 8192   total: 819200
        Time spent sending messages (ms):       min: 0.014730   max: 42.560793  average: 18.752866      total: 1875.286636
        Time spent computing while sending messages (ms):       min: 0.013260   max: 42.557880  average: 18.750474      total: 1875.047353
        Time spent in MPI_Send or waiting for a Isend to complete (ms): min: 0.001031   max: 0.008158   average: 0.002393       total: 0.239283
        Time spent receiving messages (ms):     min: 0.004429   max: 11.215598  average: 1.219087       total: 121.908687
        Time spent computing while receiving messages (ms):     min: 0.000000   max: 0.000000   average: 0.000000       total: 0.000000
        Time spent in MPI_Recv or waiting for a Irecv to complete (ms): min: 0.004429   max: 11.215598  average: 1.219087       total: 121.908687
CT_Process #2 --        100 messages sent
        Size of messages (byte):        min: 8192       max: 8192       average: 8192   total: 819200
        Time spent sending messages (ms):       min: 0.018938   max: 42.488664  average: 18.103344      total: 1810.334430
        Time spent computing while sending messages (ms):       min: 0.018449   max: 42.486583  average: 18.101278      total: 1810.127777
        Time spent in MPI_Send or waiting for a Isend to complete (ms): min: 0.000315   max: 0.036811   average: 0.002067       total: 0.206653
        Time spent receiving messages (ms):     min: 0.005040   max: 14.437164  average: 2.907976       total: 290.797554
        Time spent computing while receiving messages (ms):     min: 0.000000   max: 0.000000   average: 0.000000       total: 0.000000
        Time spent in MPI_Recv or waiting for a Irecv to complete (ms): min: 0.005040   max: 14.437164  average: 2.907976       total: 290.797554
CT_Process #3 --        50 messages sent
        Size of messages (byte):        min: 8192       max: 8192       average: 8192   total: 409600
        Time spent sending messages (ms):       min: 0.018112   max: 14.455367  average: 5.824193       total: 291.209661
        Time spent computing while sending messages (ms):       min: 0.017621   max: 14.454577  average: 5.823531       total: 291.176530
        Time spent in MPI_Send or waiting for a Isend to complete (ms): min: 0.000404   max: 0.001063   average: 0.000663       total: 0.033131
        Time spent receiving messages (ms):     min: 0.009640   max: 14.142448  average: 4.225155       total: 211.257759
        Time spent computing while receiving messages (ms):     min: 0.000000   max: 0.000000   average: 0.000000       total: 0.000000
        Time spent in MPI_Recv or waiting for a Irecv to complete (ms): min: 0.009640   max: 14.142448  average: 4.225155       total: 211.257759

        /tmp/eztrace_stats_${USER}/message_size.gp was created


eztrace_stats prints:
Dumping the messages
Additionnaly, you can get the list of all messages that were exchanged during the execution by setting EZTRACE_MPI_DUMP_MESSAGES:
$ export EZTRACE_MPI_DUMP_MESSAGES=1
$ eztrace_stats $HOME/traces/${USER}_eztrace_log_rank_*
[...]
MPI messages dumped in /tmp/eztrace_stats_${USER}/${USER}_eztrace_message_dump
[...]
$ cat /tmp/eztrace_stats_${USER}/${USER}_eztrace_message_dump
# src   dest    len     tag     Isend_tick      SWait_tick      Wait_done_tick  Irecv_tick      RWait_tick      Wait_done_tick  Sender_request  Receiver_request
1       0       8192    0       1685106493      1712167873      1712170464      1687482772      1687482772      1687497702      0x890960        (nil)
1       0       8192    0       1652218487      1685066389      1685068287      1657773287      1657773287      1657788052      0x890920        (nil)
1       0       8192    0       1619270156      1652179190      1652180986      1628295337      1628295337      1628310886      0x8908e0        (nil)
1       0       8192    0       1586632048      1619230569      1619232188      1598695549      1598695549      1598710239      0x8908a0        (nil)
1       0       8192    0       1548711295      1586588089      1586589995      1568940582      1568940582      1568955344      0x890860        (nil)
[...]

This file contains the list of messages that were exchanged during the execution of the application. For each message, you have access to:


This tutorial was designed for EZTrace 0.9 or higher. If you encounter any problem, feel free to contact EZTrace developpers.