Launching the TotalView Debugger
TotalView is a tool built to debug parallel programs, including multithreaded, multiprocess, and GPU-based applications. It can also be used for memory debugging, and it provides a technology called ReplayEngine, which allows users to step both forward and backward through sections of their program (at the expense of increased memory consumption).
TotalView currently has two interfaces: classic and modern. Some of the more advanced features may only be available via the classic interface as features are still being ported to the modern interface. The interface can be changed under File > Preferences > Display. That said, this guide will reference the modern interface.
This how-to guide will explain how to launch serial and parallel jobs for debugging with TotalView. It assumes users are already familiar with using graphical debuggers and will not be teaching the TotalView interfaces. Instead, users should view the links in the TotalView Resources section for more information on using TotalView. For advanced features like memory debugging and ReplayEngine, users are encouraged to search TotalView's resources.
Accessing TotalView
The best way to run TotalView on the HPCC is to use the OnDemand Interactive Desktop app. This provides access to the GUI with the least latency.
The number of cores and amount of memory you should request depend on the type of job you want to debug. For a serial job, you'll only need one core. For parallel jobs, it's advisable to use as few cores as possible. Though the Interactive Desktop and TotalView GUI should not be too demanding, these cores will be switching among the Interactive Desktop, TotalView, and your program.
Note
While TotalView supports remote debugging—that is, using a copy of TotalView running on your personal computer to debug a job running on the HPCC—downloading the client requires a license number.
Once you've accessed your Interactive Desktop instance, use the Menu in the upper left corner to navigate to System Tools > XTerm. From the terminal, run
module load TotalView
TotalView is now accessible from the command line.
Type totalview on the command line to launch the application.
A window will pop up with several panels. In the center should be the Start Page panel.
Tip
While you can specify a working directory within the Session Editor dialog, it may be easier to cd to your desired working directory from the terminal before launching TotalView.
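The launch steps and the tip can be combined into one small shell function. This is only a sketch: launch_totalview is a hypothetical helper name, and it assumes the module command is available, as it is in an HPCC terminal.

```shell
# Hypothetical helper: launch TotalView from a chosen working directory.
# Usage: launch_totalview [working_directory]
launch_totalview() {
    workdir="${1:-$PWD}"            # default to the current directory
    if [ ! -d "$workdir" ]; then
        echo "No such directory: $workdir" >&2
        return 1
    fi
    # 'module' is assumed to exist (environment modules, as on the HPCC).
    module load TotalView || return 1
    cd "$workdir" || return 1
    totalview
}
```

For example, launch_totalview ~/my_project would load the module, change to ~/my_project (a placeholder path), and open the TotalView GUI from there.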
Debugging Serial Jobs
From the Start Page, select "Debug a Program" to bring up the Session Editor dialog. You can enter a Session Name to easily re-launch this debugging session later. At minimum you must supply the File Name of the executable to be debugged. You can also specify any arguments, environment variables, or input/output redirects in the corresponding boxes.
Select Load Session once you have finished supplying all necessary information for launching your program. Your program will not start running until you select the green "run" button in the upper left.
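As an alternative to filling in the Session Editor, a serial program can likely be opened straight from the terminal. The sketch below assumes the totalview command accepts an executable followed by -a and the program's arguments, as described in TotalView's command-line documentation; confirm this against the installed version before relying on it. The name tv_serial is hypothetical.

```shell
# Hypothetical helper: open a serial program in TotalView from the terminal.
# Usage: tv_serial ./my_program [program arguments...]
tv_serial() {
    exe="$1"
    if [ ! -f "$exe" ] || [ ! -x "$exe" ]; then
        echo "Not an executable file: $exe" >&2
        return 1
    fi
    shift
    if [ "$#" -gt 0 ]; then
        # Assumption: -a hands everything after it to the debugged program.
        totalview "$exe" -a "$@"
    else
        totalview "$exe"
    fi
}
```

As with a session loaded from the Start Page, the program should not begin executing until you press the green "run" button.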
Debugging Parallel Jobs
From the Start Page, select "Debug a Parallel Program" to bring up the Session Editor dialog. You can enter a Session Name to easily re-launch this debugging session later. At minimum you must supply the File Name of the executable to be debugged. You can also specify any arguments, environment variables, or input/output redirects in the corresponding boxes.
When filling out the Parallel Details section, it is important that you follow the instructions below for the MPI library that you used to compile your program. Do not select SLURM as your Parallel System: SLURM considers all of the cores you requested when setting up the Interactive Desktop to be in use already, so your application will fail because of unavailable resources.
For programs compiled with Open MPI:
Select "Open MPI" as your Parallel System.
Specify the number of tasks (processes) to use.
In the Additional Starter Arguments box, put --oversubscribe.
For programs compiled with Intel MPI:
Select "Intel MPI-Hydra" as your Parallel System.
Specify the number of tasks (processes) to use.
Select Load Session once you have finished supplying all necessary information for launching your program. Your program will not start running until you select the green "run" button in the upper left.
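If you are unsure which MPI library applies, the version banner printed by mpirun --version is one hint. The sketch below classifies that banner and prints the matching Parallel System name. It assumes the MPI module loaded in your terminal is the same one used to compile the program, and that the banner contains "Open MPI" or "Intel(R) MPI", as current releases of those libraries print. The name mpi_flavor is hypothetical.

```shell
# Hypothetical helper: read `mpirun --version` output on stdin and print
# the Parallel System to choose in the Session Editor.
mpi_flavor() {
    version_text="$(cat)"
    case "$version_text" in
        *"Open MPI"*)     echo "Open MPI" ;;
        *"Intel(R) MPI"*) echo "Intel MPI-Hydra" ;;
        *)                echo "unknown"; return 1 ;;
    esac
}
```

Typical usage would be mpirun --version | mpi_flavor. An "unknown" result means the banner matched neither pattern and the library should be identified another way.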
Debugging Python Jobs
TotalView supports stack filtering for Python executables to reduce clutter among the visible stack frames and focus on the user's executable itself.
You'll want to provide the path to your Python executable as the File Name and your Python script as an argument. Make sure you have activated your desired Python environment, then run which python on the command line to see its path.
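The two Session Editor values can be printed from the terminal as a quick check. This sketch uses command -v, the shell built-in counterpart of which, and falls back to python3 if no python is on your PATH. The function name python_session_fields and the script name in the usage example are placeholders.

```shell
# Hypothetical helper: print the Session Editor fields for a Python script.
# Usage: python_session_fields my_script.py
python_session_fields() {
    script="$1"
    # Resolve the interpreter from the currently active environment.
    py="$(command -v python || command -v python3)" || return 1
    echo "File Name: $py"
    echo "Arguments: $script"
}
```

Running python_session_fields my_script.py after activating your environment prints the interpreter path to enter as the File Name and the script to enter as an argument.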
TotalView Resources
ICER offers version 2023.4 of TotalView. There is separate documentation for the modern and classic interfaces. A number of cheatsheets can also be found on the main documentation page, though be advised that these may reflect newer versions of TotalView.
TotalView also has a number of resources available including videos, blog posts, and webinars demonstrating how to use TotalView for various tasks.