Gábor Samu
Sep 4, 2020

Extending the Spectrum LSF GUI to display job GPU metrics


I’ve previously written about accounting for GPU workloads in Spectrum LSF using NVIDIA DCGM (Data Center GPU Manager) to collect granular metrics, including energy consumed, memory used, and overall GPU utilization. Spectrum LSF collects this information and makes it available through the familiar bhist and bacct commands.

How can one go about displaying this information in the web-based job management interface that is provided by Spectrum LSF Application Center or as part of the Spectrum LSF Suites? Here we will provide a simple example showing how:

  • administrators can customize the navigation in the Spectrum LSF web-based job management interface
  • the same GPU accounting information can be displayed in the Spectrum LSF web-based job management interface

The following assumes that DCGM support has been enabled in Spectrum LSF and that you are running an edition of the Spectrum LSF Suite or Spectrum LSF Application Center.

The Spectrum LSF web-based job management interface lets GUI administrators create new tabs whose content comes from a user-specified URL or command. Here we will create a new tab that runs a script, which in turn calls the Spectrum LSF bhist command to display the GPU metrics for a given job. Because the tab can be opened for any job, the script must be able to distinguish between GPU and non-GPU jobs.

A. To begin, we need a simple script that uses the Spectrum LSF bhist command to display the detailed historical data for a given job ID, including GPU metrics. The example script below is saved as gpu_acct.sh.

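#!/bin/sh
# gpu_acct.sh: display bhist output, including GPU metrics, for a job
if [ -z "$1" ]; then
   echo "Usage: $0 <jobID>"
   exit 1
fi
# Capture the detailed job history (with GPU metrics) once
OUTPUT=$(bhist -a -l -gpu "$1")
# Only GPU jobs report the "GPU Energy Consumed" field
if printf '%s\n' "$OUTPUT" | grep -q 'GPU Energy Consumed'; then
   printf '%s\n' "$OUTPUT"
else
   echo "Not a GPU job."
fi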

As the Spectrum LSF administrator, create the above script in the $LSF_BINDIR directory with permissions 755.
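Before wiring the script into the GUI, its GPU/non-GPU branching can be exercised on any machine, even one without LSF. The sketch below is only a dry run under stated assumptions: bhist is replaced by a hypothetical stub shell function whose output is invented for illustration (real bhist output is much longer and formatted differently), job 101 is a made-up GPU job, and job 202 is a made-up non-GPU job. The gpu_acct function repeats the detection logic of gpu_acct.sh.

```shell
#!/bin/sh
# Dry run of the GPU/non-GPU branch in gpu_acct.sh, no LSF cluster needed.
# ASSUMPTION: bhist is a hypothetical stub; its output is invented and is
# not the real bhist format.
bhist() {
    # Called as: bhist -a -l -gpu <jobID>, so $4 is the job ID.
    # Pretend job 101 is a GPU job and every other job ID is not.
    echo "Job <$4>, User <alice>"
    if [ "$4" = "101" ]; then
        echo "GPU Energy Consumed: 1234.5"
    fi
}

# Same detection logic as gpu_acct.sh, wrapped in a function for testing.
gpu_acct() {
    OUTPUT=$(bhist -a -l -gpu "$1")
    if printf '%s\n' "$OUTPUT" | grep -q 'GPU Energy Consumed'; then
        printf '%s\n' "$OUTPUT"
    else
        echo "Not a GPU job."
    fi
}

gpu_acct 101   # prints the stubbed history, including the GPU line
gpu_acct 202   # prints "Not a GPU job."
```

Because a shell function takes precedence over an external command of the same name, the stub shadows any real bhist for the duration of the dry run.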

B. Next, log in to the Spectrum LSF web-based interface as a user with the Application Center Administrator privilege and navigate to Workload > Workload.

C. Select one of the jobs in the job list to open the job detail view. This is the page to which we will add the GPU accounting tab.

D. Click the edit (pencil) dropdown at the top right of the Spectrum LSF web-based interface and select Edit Page.

This displays the Create New Tab window, which we will fill in during the next step.

E. In the Create New Tab window, specify the following:

  • Tab Label: GPU accounting
  • Content From: Command, with the command gpu_acct.sh %J

Click Apply to add the new tab to the job detail page.
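The command field acts as a template: the intent is that %J is replaced with the ID of the job being viewed, so opening the tab amounts to running gpu_acct.sh with that job ID as its argument. The snippet below only mimics that substitution with sed for illustration. It is not how Application Center performs the expansion internally, and the job ID 1234 is hypothetical.

```shell
#!/bin/sh
# Illustration only: mimic how the tab's command template is expanded.
# ASSUMPTION: the GUI replaces %J with the ID of the selected job.
TEMPLATE='gpu_acct.sh %J'
JOBID=1234   # hypothetical job ID
COMMAND=$(printf '%s\n' "$TEMPLATE" | sed "s/%J/$JOBID/g")
echo "$COMMAND"   # prints: gpu_acct.sh 1234
```

Running the expanded command by hand on a cluster host, for example gpu_acct.sh 1234 with a real job ID, should give a quick preview of what the tab will display.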

F. Finally, click the Edit Page dropdown in the top right corner of the interface and select Apply and exit Pages Editing to make the changes take effect. A new GPU accounting tab now appears in the job detail view. Here I’ve selected a GPU job that previously ran through Spectrum LSF, and the tab displays the full bhist output, including the detailed GPU accounting.

As a final note, for jobs that did not request a GPU resource through Spectrum LSF, the GPU accounting tab displays the message “Not a GPU job.”

That concludes this simple example showing how the Spectrum LSF web-based interface can be customized.