NVIDIA DCGM

Publisher: Canonical (verified account)

Snap for NVIDIA DCGM and DCGM-Exporter

This snap includes NVIDIA DCGM and DCGM-Exporter to manage and monitor NVIDIA GPUs via the CLI or via Prometheus metrics. Grafana dashboards can then be used to visualize the exported metrics. See, for example:
https://grafana.com/grafana/dashboards/12239-nvidia-dcgm-exporter-dashboard/

The snap includes the following components:

  • DCGM: Data Center GPU Manager
  • DCGM-Exporter: a Prometheus exporter for DCGM metrics

Please see the links at the bottom of the page for more details about the included components and their purpose.

How-To


How to install the snap:

sudo snap install dcgm

How to enable metrics collection:

# Start the DCGM-Exporter service (disabled by default)
sudo snap start dcgm.dcgm-exporter

# Get the metrics
curl -s localhost:9400/metrics
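
The exporter returns every enabled metric in Prometheus text format, so it can help to narrow the output to a single metric. The sketch below assumes the default counters are in use and that they include DCGM_FI_DEV_GPU_UTIL; filter_metric is a hypothetical helper name, not something shipped with the snap.

```shell
# Hypothetical helper: keep only the sample lines of one metric from
# Prometheus text-format input on stdin (HELP/TYPE comment lines are dropped).
filter_metric() {
  grep -E "^$1(\{| )"
}

# Usage on a host where the exporter is running:
#   curl -s localhost:9400/metrics | filter_metric DCGM_FI_DEV_GPU_UTIL
```

The pattern anchors on the metric name followed by either a label set or a space, so metrics that merely share a prefix are not matched.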

How to configure the snap services:

The NV-Hostengine and DCGM-Exporter services can be configured via the snap CLI.
For example:

# Get all the configuration options
sudo snap get dcgm

# Set the NV-Hostengine port
sudo snap set dcgm nv-hostengine-port=5577

# Restart the NV-Hostengine service to apply the changes
sudo snap restart dcgm.nv-hostengine
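
The DCGM-Exporter options can be set the same way. As a sketch, the helper below changes the exporter's bind address and restarts the service. It assumes that, as with NV-Hostengine, a restart is what applies the change. set_exporter_address and the RUN variable are hypothetical names used for illustration, and :9500 is an arbitrary example address.

```shell
# Hypothetical helper: set the DCGM-Exporter bind address, then restart the
# exporter service (assumed to be needed for the new value to take effect).
# Set RUN=echo to print the snap commands instead of running them with sudo.
set_exporter_address() {
  addr="$1"
  ${RUN:-sudo} snap set dcgm "dcgm-exporter-address=${addr}"
  ${RUN:-sudo} snap restart dcgm.dcgm-exporter
}

# Usage on a host with the snap installed:
#   set_exporter_address :9500
#   curl -s localhost:9500/metrics
```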

Reference


Available configuration options:

  • nv-hostengine-port: the port on which the NV-Hostengine listens. The default is 5555.
  • dcgm-exporter-address: the address DCGM-Exporter binds to. The default is :9400.
  • dcgm-exporter-metrics-file: the name of a custom CSV metrics file for the exporter to load. The file is expected to be in /var/snap/dcgm/common/. The default metrics are located in /snap/dcgm/current/etc/dcgm-exporter/default-counters.csv. Please refer to the DCGM-Exporter repository link at the bottom of the page for more information on the CSV file format.
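
As an illustration of dcgm-exporter-metrics-file, the sketch below writes a two-metric counters file and shows how it might be enabled. It assumes the CSV layout used by the upstream default counters file (DCGM field, Prometheus metric type, help message) and that restarting the exporter applies the setting. write_counters and custom-counters.csv are hypothetical names.

```shell
# Hypothetical helper: write a minimal custom counters file into a directory.
# Each non-comment line is: DCGM field, Prometheus metric type, help message.
write_counters() {
  cat > "$1/custom-counters.csv" <<'EOF'
# DCGM field, Prometheus metric type, help message
DCGM_FI_DEV_GPU_UTIL, gauge, GPU utilization (in %).
DCGM_FI_DEV_GPU_TEMP, gauge, GPU temperature (in C).
EOF
}

# Usage on a host with the snap installed:
#   write_counters /tmp
#   sudo cp /tmp/custom-counters.csv /var/snap/dcgm/common/
#   sudo snap set dcgm dcgm-exporter-metrics-file=custom-counters.csv
#   sudo snap restart dcgm.dcgm-exporter
```

Only the file name is passed to snap set, since the option describes the directory as fixed.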

Cryptography


During the snap build process, snapcraft downloads the CUDA keyring deb package using curl over HTTPS and verifies its integrity using SHA256 checksums. The CUDA keyring deb package is then used to set up the appropriate source for the DCGM deb package, whose signature is verified using the keyring.
For more information, see the CUDA keyring repository link and curl documentation at the bottom of the page.
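
The download-then-verify step described above can be sketched with standard tools. This is not the snap's actual build recipe: verify_sha256, KEYRING_URL and EXPECTED_SHA256 are hypothetical names, and the example relies on GNU coreutils sha256sum.

```shell
# Hypothetical helper: succeed only if the SHA256 digest of file $1
# equals the expected hex digest $2.
verify_sha256() {
  printf '%s  %s\n' "$2" "$1" | sha256sum --check --status
}

# Usage, with KEYRING_URL and EXPECTED_SHA256 supplied by the caller:
#   curl -fsSL -o keyring.deb "$KEYRING_URL"
#   verify_sha256 keyring.deb "$EXPECTED_SHA256" || { echo "checksum mismatch" >&2; exit 1; }
```

Checking the digest before installing the package means a corrupted or tampered download is rejected before its contents are trusted.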

Links


Upstream DCGM-Exporter repository
https://github.com/NVIDIA/dcgm-exporter

Upstream DCGM repository
https://github.com/NVIDIA/DCGM

DCGM Documentation
https://docs.nvidia.com/datacenter/dcgm/latest/user-guide/index.html

Available NVIDIA GPU metrics
https://docs.nvidia.com/datacenter/dcgm/latest/dcgm-api/dcgm-api-field-ids.html

Repository for the CUDA keyring and DCGM deb package
https://developer.download.nvidia.com/compute/cuda/repos/

curl Documentation
https://curl.se/docs/manpage.html

Details for NVIDIA DCGM

License
  • Apache-2.0

Last updated
  • 6 December 2024 - core24/stable
  • 11 December 2024 - latest/edge

Where people are using NVIDIA DCGM

Users by distribution (log)

Ubuntu 24.04
Ubuntu 22.04
Ubuntu 24.10
Ubuntu 20.04
Ubuntu 18.04
Zorin OS 17