
feat(thor): GPU VRAM display and per-process #849

Draft
tokk-nv wants to merge 5 commits into rbonghi:master from tokk-nv:feature/thor-r38-gpu-memory

Conversation


@tokk-nv tokk-nv commented Apr 22, 2026

Problem

On Jetson Thor running L4T r38.x (unified nvidia.ko driver), jtop had no
visibility into GPU memory:

  • 2GPU tab — right half showed only system shared RAM; no VRAM chart
  • 4MEM tab — Shared: 0k always, even under heavy GPU load
  • Process table — GPU MEM: 0k for every process

Root causes:

  1. NvMapMemUsed is absent from /proc/meminfo on the nvidia.ko stack —
    it only exists on the nvgpu stack (Orin family).
  2. nvidia-smi --query-gpu=memory.used returns [N/A] on some Thor BSPs.
  3. /sys/kernel/debug/nvmap/iovmm/clients either does not exist on Thor or
    shows 0K for every row because CUDA allocations bypass nvmap on this
    driver stack.

The only reliable per-process GPU memory source on Thor is
nvidia-smi --query-compute-apps, which enumerates CUDA processes with real
allocation sizes. This is the same probe used by tegrastats tooling and
verified against a running Ollama instance (gemma3:4b → 3.8 GiB,
nemotron-3-nano:30b → 25.7 GiB).
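
For reference, the same query can be reproduced standalone. The snippet below is a hedged sketch of the probe, not the code added by this PR:

```python
import subprocess

# Query per-process GPU allocations the same way this PR does.
# --format=csv,noheader,nounits yields lines like "12345, ollama, 3891" (MiB).
out = subprocess.check_output(
    ["nvidia-smi",
     "--query-compute-apps=pid,process_name,used_memory",
     "--format=csv,noheader,nounits"],
    text=True, timeout=5)

for line in out.splitlines():
    pid, name, used_mib = (field.strip() for field in line.split(",", 2))
    print(pid, name, used_mib, "MiB")
```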

Changes

jtop/core/thor_gpu.py

  • _nvsmi_gpu_used_mib() — sums nvidia-smi --query-compute-apps=used_memory
    with a 1-second module-level cache to avoid spawning nvidia-smi on every
    UI frame.
  • _gpu_vram_bytes() — wraps the above with caching, returns
    (used_bytes, total_bytes) where total is MemTotal from /proc/meminfo
    (Thor uses unified DRAM so this equals the full GPU-accessible pool).
  • read_gpu_mem_rows_for_gui() — now uses _gpu_vram_bytes() instead of
    hard-coded zeros.
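
A minimal sketch of how such a cached probe can be structured; the actual helpers in thor_gpu.py may differ in error handling and cache layout:

```python
import subprocess
import time

# Module-level 1-second cache so the GUI does not spawn nvidia-smi every frame.
_VRAM_CACHE = {"ts": 0.0, "used_b": 0, "total_b": 0}

def _nvsmi_gpu_used_mib():
    """Sum used_memory over all CUDA processes reported by nvidia-smi (MiB)."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-compute-apps=used_memory",
         "--format=csv,noheader,nounits"],
        text=True, timeout=5)
    return sum(int(v) for v in out.split() if v.isdigit())

def _read_memtotal_bytes():
    """MemTotal from /proc/meminfo -- the unified DRAM pool on Thor."""
    with open("/proc/meminfo") as fp:
        for line in fp:
            if line.startswith("MemTotal:"):
                return int(line.split()[1]) * 1024
    return 0

def _gpu_vram_bytes():
    """Return (used_bytes, total_bytes), refreshing at most once per second."""
    now = time.monotonic()
    if now - _VRAM_CACHE["ts"] >= 1.0:
        try:
            _VRAM_CACHE["used_b"] = _nvsmi_gpu_used_mib() * 1024 * 1024
            _VRAM_CACHE["total_b"] = _read_memtotal_bytes()
        except (OSError, subprocess.SubprocessError, ValueError):
            _VRAM_CACHE["used_b"] = _VRAM_CACHE["total_b"] = 0
        _VRAM_CACHE["ts"] = now
    return _VRAM_CACHE["used_b"], _VRAM_CACHE["total_b"]
```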

jtop/gui/pgpu_thor.py

  • Adds chart_vram (single green series) to the GPU page.
  • New update_chart_vram() callback feeds the chart from
    read_gpu_mem_rows_for_gui().
  • draw() renders the VRAM chart on the right half when vram_total_b > 0,
    falling back to the shared RAM chart when nvidia-smi is unavailable.
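
In outline, the new callback could look like the sketch below. The dict keys expected by jtop's Chart class are internal GUI details, so the return shape here is an assumption rather than a copy of pgpu_thor.py:

```python
from jtop.core.thor_gpu import read_gpu_mem_rows_for_gui  # module path per the file list above

def update_chart_vram(self, jetson, name):
    # Sketch only: pull the VRAM row and hand it to the chart in GiB.
    # 'value'/'max'/'unit' are assumed chart-callback keys, not verified API.
    rows = read_gpu_mem_rows_for_gui(0)
    used_gb = rows.get("vram_used_b", 0) / (1024 ** 3)
    total_gb = rows.get("vram_total_b", 0) / (1024 ** 3)
    return {"value": [used_gb], "max": total_gb, "unit": "G"}
```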

jtop/core/memory.py

  • get_status() now checks for the presence of NvMapMemUsed in
    /proc/meminfo rather than relying on it defaulting to zero.
  • When absent (Thor/nvidia.ko), calls _nvsmi_gpu_used_mib() to populate
    shared with GPU process allocations and sets shared_label = 'VRAM'.
  • Orin/nvgpu behaviour is unchanged.
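
The presence check reduces to something like the following sketch (illustrative only; a later commit in this PR additionally falls back to the nvmap process total when nvidia-smi also returns nothing):

```python
from jtop.core.thor_gpu import _nvsmi_gpu_used_mib  # helper added in this PR; path assumed

def _read_meminfo():
    """Parse /proc/meminfo into a dict of kB values."""
    info = {}
    with open("/proc/meminfo") as fp:
        for line in fp:
            key, _, rest = line.partition(":")
            info[key] = int(rest.split()[0])
    return info

# Inside MemoryService.get_status() -- simplified sketch:
meminfo = _read_meminfo()
if "NvMapMemUsed" in meminfo:
    # Orin / nvgpu stack: unchanged behaviour.
    ram_shared = meminfo["NvMapMemUsed"]
    shared_label = "Shared"
else:
    # Thor / nvidia.ko stack: NvMapMemUsed never appears, so pull GPU
    # process allocations from nvidia-smi instead (converted MiB -> kB).
    ram_shared = _nvsmi_gpu_used_mib() * 1024
    shared_label = "VRAM"
```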

jtop/core/processes.py

  • Adds _nvsmi_useful() capability probe (same logic as jetson-diagnostic
    scripts): returns True when nvidia-smi --query-gpu=name returns a name
    that does not contain (nvgpu).
  • Adds read_nvsmi_compute_table() — reads
    nvidia-smi --query-compute-apps=pid,process_name,used_memory and returns
    the same (total_kb, rows) shape as read_process_table().
  • ProcessService.get_status() prefers the nvidia-smi path when
    _isNvidiaSmi is True, falls back to nvmap when only _isJetson is True,
    and returns an empty table otherwise. Orin behaviour is unchanged.
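
A condensed sketch of the two helpers; the exact row tuple layout of read_process_table() is not spelled out in this description, so the rows below are illustrative:

```python
import subprocess

def _nvsmi_useful():
    """True when nvidia-smi reports a real GPU name (i.e. not the nvgpu stub)."""
    try:
        name = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            text=True, timeout=5).strip()
    except (OSError, subprocess.SubprocessError):
        return False
    return bool(name) and "(nvgpu)" not in name

def read_nvsmi_compute_table():
    """Return (total_kb, rows) in the same shape as read_process_table()."""
    try:
        out = subprocess.check_output(
            ["nvidia-smi",
             "--query-compute-apps=pid,process_name,used_memory",
             "--format=csv,noheader,nounits"],
            text=True, timeout=5)
    except (OSError, subprocess.SubprocessError):
        return 0, []
    total_kb, rows = 0, []
    for line in out.splitlines():
        pid, name, used_mib = (field.strip() for field in line.split(",", 2))
        if not used_mib.isdigit():
            continue  # e.g. "[N/A]" or "[Insufficient Permissions]"
        kb = int(used_mib) * 1024
        total_kb += kb
        rows.append((int(pid), kb, name))
    return total_kb, rows
```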

jtop/gui/pmem.py

  • draw_ram_legend() reads shared_label from jetson.memory['RAM'] so
    the MEM tab shows VRAM on Thor and Shared on Orin/nvgpu-stack
    devices as before.
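
The legend change reduces to a small lookup, sketched here with the curses drawing omitted and the formatting string assumed:

```python
def _shared_legend(memory):
    # Sketch: build the MEM-tab legend text from the service payload.
    # 'shared' is assumed to be in kB, matching /proc/meminfo conventions.
    ram = memory.get("RAM", {})
    label = ram.get("shared_label", "Shared")   # 'VRAM' on Thor, 'Shared' on Orin
    shared_gb = ram.get("shared", 0) / (1024 * 1024)
    return "{}: {:.1f}G".format(label, shared_gb)
```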

Tested on

Board                L4T    Model                            VRAM shown
Thor T5000 (jat03)   r38.x  gemma3:4b via Ollama             3.8 G / 122 G
Thor T5000 (jat03)   r38.x  nemotron-3-nano:30b via Ollama   25.7 G / 122 G
  • 2GPU tab: green VRAM X/122G chart updates live as models load/unload.
  • 4MEM tab: VRAM: X legend entry (green) replaces Shared: 0k.
  • Process table: GPU MEM column populated correctly per process.
  • JAO162 (Orin AGX): no regression — NvMapMemUsed path unchanged,
    Shared label unchanged, nvmap process table unchanged.

Notes

  • _isNvidiaSmi is evaluated once at ProcessService init. If the GPU
    driver is not ready at jtop service start, a service restart will re-probe.
  • The 1-second nvidia-smi cache in thor_gpu.py is process-scoped. The jtop
    service (root) and GUI client (user) each maintain their own cache; both
    call nvidia-smi independently at most once per second.
  • VRAM "total" is MemTotal from /proc/meminfo. On Thor the GPU and CPU
    share the same physical DRAM pool, so this is the correct denominator.
    The bar therefore shows what fraction of total DRAM is held by GPU compute
    processes.

Summary by Sourcery

Add Thor-specific GPU VRAM and per‑process memory reporting using nvidia-smi while preserving existing Orin/nvgpu behaviour.

New Features:

  • Expose GPU VRAM usage on Thor in the GPU page via a dedicated VRAM chart driven by nvidia-smi compute-app data.
  • Populate the MEM tab on Thor with GPU VRAM usage and a dynamic VRAM label instead of always showing Shared RAM.
  • Report per-process GPU memory usage on Thor via a new nvidia-smi–based process table path.

Enhancements:

  • Introduce a cached VRAM probing helper that aggregates nvidia-smi compute-app memory usage and reuses results across UI frames to reduce overhead.
  • Add a runtime capability probe to select between nvidia-smi and nvmap per-process memory sources based on the active GPU driver stack.

tokk-nv and others added 3 commits April 22, 2026 09:08
On Thor (nvidia.ko driver), NvMapMemUsed is absent from /proc/meminfo
so jtop showed Shared: 0k and no GPU memory chart. This change wires
up nvidia-smi --query-compute-apps as the GPU memory source, which is
the only reliable path on BSPs where device-level memory.used returns
null.

- thor_gpu.py: _nvsmi_gpu_used_mib() sums per-process GPU allocations
  with a 1-second cache; read_gpu_mem_rows_for_gui() uses this for VRAM
- pgpu_thor.py: adds chart_vram (green) to the GPU page; falls back to
  shared RAM chart when nvidia-smi is unavailable
- memory.py: populates shared from nvidia-smi when NvMapMemUsed is
  absent; sets shared_label='VRAM' to distinguish from Orin behaviour
- pmem.py: uses shared_label so MEM tab shows 'VRAM' on Thor and
  'Shared' on Orin/nvgpu-stack devices unchanged

Tested on Thor T5000 (r38.x) with nemotron-3-nano:30b via Ollama:
  2GPU tab: VRAM 25.7G/122G green bar
  4MEM tab: VRAM: 25.7G green legend entry

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
On the nvidia.ko stack (Thor), nvmap iovmm/clients shows 0K for CUDA
allocations because CUDA bypasses nvmap. Use nvidia-smi --query-compute-apps
as the per-process GPU memory source on that stack, falling back to
nvmap on the nvgpu stack (Orin) unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

sourcery-ai Bot commented Apr 22, 2026

Reviewer's Guide

Implements GPU VRAM reporting and per-process GPU memory accounting for Thor (unified nvidia.ko) by sourcing usage from nvidia-smi compute-apps, wiring it into the Thor GPU page, MEM tab, and process table, while preserving existing Orin/nvgpu behavior.

Sequence diagram for GPU VRAM chart update on Thor

sequenceDiagram
    actor User
    participant PGPUThor
    participant ThorGPU as thor_gpu
    participant NvidiaSmi as nvidia_smi

    User->>PGPUThor: open_GPU_tab()
    loop every_UI_frame
        PGPUThor->>PGPUThor: update_chart_vram(jetson, name)
        PGPUThor->>ThorGPU: read_gpu_mem_rows_for_gui(device_index)
        alt cache_hit_in__gpu_vram_bytes
            ThorGPU-->>ThorGPU: _gpu_vram_bytes() returns cached (used_b, total_b)
        else cache_miss_in__gpu_vram_bytes
            ThorGPU->>ThorGPU: _gpu_vram_bytes()
            ThorGPU->>NvidiaSmi: _nvsmi_gpu_used_mib() via nvidia-smi --query-compute-apps
            alt nvidia_smi_success
                NvidiaSmi-->>ThorGPU: list of used_memory values (MiB)
                ThorGPU-->>ThorGPU: sum MiB -> used_b
                ThorGPU-->>ThorGPU: _read_memtotal_bytes() -> total_b
                ThorGPU-->>ThorGPU: store (used_b, total_b) in cache
            else nvidia_smi_failure
                NvidiaSmi-->>ThorGPU: error or empty
                ThorGPU-->>ThorGPU: cache result None
            end
        end
        ThorGPU-->>PGPUThor: {vram_used_b, vram_total_b, shared_used_b, shared_total_b}
        alt vram_total_b > 0
            PGPUThor->>PGPUThor: compute chart scaling with size_min()
            PGPUThor-->>User: render VRAM chart with label VRAM used/total
        else
            PGPUThor->>PGPUThor: fallback to shared RAM chart
            PGPUThor-->>User: render Shared chart
        end
    end

Sequence diagram for per-process GPU memory source selection

sequenceDiagram
    actor User
    participant ProcessService
    participant NvidiaSmi as nvidia_smi
    participant Nvmap as nvmap_kernel

    User->>ProcessService: start_service()
    activate ProcessService
    ProcessService->>ProcessService: __init__()
    ProcessService->>ProcessService: check os.path.isfile(.../debug/nvmap/iovmm/maps)
    ProcessService-->>ProcessService: set _isJetson flag
    ProcessService->>NvidiaSmi: _nvsmi_useful() via nvidia-smi --query-gpu=name
    alt nvidia_smi_reports_real_name
        NvidiaSmi-->>ProcessService: gpu_name without (nvgpu)
        ProcessService-->>ProcessService: _isNvidiaSmi = True
    else nvidia_smi_unusable
        NvidiaSmi-->>ProcessService: error or (nvgpu)
        ProcessService-->>ProcessService: _isNvidiaSmi = False
    end
    deactivate ProcessService

    loop when_GUI_requests_process_table
        User->>ProcessService: get_status()
        ProcessService->>ProcessService: read /proc/uptime
        alt _isNvidiaSmi is True
            ProcessService->>NvidiaSmi: read_nvsmi_compute_table() via nvidia-smi --query-compute-apps
            alt nvidia_smi_success
                NvidiaSmi-->>ProcessService: (total_kb, raw_rows)
                ProcessService->>ProcessService: get_process_info(pid, gpu_mem_kb, name, uptime) for each row
                ProcessService-->>User: per-process GPU MEM from nvidia-smi
            else nvidia_smi_failure
                NvidiaSmi-->>ProcessService: (0, [])
                ProcessService-->>User: empty GPU MEM table
            end
        else _isJetson is True
            ProcessService->>Nvmap: read_process_table(.../debug/nvmap/iovmm/maps)
            Nvmap-->>ProcessService: (total_kb, raw_rows)
            ProcessService->>ProcessService: get_process_info(pid, gpu_mem_kb, name, uptime) for each row
            ProcessService-->>User: per-process GPU MEM from nvmap
        else neither_path_available
            ProcessService-->>User: empty GPU MEM table
        end
    end

Updated class diagram for GPU memory collection and UI integration

classDiagram
    class ProcessService {
        - bool _isJetson
        - bool _isNvidiaSmi
        - float _clk_tck
        __init__()
        get_status()
        get_process_info(pid, gpu_mem_usage, process_name, uptime)
    }

    class processes_module {
        +_nvsmi_useful() bool
        +read_nvsmi_compute_table() tuple
        +read_process_table(path) tuple
    }

    class ThorGpuModule {
        +_nvsmi_gpu_used_mib() int
        +_gpu_vram_bytes() tuple
        +read_gpu_mem_rows_for_gui(device_index) dict
        +_read_memtotal_bytes() int
        +_read_memavailable_bytes() int
    }

    class MemoryService {
        +get_status(mem_total) dict
    }

    class PGPUThor {
        +update_chart(jetson, name) dict
        +update_chart_ram(jetson, name) dict
        +update_chart_vram(jetson, name) dict
        +draw(key, mouse)
    }

    class PMemUI {
        +draw_ram_legend(pos_y, pos_x)
    }

    ProcessService --> processes_module : uses
    ProcessService --> ThorGpuModule : imports _nvsmi_gpu_used_mib (indirect via memory)

    processes_module ..> ProcessService : provides helpers
    processes_module ..> ThorGpuModule : shared nvidia-smi semantics

    ThorGpuModule ..> PGPUThor : supplies VRAM and shared RAM rows

    MemoryService --> ThorGpuModule : calls _nvsmi_gpu_used_mib
    PMemUI --> MemoryService : reads shared and shared_label

    PGPUThor --> ThorGpuModule : calls read_gpu_mem_rows_for_gui
    PGPUThor --> PMemUI : consistent VRAM vs Shared labeling

File-Level Changes

Change Details Files
Add nvidia-smi based per-process GPU memory backend and selection logic in the process service.
  • Introduce _nvsmi_useful() probe to detect when nvidia-smi reports real GPU data (non-nvgpu stacks).
  • Add read_nvsmi_compute_table() to parse nvidia-smi --query-compute-apps output into the existing (total_kb, rows) shape.
  • Extend ProcessService.init to track both nvmap availability (_isJetson) and nvidia-smi usability (_isNvidiaSmi).
  • Update ProcessService.get_status() to prefer nvidia-smi compute-apps data on Thor, fall back to nvmap on Orin, and keep the uptime-based metadata path unchanged.
jtop/core/processes.py
Expose Thor GPU VRAM usage via nvidia-smi compute-apps with caching and feed it into the GPU page RAM/VRAM charts.
  • Add _nvsmi_gpu_used_mib() to sum per-process GPU allocations from nvidia-smi --query-compute-apps=used_memory.
  • Introduce a 1-second cached _gpu_vram_bytes() helper that returns (used_bytes, total_bytes) using MemTotal as the unified DRAM pool.
  • Extend read_gpu_mem_rows_for_gui() to populate vram_used_b/vram_total_b instead of hard-coded zeros while keeping shared* fields based on MemTotal/MemAvailable.
jtop/core/thor_gpu.py
jtop/gui/pgpu_thor.py
Make MEM tab shared memory reflect GPU VRAM on Thor and keep shared semantics on Orin, with appropriate labeling.
  • Change get_status() to test for NvMapMemUsed presence and, when absent, fall back to _nvsmi_gpu_used_mib() to populate shared with GPU VRAM usage.
  • Introduce shared_label in the RAM status payload so callers can distinguish VRAM from generic shared memory.
  • Update pmem draw_ram_legend() to render shared_label (VRAM on Thor, Shared on Orin and others) in the MEM legend.
jtop/core/memory.py
jtop/gui/pmem.py
Add a dedicated VRAM chart on the Thor GPU page and use it when VRAM totals are available, falling back to the existing shared RAM chart otherwise.
  • Instantiate a new Chart instance for GPU VRAM alongside the existing shared RAM chart in the Thor GPU page initialization.
  • Implement update_chart_vram() to scale VRAM bytes into chart units using size_min, mirroring update_chart_ram()’s behavior.
  • Update the Thor GPU draw() path to select VRAM chart/labels when vram_total_b > 0, otherwise keep showing the shared RAM chart and labels.
jtop/gui/pgpu_thor.py

Possibly linked issues

  • #GPU NOT DETECTED/AVAILABLE: The PR implements Thor-specific VRAM and per-process GPU memory reporting, directly addressing the missing GPU memory usage in jtop.


dusty-nv and others added 2 commits April 22, 2026 19:09
…emUsed

On Orin r38 kernels, NvMapMemUsed is absent from /proc/meminfo and
nvidia-smi compute-apps returns [N/A], so the else-branch was leaving
ram_shared=0 and the 4MEM tab showed Shared: 0k.

Restore the pre-branch fallback: if neither source provides data, use
the nvmap process total (mem_total) passed in from processes.get_status(),
matching the original behavior on master.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ce crash

nvpmodel -q takes ~2.5s when spawned via subprocess in a systemd cgroup,
vs ~0.2s from the shell. The old 4s COMMAND_TIMEOUT was too tight and
caused intermittent failures on Orin r38.

Two fixes:
1. Raise COMMAND_TIMEOUT from 4s to 10s for headroom in the service context.
2. Catch JtopException in NVPModelService.__init__: the timeout was being
   re-wrapped as JtopException by nvpmodel_query(), escaping the existing
   (OSError, Command.CommandException) handler and crashing the whole service
   with exit code 0 instead of gracefully disabling nvpmodel.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
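
For illustration, a self-contained sketch of the second fix; JtopException is stubbed here and nvpmodel_query()'s real implementation differs:

```python
import subprocess

COMMAND_TIMEOUT = 10  # raised from 4 s: nvpmodel -q can take ~2.5 s under systemd

class JtopException(Exception):
    """Stand-in for jtop's own exception type, for illustration only."""

def nvpmodel_query():
    # Sketch: run `nvpmodel -q` and re-wrap a timeout as JtopException,
    # mirroring the behaviour described in the commit message.
    try:
        return subprocess.check_output(["nvpmodel", "-q"], text=True,
                                       timeout=COMMAND_TIMEOUT)
    except subprocess.TimeoutExpired as err:
        raise JtopException("nvpmodel query timed out") from err

class NVPModelService:
    def __init__(self):
        try:
            self._raw = nvpmodel_query()
        except (OSError, JtopException):
            # Previously the re-wrapped JtopException escaped the handler and
            # crashed the whole service; catching it disables nvpmodel gracefully.
            self._raw = None
```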