
feat(thor): GPU VRAM display and per-process #849

Draft
tokk-nv wants to merge 5 commits into rbonghi:master from tokk-nv:feature/thor-r38-gpu-memory

Conversation


@tokk-nv tokk-nv commented Apr 22, 2026

Problem

On Jetson Thor running L4T r38.x (unified nvidia.ko driver), jtop had no
visibility into GPU memory:

  • 2GPU tab — right half showed only system shared RAM; no VRAM chart
  • 4MEM tab — Shared: 0k always, even under heavy GPU load
  • Process table — GPU MEM: 0k for every process

Root causes:

  1. NvMapMemUsed is absent from /proc/meminfo on the nvidia.ko stack —
    it only exists on the nvgpu stack (Orin family).
  2. nvidia-smi --query-gpu=memory.used returns [N/A] on some Thor BSPs.
  3. /sys/kernel/debug/nvmap/iovmm/clients either does not exist on Thor or
    shows 0K for every row because CUDA allocations bypass nvmap on this
    driver stack.

The only reliable per-process GPU memory source on Thor is
nvidia-smi --query-compute-apps, which enumerates CUDA processes with real
allocation sizes. This is the same probe used by tegrastats tooling and
verified against a running Ollama instance (gemma3:4b → 3.8 GiB,
nemotron-3-nano:30b → 25.7 GiB).
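
For reference, the same query can be reproduced standalone. The snippet below is a hedged sketch of the probe, not the code added by this PR:

```python
import subprocess

# Query per-process GPU allocations the same way this PR does.
# --format=csv,noheader,nounits yields lines like "12345, ollama, 3891" (MiB).
out = subprocess.check_output(
    ["nvidia-smi",
     "--query-compute-apps=pid,process_name,used_memory",
     "--format=csv,noheader,nounits"],
    text=True, timeout=5)

for line in out.splitlines():
    pid, name, used_mib = (field.strip() for field in line.split(",", 2))
    print(pid, name, used_mib, "MiB")
```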

Changes

jtop/core/thor_gpu.py

  • _nvsmi_gpu_used_mib() — sums nvidia-smi --query-compute-apps=used_memory
    with a 1-second module-level cache to avoid spawning nvidia-smi on every
    UI frame.
  • _gpu_vram_bytes() — wraps the above with caching, returns
    (used_bytes, total_bytes) where total is MemTotal from /proc/meminfo
    (Thor uses unified DRAM so this equals the full GPU-accessible pool).
  • read_gpu_mem_rows_for_gui() — now uses _gpu_vram_bytes() instead of
    hard-coded zeros.
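
A minimal sketch of how such a cached probe can be structured; the actual helpers in thor_gpu.py may differ in error handling and cache layout:

```python
import subprocess
import time

# Module-level 1-second cache so the GUI does not spawn nvidia-smi every frame.
_VRAM_CACHE = {"ts": 0.0, "used_b": 0, "total_b": 0}

def _nvsmi_gpu_used_mib():
    """Sum used_memory over all CUDA processes reported by nvidia-smi (MiB)."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-compute-apps=used_memory",
         "--format=csv,noheader,nounits"],
        text=True, timeout=5)
    return sum(int(v) for v in out.split() if v.isdigit())

def _read_memtotal_bytes():
    """MemTotal from /proc/meminfo -- the unified DRAM pool on Thor."""
    with open("/proc/meminfo") as fp:
        for line in fp:
            if line.startswith("MemTotal:"):
                return int(line.split()[1]) * 1024
    return 0

def _gpu_vram_bytes():
    """Return (used_bytes, total_bytes), refreshing at most once per second."""
    now = time.monotonic()
    if now - _VRAM_CACHE["ts"] >= 1.0:
        try:
            _VRAM_CACHE["used_b"] = _nvsmi_gpu_used_mib() * 1024 * 1024
            _VRAM_CACHE["total_b"] = _read_memtotal_bytes()
        except (OSError, subprocess.SubprocessError, ValueError):
            _VRAM_CACHE["used_b"] = _VRAM_CACHE["total_b"] = 0
        _VRAM_CACHE["ts"] = now
    return _VRAM_CACHE["used_b"], _VRAM_CACHE["total_b"]
```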

jtop/gui/pgpu_thor.py

  • Adds chart_vram (single green series) to the GPU page.
  • New update_chart_vram() callback feeds the chart from
    read_gpu_mem_rows_for_gui().
  • draw() renders the VRAM chart on the right half when vram_total_b > 0,
    falling back to the shared RAM chart when nvidia-smi is unavailable.
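
In outline, the new callback could look like the sketch below. The dict keys expected by jtop's Chart class are internal GUI details, so the return shape here is an assumption rather than a copy of pgpu_thor.py:

```python
from jtop.core.thor_gpu import read_gpu_mem_rows_for_gui  # module path per the file list above

def update_chart_vram(self, jetson, name):
    # Sketch only: pull the VRAM row and hand it to the chart in GiB.
    # 'value'/'max'/'unit' are assumed chart-callback keys, not verified API.
    rows = read_gpu_mem_rows_for_gui(0)
    used_gb = rows.get("vram_used_b", 0) / (1024 ** 3)
    total_gb = rows.get("vram_total_b", 0) / (1024 ** 3)
    return {"value": [used_gb], "max": total_gb, "unit": "G"}
```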

jtop/core/memory.py

  • get_status() now checks for the presence of NvMapMemUsed in
    /proc/meminfo rather than relying on it defaulting to zero.
  • When absent (Thor/nvidia.ko), calls _nvsmi_gpu_used_mib() to populate
    shared with GPU process allocations and sets shared_label = 'VRAM'.
  • Orin/nvgpu behaviour is unchanged.
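
The presence check reduces to something like the following sketch (illustrative only; a later commit in this PR additionally falls back to the nvmap process total when nvidia-smi also returns nothing):

```python
from jtop.core.thor_gpu import _nvsmi_gpu_used_mib  # helper added in this PR; path assumed

def _read_meminfo():
    """Parse /proc/meminfo into a dict of kB values."""
    info = {}
    with open("/proc/meminfo") as fp:
        for line in fp:
            key, _, rest = line.partition(":")
            info[key] = int(rest.split()[0])
    return info

# Inside MemoryService.get_status() -- simplified sketch:
meminfo = _read_meminfo()
if "NvMapMemUsed" in meminfo:
    # Orin / nvgpu stack: unchanged behaviour.
    ram_shared = meminfo["NvMapMemUsed"]
    shared_label = "Shared"
else:
    # Thor / nvidia.ko stack: NvMapMemUsed never appears, so pull GPU
    # process allocations from nvidia-smi instead (converted MiB -> kB).
    ram_shared = _nvsmi_gpu_used_mib() * 1024
    shared_label = "VRAM"
```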

jtop/core/processes.py

  • Adds _nvsmi_useful() capability probe (same logic as jetson-diagnostic
    scripts): returns True when nvidia-smi --query-gpu=name returns a name
    that does not contain (nvgpu).
  • Adds read_nvsmi_compute_table() — reads
    nvidia-smi --query-compute-apps=pid,process_name,used_memory and returns
    the same (total_kb, rows) shape as read_process_table().
  • ProcessService.get_status() prefers the nvidia-smi path when
    _isNvidiaSmi is True, falls back to nvmap when only _isJetson is True,
    and returns an empty table otherwise. Orin behaviour is unchanged.
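
A condensed sketch of the two helpers; the exact row tuple layout of read_process_table() is not spelled out in this description, so the rows below are illustrative:

```python
import subprocess

def _nvsmi_useful():
    """True when nvidia-smi reports a real GPU name (i.e. not the nvgpu stub)."""
    try:
        name = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            text=True, timeout=5).strip()
    except (OSError, subprocess.SubprocessError):
        return False
    return bool(name) and "(nvgpu)" not in name

def read_nvsmi_compute_table():
    """Return (total_kb, rows) in the same shape as read_process_table()."""
    try:
        out = subprocess.check_output(
            ["nvidia-smi",
             "--query-compute-apps=pid,process_name,used_memory",
             "--format=csv,noheader,nounits"],
            text=True, timeout=5)
    except (OSError, subprocess.SubprocessError):
        return 0, []
    total_kb, rows = 0, []
    for line in out.splitlines():
        pid, name, used_mib = (field.strip() for field in line.split(",", 2))
        if not used_mib.isdigit():
            continue  # e.g. "[N/A]" or "[Insufficient Permissions]"
        kb = int(used_mib) * 1024
        total_kb += kb
        rows.append((int(pid), kb, name))
    return total_kb, rows
```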

jtop/gui/pmem.py

  • draw_ram_legend() reads shared_label from jetson.memory['RAM'] so
    the MEM tab shows VRAM on Thor and Shared on Orin/nvgpu-stack
    devices as before.
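
The legend change reduces to a small lookup, sketched here with the curses drawing omitted and the formatting string assumed:

```python
def _shared_legend(memory):
    # Sketch: build the MEM-tab legend text from the service payload.
    # 'shared' is assumed to be in kB, matching /proc/meminfo conventions.
    ram = memory.get("RAM", {})
    label = ram.get("shared_label", "Shared")   # 'VRAM' on Thor, 'Shared' on Orin
    shared_gb = ram.get("shared", 0) / (1024 * 1024)
    return "{}: {:.1f}G".format(label, shared_gb)
```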

Tested on

Board                L4T    Model                            VRAM shown
Thor T5000 (jat03)   r38.x  gemma3:4b via Ollama             3.8 G / 122 G
Thor T5000 (jat03)   r38.x  nemotron-3-nano:30b via Ollama   25.7 G / 122 G
  • 2GPU tab: green VRAM X/122G chart updates live as models load/unload.
  • 4MEM tab: VRAM: X legend entry (green) replaces Shared: 0k.
  • Process table: GPU MEM column populated correctly per process.
  • JAO162 (Orin AGX): no regression — NvMapMemUsed path unchanged,
    Shared label unchanged, nvmap process table unchanged.

Notes

  • _isNvidiaSmi is evaluated once at ProcessService init. If the GPU
    driver is not ready at jtop service start, a service restart will re-probe.
  • The 1-second nvidia-smi cache in thor_gpu.py is process-scoped. The jtop
    service (root) and GUI client (user) each maintain their own cache; both
    call nvidia-smi independently at most once per second.
  • VRAM "total" is MemTotal from /proc/meminfo. On Thor the GPU and CPU
    share the same physical DRAM pool, so this is the correct denominator.
    The bar therefore shows what fraction of total DRAM is held by GPU compute
    processes.

Summary by Sourcery

Add Thor-specific GPU VRAM and per‑process memory reporting using nvidia-smi while preserving existing Orin/nvgpu behaviour.

New Features:

  • Expose GPU VRAM usage on Thor in the GPU page via a dedicated VRAM chart driven by nvidia-smi compute-app data.
  • Populate the MEM tab on Thor with GPU VRAM usage and a dynamic VRAM label instead of always showing Shared RAM.
  • Report per-process GPU memory usage on Thor via a new nvidia-smi–based process table path.

Enhancements:

  • Introduce a cached VRAM probing helper that aggregates nvidia-smi compute-app memory usage and reuses results across UI frames to reduce overhead.
  • Add a runtime capability probe to select between nvidia-smi and nvmap per-process memory sources based on the active GPU driver stack.

tokk-nv and others added 3 commits April 22, 2026 09:08
On Thor (nvidia.ko driver), NvMapMemUsed is absent from /proc/meminfo
so jtop showed Shared: 0k and no GPU memory chart. This change wires
up nvidia-smi --query-compute-apps as the GPU memory source, which is
the only reliable path on BSPs where device-level memory.used returns
null.

- thor_gpu.py: _nvsmi_gpu_used_mib() sums per-process GPU allocations
  with a 1-second cache; read_gpu_mem_rows_for_gui() uses this for VRAM
- pgpu_thor.py: adds chart_vram (green) to the GPU page; falls back to
  shared RAM chart when nvidia-smi is unavailable
- memory.py: populates shared from nvidia-smi when NvMapMemUsed is
  absent; sets shared_label='VRAM' to distinguish from Orin behaviour
- pmem.py: uses shared_label so MEM tab shows 'VRAM' on Thor and
  'Shared' on Orin/nvgpu-stack devices unchanged

Tested on Thor T5000 (r38.x) with nemotron-3-nano:30b via Ollama:
  2GPU tab: VRAM 25.7G/122G green bar
  4MEM tab: VRAM: 25.7G green legend entry

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
On the nvidia.ko stack (Thor), nvmap iovmm/clients shows 0K for CUDA
allocations because CUDA bypasses nvmap. Use nvidia-smi --query-compute-apps
as the per-process GPU memory source on that stack, falling back to
nvmap on the nvgpu stack (Orin) unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

sourcery-ai Bot commented Apr 22, 2026

Reviewer's Guide

Implements GPU VRAM reporting and per-process GPU memory accounting for Thor (unified nvidia.ko) by sourcing usage from nvidia-smi compute-apps, wiring it into the Thor GPU page, MEM tab, and process table, while preserving existing Orin/nvgpu behavior.

Sequence diagram for GPU VRAM chart update on Thor

sequenceDiagram
    actor User
    participant PGPUThor
    participant ThorGPU as thor_gpu
    participant NvidiaSmi as nvidia_smi

    User->>PGPUThor: open_GPU_tab()
    loop every_UI_frame
        PGPUThor->>PGPUThor: update_chart_vram(jetson, name)
        PGPUThor->>ThorGPU: read_gpu_mem_rows_for_gui(device_index)
        alt cache_hit_in__gpu_vram_bytes
            ThorGPU-->>ThorGPU: _gpu_vram_bytes() returns cached (used_b, total_b)
        else cache_miss_in__gpu_vram_bytes
            ThorGPU->>ThorGPU: _gpu_vram_bytes()
            ThorGPU->>NvidiaSmi: _nvsmi_gpu_used_mib() via nvidia-smi --query-compute-apps
            alt nvidia_smi_success
                NvidiaSmi-->>ThorGPU: list of used_memory values (MiB)
                ThorGPU-->>ThorGPU: sum MiB -> used_b
                ThorGPU-->>ThorGPU: _read_memtotal_bytes() -> total_b
                ThorGPU-->>ThorGPU: store (used_b, total_b) in cache
            else nvidia_smi_failure
                NvidiaSmi-->>ThorGPU: error or empty
                ThorGPU-->>ThorGPU: cache result None
            end
        end
        ThorGPU-->>PGPUThor: {vram_used_b, vram_total_b, shared_used_b, shared_total_b}
        alt vram_total_b > 0
            PGPUThor->>PGPUThor: compute chart scaling with size_min()
            PGPUThor-->>User: render VRAM chart with label VRAM used/total
        else
            PGPUThor->>PGPUThor: fallback to shared RAM chart
            PGPUThor-->>User: render Shared chart
        end
    end

Sequence diagram for per-process GPU memory source selection

sequenceDiagram
    actor User
    participant ProcessService
    participant NvidiaSmi as nvidia_smi
    participant Nvmap as nvmap_kernel

    User->>ProcessService: start_service()
    activate ProcessService
    ProcessService->>ProcessService: __init__()
    ProcessService->>ProcessService: check os.path.isfile(.../debug/nvmap/iovmm/maps)
    ProcessService-->>ProcessService: set _isJetson flag
    ProcessService->>NvidiaSmi: _nvsmi_useful() via nvidia-smi --query-gpu=name
    alt nvidia_smi_reports_real_name
        NvidiaSmi-->>ProcessService: gpu_name without (nvgpu)
        ProcessService-->>ProcessService: _isNvidiaSmi = True
    else nvidia_smi_unusable
        NvidiaSmi-->>ProcessService: error or (nvgpu)
        ProcessService-->>ProcessService: _isNvidiaSmi = False
    end
    deactivate ProcessService

    loop when_GUI_requests_process_table
        User->>ProcessService: get_status()
        ProcessService->>ProcessService: read /proc/uptime
        alt _isNvidiaSmi is True
            ProcessService->>NvidiaSmi: read_nvsmi_compute_table() via nvidia-smi --query-compute-apps
            alt nvidia_smi_success
                NvidiaSmi-->>ProcessService: (total_kb, raw_rows)
                ProcessService->>ProcessService: get_process_info(pid, gpu_mem_kb, name, uptime) for each row
                ProcessService-->>User: per-process GPU MEM from nvidia-smi
            else nvidia_smi_failure
                NvidiaSmi-->>ProcessService: (0, [])
                ProcessService-->>User: empty GPU MEM table
            end
        else _isJetson is True
            ProcessService->>Nvmap: read_process_table(.../debug/nvmap/iovmm/maps)
            Nvmap-->>ProcessService: (total_kb, raw_rows)
            ProcessService->>ProcessService: get_process_info(pid, gpu_mem_kb, name, uptime) for each row
            ProcessService-->>User: per-process GPU MEM from nvmap
        else neither_path_available
            ProcessService-->>User: empty GPU MEM table
        end
    end

Updated class diagram for GPU memory collection and UI integration

classDiagram
    class ProcessService {
        - bool _isJetson
        - bool _isNvidiaSmi
        - float _clk_tck
        __init__()
        get_status()
        get_process_info(pid, gpu_mem_usage, process_name, uptime)
    }

    class processes_module {
        +_nvsmi_useful() bool
        +read_nvsmi_compute_table() tuple
        +read_process_table(path) tuple
    }

    class ThorGpuModule {
        +_nvsmi_gpu_used_mib() int
        +_gpu_vram_bytes() tuple
        +read_gpu_mem_rows_for_gui(device_index) dict
        +_read_memtotal_bytes() int
        +_read_memavailable_bytes() int
    }

    class MemoryService {
        +get_status(mem_total) dict
    }

    class PGPUThor {
        +update_chart(jetson, name) dict
        +update_chart_ram(jetson, name) dict
        +update_chart_vram(jetson, name) dict
        +draw(key, mouse)
    }

    class PMemUI {
        +draw_ram_legend(pos_y, pos_x)
    }

    ProcessService --> processes_module : uses
    ProcessService --> ThorGpuModule : imports _nvsmi_gpu_used_mib (indirect via memory)

    processes_module ..> ProcessService : provides helpers
    processes_module ..> ThorGpuModule : shared nvidia-smi semantics

    ThorGpuModule ..> PGPUThor : supplies VRAM and shared RAM rows

    MemoryService --> ThorGpuModule : calls _nvsmi_gpu_used_mib
    PMemUI --> MemoryService : reads shared and shared_label

    PGPUThor --> ThorGpuModule : calls read_gpu_mem_rows_for_gui
    PGPUThor --> PMemUI : consistent VRAM vs Shared labeling

File-Level Changes

Change Details Files
Add nvidia-smi based per-process GPU memory backend and selection logic in the process service.
  • Introduce _nvsmi_useful() probe to detect when nvidia-smi reports real GPU data (non-nvgpu stacks).
  • Add read_nvsmi_compute_table() to parse nvidia-smi --query-compute-apps output into the existing (total_kb, rows) shape.
  • Extend ProcessService.init to track both nvmap availability (_isJetson) and nvidia-smi usability (_isNvidiaSmi).
  • Update ProcessService.get_status() to prefer nvidia-smi compute-apps data on Thor, fall back to nvmap on Orin, and keep the uptime-based metadata path unchanged.
jtop/core/processes.py
Expose Thor GPU VRAM usage via nvidia-smi compute-apps with caching and feed it into the GPU page RAM/VRAM charts.
  • Add _nvsmi_gpu_used_mib() to sum per-process GPU allocations from nvidia-smi --query-compute-apps=used_memory.
  • Introduce a 1-second cached _gpu_vram_bytes() helper that returns (used_bytes, total_bytes) using MemTotal as the unified DRAM pool.
  • Extend read_gpu_mem_rows_for_gui() to populate vram_used_b/vram_total_b instead of hard-coded zeros while keeping shared* fields based on MemTotal/MemAvailable.
jtop/core/thor_gpu.py
jtop/gui/pgpu_thor.py
Make MEM tab shared memory reflect GPU VRAM on Thor and keep shared semantics on Orin, with appropriate labeling.
  • Change get_status() to test for NvMapMemUsed presence and, when absent, fall back to _nvsmi_gpu_used_mib() to populate shared with GPU VRAM usage.
  • Introduce shared_label in the RAM status payload so callers can distinguish VRAM from generic shared memory.
  • Update pmem draw_ram_legend() to render shared_label (VRAM on Thor, Shared on Orin and others) in the MEM legend.
jtop/core/memory.py
jtop/gui/pmem.py
Add a dedicated VRAM chart on the Thor GPU page and use it when VRAM totals are available, falling back to the existing shared RAM chart otherwise.
  • Instantiate a new Chart instance for GPU VRAM alongside the existing shared RAM chart in the Thor GPU page initialization.
  • Implement update_chart_vram() to scale VRAM bytes into chart units using size_min, mirroring update_chart_ram()’s behavior.
  • Update the Thor GPU draw() path to select VRAM chart/labels when vram_total_b > 0, otherwise keep showing the shared RAM chart and labels.
jtop/gui/pgpu_thor.py

Possibly linked issues

  • #GPU NOT DETECTED/AVAILABLE: The PR implements Thor-specific VRAM and per-process GPU memory reporting, directly addressing the missing GPU memory usage in jtop.


dusty-nv and others added 2 commits April 22, 2026 19:09
…emUsed

On Orin r38 kernels, NvMapMemUsed is absent from /proc/meminfo and
nvidia-smi compute-apps returns [N/A], so the else-branch was leaving
ram_shared=0 and the 4MEM tab showed Shared: 0k.

Restore the pre-branch fallback: if neither source provides data, use
the nvmap process total (mem_total) passed in from processes.get_status(),
matching the original behavior on master.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ce crash

nvpmodel -q takes ~2.5s when spawned via subprocess in a systemd cgroup,
vs ~0.2s from the shell. The old 4s COMMAND_TIMEOUT was too tight and
caused intermittent failures on Orin r38.

Two fixes:
1. Raise COMMAND_TIMEOUT from 4s to 10s for headroom in the service context.
2. Catch JtopException in NVPModelService.__init__: the timeout was being
   re-wrapped as JtopException by nvpmodel_query(), escaping the existing
   (OSError, Command.CommandException) handler and crashing the whole service
   with exit code 0 instead of gracefully disabling nvpmodel.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
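
For illustration, a self-contained sketch of the second fix; JtopException is stubbed here and nvpmodel_query()'s real implementation differs:

```python
import subprocess

COMMAND_TIMEOUT = 10  # raised from 4 s: nvpmodel -q can take ~2.5 s under systemd

class JtopException(Exception):
    """Stand-in for jtop's own exception type, for illustration only."""

def nvpmodel_query():
    # Sketch: run `nvpmodel -q` and re-wrap a timeout as JtopException,
    # mirroring the behaviour described in the commit message.
    try:
        return subprocess.check_output(["nvpmodel", "-q"], text=True,
                                       timeout=COMMAND_TIMEOUT)
    except subprocess.TimeoutExpired as err:
        raise JtopException("nvpmodel query timed out") from err

class NVPModelService:
    def __init__(self):
        try:
            self._raw = nvpmodel_query()
        except (OSError, JtopException):
            # Previously the re-wrapped JtopException escaped the handler and
            # crashed the whole service; catching it disables nvpmodel gracefully.
            self._raw = None
```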