feat(thor): GPU VRAM display and per-process #849
Draft
tokk-nv wants to merge 5 commits into rbonghi:master
Conversation
On Thor (nvidia.ko driver), NvMapMemUsed is absent from /proc/meminfo, so jtop showed Shared: 0k and no GPU memory chart. This change wires up nvidia-smi --query-compute-apps as the GPU memory source, which is the only reliable path on BSPs where device-level memory.used returns null.

- thor_gpu.py: _nvsmi_gpu_used_mib() sums per-process GPU allocations with a 1-second cache; read_gpu_mem_rows_for_gui() uses this for VRAM
- pgpu_thor.py: adds chart_vram (green) to the GPU page; falls back to the shared RAM chart when nvidia-smi is unavailable
- memory.py: populates shared from nvidia-smi when NvMapMemUsed is absent; sets shared_label='VRAM' to distinguish from Orin behaviour
- pmem.py: uses shared_label so the MEM tab shows 'VRAM' on Thor and 'Shared' on Orin; nvgpu-stack devices are unchanged

Tested on Thor T5000 (r38.x) with nemotron-3-nano:30b via Ollama:
- 2GPU tab: VRAM 25.7G/122G green bar
- 4MEM tab: VRAM: 25.7G green legend entry

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
On the nvidia.ko stack (Thor), nvmap iovmm/clients shows 0K for CUDA allocations because CUDA bypasses nvmap. Use nvidia-smi --query-compute-apps as the per-process GPU memory source on that stack, falling back to nvmap on the nvgpu stack (Orin) unchanged. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Reviewer's Guide

Implements GPU VRAM reporting and per-process GPU memory accounting for Thor (unified nvidia.ko) by sourcing usage from nvidia-smi compute-apps, wiring it into the Thor GPU page, MEM tab, and process table, while preserving existing Orin/nvgpu behavior.

Sequence diagram for GPU VRAM chart update on Thor

sequenceDiagram
actor User
participant PGPUThor
participant ThorGPU as thor_gpu
participant NvidiaSmi as nvidia_smi
User->>PGPUThor: open_GPU_tab()
loop every_UI_frame
PGPUThor->>PGPUThor: update_chart_vram(jetson, name)
PGPUThor->>ThorGPU: read_gpu_mem_rows_for_gui(device_index)
alt cache_hit_in__gpu_vram_bytes
ThorGPU-->>ThorGPU: _gpu_vram_bytes() returns cached (used_b, total_b)
else cache_miss_in__gpu_vram_bytes
ThorGPU->>ThorGPU: _gpu_vram_bytes()
ThorGPU->>NvidiaSmi: _nvsmi_gpu_used_mib() via nvidia-smi --query-compute-apps
alt nvidia_smi_success
NvidiaSmi-->>ThorGPU: list of used_memory values (MiB)
ThorGPU-->>ThorGPU: sum MiB -> used_b
ThorGPU-->>ThorGPU: _read_memtotal_bytes() -> total_b
ThorGPU-->>ThorGPU: store (used_b, total_b) in cache
else nvidia_smi_failure
NvidiaSmi-->>ThorGPU: error or empty
ThorGPU-->>ThorGPU: cache result None
end
end
ThorGPU-->>PGPUThor: {vram_used_b, vram_total_b, shared_used_b, shared_total_b}
alt vram_total_b > 0
PGPUThor->>PGPUThor: compute chart scaling with size_min()
PGPUThor-->>User: render VRAM chart with label VRAM used/total
else
PGPUThor->>PGPUThor: fallback to shared RAM chart
PGPUThor-->>User: render Shared chart
end
end
Sequence diagram for per-process GPU memory source selection

sequenceDiagram
actor User
participant ProcessService
participant NvidiaSmi as nvidia_smi
participant Nvmap as nvmap_kernel
User->>ProcessService: start_service()
activate ProcessService
ProcessService->>ProcessService: __init__()
ProcessService->>ProcessService: check os.path.isfile(.../debug/nvmap/iovmm/maps)
ProcessService-->>ProcessService: set _isJetson flag
ProcessService->>NvidiaSmi: _nvsmi_useful() via nvidia-smi --query-gpu=name
alt nvidia_smi_reports_real_name
NvidiaSmi-->>ProcessService: gpu_name without (nvgpu)
ProcessService-->>ProcessService: _isNvidiaSmi = True
else nvidia_smi_unusable
NvidiaSmi-->>ProcessService: error or (nvgpu)
ProcessService-->>ProcessService: _isNvidiaSmi = False
end
deactivate ProcessService
loop when_GUI_requests_process_table
User->>ProcessService: get_status()
ProcessService->>ProcessService: read /proc/uptime
alt _isNvidiaSmi is True
ProcessService->>NvidiaSmi: read_nvsmi_compute_table() via nvidia-smi --query-compute-apps
alt nvidia_smi_success
NvidiaSmi-->>ProcessService: (total_kb, raw_rows)
ProcessService->>ProcessService: get_process_info(pid, gpu_mem_kb, name, uptime) for each row
ProcessService-->>User: per-process GPU MEM from nvidia-smi
else nvidia_smi_failure
NvidiaSmi-->>ProcessService: (0, [])
ProcessService-->>User: empty GPU MEM table
end
else _isJetson is True
ProcessService->>Nvmap: read_process_table(.../debug/nvmap/iovmm/maps)
Nvmap-->>ProcessService: (total_kb, raw_rows)
ProcessService->>ProcessService: get_process_info(pid, gpu_mem_kb, name, uptime) for each row
ProcessService-->>User: per-process GPU MEM from nvmap
else neither_path_available
ProcessService-->>User: empty GPU MEM table
end
end
Updated class diagram for GPU memory collection and UI integration

classDiagram
class ProcessService {
- bool _isJetson
- bool _isNvidiaSmi
- float _clk_tck
__init__()
get_status()
get_process_info(pid, gpu_mem_usage, process_name, uptime)
}
class processes_module {
+_nvsmi_useful() bool
+read_nvsmi_compute_table() tuple
+read_process_table(path) tuple
}
class ThorGpuModule {
+_nvsmi_gpu_used_mib() int
+_gpu_vram_bytes() tuple
+read_gpu_mem_rows_for_gui(device_index) dict
+_read_memtotal_bytes() int
+_read_memavailable_bytes() int
}
class MemoryService {
+get_status(mem_total) dict
}
class PGPUThor {
+update_chart(jetson, name) dict
+update_chart_ram(jetson, name) dict
+update_chart_vram(jetson, name) dict
+draw(key, mouse)
}
class PMemUI {
+draw_ram_legend(pos_y, pos_x)
}
ProcessService --> processes_module : uses
ProcessService --> ThorGpuModule : imports _nvsmi_gpu_used_mib (indirect via memory)
processes_module ..> ProcessService : provides helpers
processes_module ..> ThorGpuModule : shared nvidia-smi semantics
ThorGpuModule ..> PGPUThor : supplies VRAM and shared RAM rows
MemoryService --> ThorGpuModule : calls _nvsmi_gpu_used_mib
PMemUI --> MemoryService : reads shared and shared_label
PGPUThor --> ThorGpuModule : calls read_gpu_mem_rows_for_gui
PGPUThor --> PMemUI : consistent VRAM vs Shared labeling
…emUsed

On Orin r38 kernels, NvMapMemUsed is absent from /proc/meminfo and nvidia-smi compute-apps returns [N/A], so the else-branch was leaving ram_shared=0 and the 4MEM tab showed Shared: 0k. Restore the pre-branch fallback: if neither source provides data, use the nvmap process total (mem_total) passed in from processes.get_status(), matching the original behavior on master.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ce crash

nvpmodel -q takes ~2.5s when spawned via subprocess in a systemd cgroup, vs ~0.2s from the shell. The old 4s COMMAND_TIMEOUT was too tight and caused intermittent failures on Orin r38. Two fixes:

1. Raise COMMAND_TIMEOUT from 4s to 10s for headroom in the service context.
2. Catch JtopException in NVPModelService.__init__: the timeout was being re-wrapped as JtopException by nvpmodel_query(), escaping the existing (OSError, Command.CommandException) handler and crashing the whole service with exit code 0 instead of gracefully disabling nvpmodel. A sketch of this fix follows below.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
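A minimal sketch of the second fix, assuming jtop's public JtopException and a simplified, hypothetical NVPModelService constructor (the real one takes more arguments and keeps more state):

```python
from jtop import JtopException

def nvpmodel_query():
    # Placeholder for the real probe that shells out to `nvpmodel -q` and
    # re-wraps a subprocess timeout as JtopException.
    raise JtopException("nvpmodel -q timed out")

class NVPModelService:
    def __init__(self):
        self.nvpmodel_enabled = False
        try:
            self._modes = nvpmodel_query()   # may take ~2.5s inside a systemd cgroup
            self.nvpmodel_enabled = True
        except (OSError, JtopException):
            # The re-wrapped timeout used to escape the (OSError, Command.CommandException)
            # handler and crash the whole service; now nvpmodel is disabled gracefully.
            pass
```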
Problem
On Jetson Thor running L4T r38.x (unified nvidia.ko driver), jtop had no visibility into GPU memory:

- Shared: 0k always, even under heavy GPU load
- GPU MEM: 0k for every process

Root causes:

- NvMapMemUsed is absent from /proc/meminfo on the nvidia.ko stack; it only exists on the nvgpu stack (Orin family).
- nvidia-smi --query-gpu=memory.used returns [N/A] on some Thor BSPs.
- /sys/kernel/debug/nvmap/iovmm/clients either does not exist on Thor or shows 0K for every row because CUDA allocations bypass nvmap on this driver stack.

The only reliable per-process GPU memory source on Thor is nvidia-smi --query-compute-apps, which enumerates CUDA processes with real allocation sizes. This is the same probe used by tegrastats tooling and verified against a running Ollama instance (gemma3:4b → 3.8 GiB, nemotron-3-nano:30b → 25.7 GiB). A sketch of the probe follows below.
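A minimal sketch of that probe, not the jtop implementation itself; the nvidia-smi flags are standard, while the helper name and return shape are illustrative:

```python
# Query per-process CUDA allocations and sum them. Field names follow
# `nvidia-smi --help-query-compute-apps`; 'nounits' yields plain MiB integers.
import subprocess

def query_compute_apps():
    out = subprocess.run(
        ['nvidia-smi', '--query-compute-apps=pid,process_name,used_memory',
         '--format=csv,noheader,nounits'],
        capture_output=True, text=True, timeout=2, check=True).stdout
    rows = []
    for line in out.splitlines():
        pid, name, used_mib = [field.strip() for field in line.split(',')]
        if used_mib.isdigit():                 # skip '[N/A]' rows
            rows.append((int(pid), name, int(used_mib)))
    return rows

if __name__ == '__main__':
    rows = query_compute_apps()
    print('GPU compute total: {} MiB across {} processes'.format(
        sum(r[2] for r in rows), len(rows)))
```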
Changes
jtop/core/thor_gpu.py

- _nvsmi_gpu_used_mib() — sums nvidia-smi --query-compute-apps=used_memory with a 1-second module-level cache to avoid spawning nvidia-smi on every UI frame (sketched below).
- _gpu_vram_bytes() — wraps the above with caching and returns (used_bytes, total_bytes), where total is MemTotal from /proc/meminfo (Thor uses unified DRAM, so this equals the full GPU-accessible pool).
- read_gpu_mem_rows_for_gui() — now uses _gpu_vram_bytes() instead of hard-coded zeros.
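A hedged sketch of that 1-second cache: the function names follow the PR text, but the body is illustrative and _gpu_vram_bytes() is simplified here to take the MemTotal value as an argument:

```python
# Module-level cache so the GUI refresh loop spawns nvidia-smi at most once per second.
import subprocess
import time

_CACHE = {'ts': 0.0, 'used_mib': 0}
_CACHE_TTL = 1.0  # seconds

def _nvsmi_gpu_used_mib():
    now = time.monotonic()
    if now - _CACHE['ts'] >= _CACHE_TTL:
        out = subprocess.run(
            ['nvidia-smi', '--query-compute-apps=used_memory',
             '--format=csv,noheader,nounits'],
            capture_output=True, text=True, timeout=2).stdout
        values = (line.strip() for line in out.splitlines())
        _CACHE.update(ts=now,
                      used_mib=sum(int(v) for v in values if v.isdigit()))
    return _CACHE['used_mib']

def _gpu_vram_bytes(total_bytes):
    """Return (used_bytes, total_bytes); on Thor the total is MemTotal (unified DRAM)."""
    return _nvsmi_gpu_used_mib() * 1024 * 1024, total_bytes
```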
jtop/gui/pgpu_thor.py

- Adds chart_vram (single green series) to the GPU page.
- update_chart_vram() callback feeds the chart from read_gpu_mem_rows_for_gui().
- draw() renders the VRAM chart on the right half when vram_total_b > 0, falling back to the shared RAM chart when nvidia-smi is unavailable (see the sketch below).
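The VRAM-or-fallback decision can be summarised as follows; this is illustrative logic only, since the real page wires the result into jtop's chart objects, and the key names follow read_gpu_mem_rows_for_gui() as described above:

```python
# Pick which memory series to plot: VRAM when nvidia-smi data is available,
# otherwise the pre-existing shared RAM series.
def pick_memory_series(rows):
    if rows.get('vram_total_b', 0) > 0:
        return 'VRAM', rows['vram_used_b'], rows['vram_total_b']
    return 'Shared', rows.get('shared_used_b', 0), rows.get('shared_total_b', 0)
```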
jtop/core/memory.py

- get_status() now checks for the presence of NvMapMemUsed in /proc/meminfo rather than relying on it defaulting to zero.
- When it is absent (nvidia.ko), calls _nvsmi_gpu_used_mib() to populate shared with GPU process allocations and sets shared_label = 'VRAM'.
- Existing nvgpu behaviour is unchanged.

jtop/core/processes.py

- _nvsmi_useful() capability probe (same logic as jetson-diagnostic scripts): returns True when nvidia-smi --query-gpu=name returns a name that does not contain (nvgpu).
- read_nvsmi_compute_table() — reads nvidia-smi --query-compute-apps=pid,process_name,used_memory and returns the same (total_kb, rows) shape as read_process_table().
- ProcessService.get_status() prefers the nvidia-smi path when _isNvidiaSmi is True, falls back to nvmap when only _isJetson is True, and returns an empty table otherwise. Orin behaviour is unchanged.

A sketch of the two processes.py helpers follows below.
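The sketch below assumes the helper names from the PR text; the nvidia-smi flags are real, while the (total_kb, rows) row layout of (pid, gpu_mem_kb, name) is inferred from the description above:

```python
import subprocess

def _nvsmi_useful():
    """True when nvidia-smi reports a real GPU name rather than the '(nvgpu)' stub."""
    try:
        name = subprocess.run(
            ['nvidia-smi', '--query-gpu=name', '--format=csv,noheader'],
            capture_output=True, text=True, timeout=2, check=True).stdout.strip()
    except (OSError, subprocess.SubprocessError):
        return False
    return bool(name) and '(nvgpu)' not in name

def read_nvsmi_compute_table():
    """Return (total_kb, rows); each row is assumed to be (pid, gpu_mem_kb, name)."""
    try:
        out = subprocess.run(
            ['nvidia-smi', '--query-compute-apps=pid,process_name,used_memory',
             '--format=csv,noheader,nounits'],
            capture_output=True, text=True, timeout=2, check=True).stdout
    except (OSError, subprocess.SubprocessError):
        return 0, []
    rows = []
    for line in out.splitlines():
        pid, name, used_mib = [field.strip() for field in line.split(',')]
        if used_mib.isdigit():                               # skip '[N/A]' rows
            rows.append((int(pid), int(used_mib) * 1024, name))  # MiB -> kB
    return sum(row[1] for row in rows), rows
```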
jtop/gui/pmem.py

- draw_ram_legend() reads shared_label from jetson.memory['RAM'] so the MEM tab shows VRAM on Thor and Shared on Orin/nvgpu-stack devices as before (see the sketch below).
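A minimal sketch of the label selection, assuming the RAM status dict carries shared (in kB, matching the Shared: 0k display) plus the new shared_label key; the surrounding curses drawing code is omitted:

```python
# Defaulting to 'Shared' keeps the pre-PR output on Orin/nvgpu devices.
def shared_legend_text(memory_status):
    ram = memory_status.get('RAM', {})
    label = ram.get('shared_label', 'Shared')   # 'VRAM' on Thor
    return '{}: {}k'.format(label, ram.get('shared', 0))
```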
Tested on Thor T5000 (r38.x):

- VRAM X/122G chart updates live as models load/unload.
- VRAM: X legend entry (green) replaces Shared: 0k.
- GPU MEM column populated correctly per process.
- Orin/nvgpu regression check: NvMapMemUsed path unchanged, Shared label unchanged, nvmap process table unchanged.

Notes
- _isNvidiaSmi is evaluated once at ProcessService init. If the GPU driver is not ready at jtop service start, a service restart will re-probe.
- The 1-second cache in thor_gpu.py is process-scoped. The jtop service (root) and GUI client (user) each maintain their own cache; both call nvidia-smi independently at most once per second.
- The VRAM total is MemTotal from /proc/meminfo (sketched below). On Thor the GPU and CPU share the same physical DRAM pool, so this is the correct denominator. The bar therefore shows what fraction of total DRAM is held by GPU compute processes.
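For reference, a sketch of the MemTotal read that supplies that denominator; /proc/meminfo reports the value in kB:

```python
def read_memtotal_bytes():
    # MemTotal is the full unified DRAM pool, which on Thor is also GPU-accessible.
    with open('/proc/meminfo') as meminfo:
        for line in meminfo:
            if line.startswith('MemTotal:'):
                return int(line.split()[1]) * 1024  # kB -> bytes
    return 0
```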
Summary by Sourcery
Add Thor-specific GPU VRAM and per‑process memory reporting using nvidia-smi while preserving existing Orin/nvgpu behaviour.