Updates in 2025.4
General
Added support for profiling CUDA tile workloads.
Introduced a new Tile section that summarizes tile dimensions and pipeline utilization. The section is displayed when it is enabled and a tile workload is profiled.
Source page supports correlation between SASS and high-level Tile code (limited to cuTile Python code).
Added a new ncu-repz file format for zstd compressed report files.
Added support for locking GPUs to boost clock instead of base on Ampere and newer GPUs. Use the boost and force-boost options on supported drivers.
Warp sampling by default now focuses on the Not Issued (_not_issued) variants of the metrics. This is to avoid pointing to source locations where warp stalls are mitigated by having sufficient numbers of warps during an issue cycle to hide latency.
Added support for node-level profiling of CUDA conditional graphs, including device-updatable nodes and nodes that can set conditional graph handles.
Added support for node-level profiling of CUDA graphs launched from the device (DGL), including host graph nodes that can launch DGL.
Source page now displays symbol labels: A new column for symbol labels has been added, and symbol labels are shown alongside addresses in SASS instruction disassembly. This change aligns the output with that of the nvdisasm tool.
Added support for collecting Warp sampling metrics with PM sampling, allowing users to see function-level warp stalls for the selected time range in the timeline. See the Function Stats tool window for details.
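The reasoning behind the Not Issued warp sampling default noted above can be illustrated with a small toy model. This is only a sketch, not the profiler's implementation: the WarpSample type, its fields, and the sample data are hypothetical, and the model assumes that a sample counts toward a Not Issued variant only when the scheduler issued no instruction in the sampled cycle.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class WarpSample:
    """Hypothetical record of one warp state sample (illustration only)."""
    source_line: int        # source line the sampled warp was stalled on
    stall_reason: str       # e.g. "long_scoreboard" or "barrier"
    scheduler_issued: bool  # True if the scheduler issued another warp that cycle


def stall_counts(samples, not_issued_only):
    """Count samples per (line, reason), optionally keeping only the samples
    taken in cycles where the scheduler issued nothing."""
    counts = Counter()
    for s in samples:
        if not_issued_only and s.scheduler_issued:
            continue  # latency was hidden by another warp; skip this sample
        counts[(s.source_line, s.stall_reason)] += 1
    return counts


# Made-up samples: line 10 stalls often, but usually another warp issues.
samples = [
    WarpSample(10, "long_scoreboard", True),
    WarpSample(10, "long_scoreboard", True),
    WarpSample(10, "long_scoreboard", True),
    WarpSample(10, "long_scoreboard", False),
    WarpSample(42, "barrier", False),
    WarpSample(42, "barrier", False),
]

all_counts = stall_counts(samples, not_issued_only=False)
not_issued_counts = stall_counts(samples, not_issued_only=True)

print("all samples:", all_counts.most_common(1))
print("not issued :", not_issued_counts.most_common(1))
```

In this made-up data, counting all samples ranks line 10 first, while the Not Issued view ranks line 42 first, because most stalls on line 10 occurred in cycles where another warp was able to issue.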