Memory leak for pyamgx solvers resulting in termination #1204

@andrewsb8

Description

I cannot run any pyamgx solver in FiPy 4.0.2: the process aborts with a memory leak report. I think I have narrowed the problem down to FiPy rather than its dependencies, because the AMGX demo runs correctly after compilation:

$ examples/amgx_capi -m ../examples/matrix.mtx -c ../src/configs/FGMRES_AGGREGATION.json
AMGX version 2.5.0
Built on May 27 2026, 18:02:50
Compiled with CUDA Runtime 12.8, using CUDA driver 12.8
Warning: No mode specified, using dDDI by default.
Reading data...
RHS vector was not found. Using RHS b=[1,…,1]^T
Solution vector was not found. Setting initial solution to x=[0,…,0]^T
Finished reading
Matrix A is scalar and has 12 rows
AMG Grid:
         Number of Levels: 1
            LVL         ROWS               NNZ  PARTS    SPRSTY       Mem (GB)
        ----------------------------------------------------------------------
           0(D)           12                61      1     0.424       8.75e-07
         ----------------------------------------------------------------------
         Grid Complexity: 1
         Operator Complexity: 1
         Total Memory Usage: 8.75443e-07 GB
         ----------------------------------------------------------------------
           iter      Mem Usage (GB)       residual           rate
         ----------------------------------------------------------------------
            Ini            0.587036   3.464102e+00
              0            0.587036   9.112230e-15         0.0000
         ----------------------------------------------------------------------
         Total Iterations: 1
         Avg Convergence Rate:                   0.0000
         Final Residual:                   9.112230e-15
         Total Reduction in Residual:      2.630474e-15
         Maximum Memory Usage:                    0.587 GB
         ----------------------------------------------------------------------
Total Time: 0.0369664
    setup: 0.0236718 s
    solve: 0.0132946 s
    solve(per iteration): 0.0132946 s

I can also run the pyamgx demo.py:

$ python demo.py
AMGX version 2.5.0
Built on May 27 2026, 18:02:50
Compiled with CUDA Runtime 12.8, using CUDA driver 12.8
The AMGX_initialize_plugins API call is deprecated and can be safely removed.
pyamgx solution:  [ 1.71984413 -0.13744417  1.2763017  -2.43141954 -0.59432371]
scipy solution:  [ 1.71984413 -0.13744417  1.2763017  -2.43141954 -0.59432371]
The AMGX_finalize_plugins API call is deprecated and can be safely removed.

However, running

from fipy.solvers.pyamgx.linearGMRESSolver import LinearGMRESSolver
s = LinearGMRESSolver()

results in

AMGX version 2.5.0
Built on May 27 2026, 18:02:50
Compiled with CUDA Runtime 12.8, using CUDA driver 12.8
The AMGX_initialize_plugins API call is deprecated and can be safely removed.
The AMGX_finalize_plugins API call is deprecated and can be safely removed.
!!! detected some memory leaks in the code: trying to free non-empty temporary device pool !!!
ptr:     0x154bee000000 size: 4096
ptr:     0x154bee001000 size: 4096
terminate called after throwing an instance of 'amgx::amgx_exception'
  what():  Cuda failure: 'invalid argument'

Aborted (core dumped)

If I do something slightly more complicated, it gets worse:

from fipy import CellVariable, Grid1D, DiffusionTerm
from fipy.solvers.pyamgx.linearCGSolver import LinearCGSolver

mesh = Grid1D(nx=10, dx=1.)
phi = CellVariable(mesh=mesh, value=0.)
eq = DiffusionTerm(coeff=1.)
eq.solve(var=phi, solver=LinearCGSolver())

results in

The AMGX_finalize_plugins API call is deprecated and can be safely removed.
!!! detected some memory leaks in the code: trying to free non-empty temporary device pool !!!
ptr:     0x14e42a002000 size: 4096
ptr:     0x14e42a003000 size: 4096
ptr:     0x14e42a005000 size: 4096
ptr:     0x14e42a000000 size: 4096
ptr:     0x14e42a006000 size: 4096
ptr:     0x14e42a007000 size: 4096
ptr:     0x14e42a008000 size: 4096
ptr:     0x14e42a004000 size: 4096
ptr:     0x14e42a001000 size: 4096
ptr:     0x14e42a009000 size: 4096
ptr:     0x14e42a00a000 size: 4096
ptr:     0x14e42a00b000 size: 4096
ptr:     0x14e42a00c000 size: 4096
terminate called after throwing an instance of 'amgx::amgx_exception'
  what():  Cuda failure: 'invalid argument'

Aborted (core dumped)

I have looked at the source but I haven't been able to figure out the underlying reason for this behavior.
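
In both logs the leak report is printed after the AMGX_finalize_plugins message, which may mean the abort happens while AMGX is being finalized rather than while the solver is constructed; that is only a guess from the output order. A minimal sketch for checking this, assuming a POSIX system: run the reproduction in a fresh interpreter and report whether it was killed by a signal. The helper name run_isolated and the 'constructed' marker are hypothetical, chosen only for illustration.

```python
import signal
import subprocess
import sys


def run_isolated(snippet, timeout=120):
    """Run `snippet` in a fresh Python interpreter and summarize how it exited."""
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # On POSIX, a negative return code means the child was killed by that signal.
    killed_by = None
    if proc.returncode < 0:
        killed_by = signal.Signals(-proc.returncode).name
    return {
        "returncode": proc.returncode,
        "killed_by": killed_by,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


if __name__ == "__main__":
    # Same two lines as the failing example, plus a marker printed once the
    # solver object exists (the marker is an addition for illustration).
    snippet = (
        "from fipy.solvers.pyamgx.linearGMRESSolver import LinearGMRESSolver\n"
        "s = LinearGMRESSolver()\n"
        "print('constructed', flush=True)\n"
    )
    result = run_isolated(snippet)
    print("return code:", result["returncode"])
    print("killed by:", result["killed_by"])
    print("marker printed:", "constructed" in result["stdout"])
```

If the marker is printed and the child is still killed by SIGABRT, the failure occurs after construction, most likely during teardown at interpreter exit. If the marker never appears, the abort happens during import or construction.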
