OpenFOAM on GPUs in 2025: state of play (what works, what doesn’t)

Short and honest: there isn’t a one‑click, end‑to‑end GPU OpenFOAM yet. You can still get real wins by moving the linear solver onto the GPU with mature libraries. This page shows you what’s stable, what’s experimental, and how to try it on GPU computing services without sinking time.

Snapshot

Works today (production‑ish)

  • PETSc4FOAM with a GPU backend (CUDA/HIP/SYCL via PETSc). Swaps OpenFOAM’s built‑ins for PETSc solvers and preconditioners.
  • AmgX via amgx4Foam: offloads pressure/Poisson‑like solves to NVIDIA GPUs.
  • Ginkgo (via wrappers such as OGL): portable sparse linear algebra on NVIDIA/AMD/Intel GPUs.

Active/experimental

  • C++ parallelism / OpenMP target‑offload proofs of concept for selected apps (e.g., laplacianFoam). Promising, not general.

Reality check

  • Gains are best when linear algebra dominates the runtime.
  • There’s overhead: converting OpenFOAM’s LDU matrices to CSR/ELL and host–device transfers.
  • FP64 matters for accuracy; consumer GPUs have weak FP64. Pick hardware to match your tolerance (see the FP64 checklist).

Try it on real GPUs (two practical paths)

Path A · PETSc4FOAM (portable, vendor‑neutral)

  1. Template: pick a CUDA‑ready image (e.g., Ubuntu 24.04 LTS / CUDA 12.6).
  2. Install: build PETSc with your GPU backend, then build petsc4Foam (OpenFOAM external‑solver module).
  3. Select in your case: switch the linear solver in fvSolution to PETSc and choose a GPU‑capable preconditioner.

Sketch

# inside the running container
nvidia-smi
# Build PETSc (double precision, release, CUDA as example)
./configure \
 --with-cuda=1 --with-cudac=nvcc \
 --with-precision=double --with-debugging=0 \
 --download-hypre
make all
# Build the OpenFOAM external-solver (petsc4Foam)
# (follow your OpenFOAM distribution’s module build steps)
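
If you run the openfoam.com line, petsc4Foam lives in the external-solver module. A minimal build sketch; the bashrc path, PETSC_DIR/PETSC_ARCH values, and clone location are assumptions to adapt:

# source your OpenFOAM environment first (path varies by install)
source /usr/lib/openfoam/openfoam2412/etc/bashrc
export PETSC_DIR=$HOME/petsc PETSC_ARCH=arch-linux-cuda-opt
git clone https://develop.openfoam.com/modules/external-solver.git
cd external-solver && ./Allwmake   # installs libpetscFoam into $FOAM_USER_LIBBIN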

system/fvSolution (pattern)

solvers
{
   p
   {
       // Keep your tolerances
       tolerance   1e-7;
       relTol      0.01;

       // Load PETSc external solver
       externalSolverLibs ("libpetscFoam.so");
       externalSolver     PETSc;

       // PETSc options (example — tune for your case)
       // e.g., CG + AMG preconditioner with GPU backend
       // petscOptions "-ksp_type cg -pc_type hypre -pc_hypre_type boomeramg";
   }
}

Exact names/paths differ by OpenFOAM distro and module version. Keep the idea: load the external solver library, select PETSc, and pass PETSc options that use your GPU backend.
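
One concrete way to aim those options at the GPU: PETSc reads the PETSC_OPTIONS environment variable, and the vector/matrix type options below are standard PETSc names. Whether your petsc4Foam build forwards dictionary options or environment options is version-dependent, so treat this as a sketch:

# GPU-resident vectors/matrices via standard PETSc runtime options
export PETSC_OPTIONS="-ksp_type cg -pc_type gamg -vec_type cuda -mat_type aijcusparse"
# AMD/HIP builds: -vec_type hip -mat_type aijhipsparse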

Path B · AmgX via amgx4Foam (NVIDIA‑focused)

  1. Build AmgX and amgx4Foam in your image (build sketch below).
  2. Point your case to AmgX and supply an AmgX JSON config.
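
A minimal build sketch, assuming CMake and the NVIDIA/AMGX GitHub repository; the wrapper's own README governs the last step:

git clone --recursive https://github.com/NVIDIA/AMGX.git   # submodules, if your version has them
cmake -B AMGX/build -S AMGX -DCMAKE_BUILD_TYPE=Release
cmake --build AMGX/build -j
# then: source your OpenFOAM env, point amgx4Foam at the AmgX build,
# and run its Allwmake (variable names differ by wrapper version)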

fvSolution (pattern)

solvers
{
   p
   {
       tolerance   1e-7;
       relTol      0.01;
       externalSolverLibs ("libamgx4Foam.so");
       externalSolver     AmgX;
       amgxConfig         "system/amgx.json";
   }
}

system/amgx.json (minimal idea)

{
 "config_version": 2,
 "determinism_flag": 1,
 "solver": {
   "preconditioner": { "algorithm": "AMG", "max_iters": 2 },
   "solver": "PCG", "max_iters": 100, "convergence": "RELATIVE_RESIDUAL", "tolerance": 1e-7
 }
}

Start conservative; then tune (cycles, smoother, coarsening) on a small mesh.
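
For the sweep itself, a throwaway loop on the coarse mesh keeps comparisons honest; the config file names and solver here are placeholders:

for cfg in conservative aggressive; do
    cp "configs/amgx-$cfg.json" system/amgx.json
    /usr/bin/time -o "time.$cfg" simpleFoam > "log.$cfg" 2>&1   # same case, new config
done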

When GPUs help (and when they don’t)

Good candidates

  • Pressure‑based incompressible flows where pressure Poisson dominates.
  • Large steady/transient cases where linear solves are ≥60–70% of time.
  • Meshes that fit comfortably in GPU VRAM with room for buffers (rough estimate below).
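
A back‑of‑envelope VRAM check for that last point, assuming a scalar CSR matrix with ~7 nonzeros per row (hex mesh), FP64 values with 32‑bit indices, and roughly 3× headroom for the AMG hierarchy and workspace:

# cells × nnz/row × (8 B value + 4 B index) × ~3 for AMG levels + buffers
awk -v cells=50e6 -v nnz=7 'BEGIN {
    printf "~%.1f GB\n", cells * nnz * 12 * 3 / 1e9   # ≈ 12.6 GB for 50M cells
}'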

Poor candidates

  • Small meshes, heavy I/O/post, or models where assembly dominates.
  • Physics/algorithms not mapped to the GPU backend you chose.

Performance & precision notes

  • Matrix conversions (LDU → CSR/ELL) cost time and RAM. Amortize with longer runs or larger solves.
  • Preconditioner choice is everything. AMG often wins; ILU‑like on GPU can be tricky.
  • Precision: most backends support FP64; it’s slower on consumer GPUs. Validate error bands before committing.
  • Multi‑GPU: possible with PETSc/Ginkgo backends. Keep partitions balanced and prefer fast interconnects (run sketch below).
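
The run sketch is the stock OpenFOAM workflow; one MPI rank per GPU is a common starting point, and rank‑to‑GPU mapping is backend‑specific:

decomposePar                      # method/counts in system/decomposeParDict
mpirun -np 4 simpleFoam -parallel > log.simpleFoam 2>&1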

Minimal self‑benchmark (keep it boring)

case: solver, mesh (cells), physics, time step/iterations
backend: PETSc|AmgX|Ginkgo + options
metrics: wall time, solver time %, iterations/step, residual history, peak VRAM
hardware: GPU model/VRAM, driver, CUDA; CPU model/threads
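
One way to capture most of those metrics in a single run (log and case names are assumptions):

nvidia-smi --query-gpu=timestamp,memory.used,utilization.gpu \
           --format=csv -l 5 > gpu.csv &             # sample VRAM/util every 5 s
SMI=$!
mpirun -np 4 simpleFoam -parallel > log.simpleFoam 2>&1
kill $SMI
foamLog log.simpleFoam                               # residuals/iterations into logs/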

Cost per converged case

cost_per_case = price_per_hour × wall_hours
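
Worked example with placeholder numbers:

awk -v price=1.80 -v hours=3.4 'BEGIN { printf "$%.2f per case\n", price * hours }'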

Log the exact PETSc/AmgX options and the library versions in your Methods.

Troubleshooting

GPU idle / no speedup
Linear solve isn’t dominant, or the preconditioner is a poor fit. Profile where time goes and tune the backend.
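
A quick first pass, assuming the usual log layout:

foamLog log.simpleFoam                       # per-field residuals and iteration counts
grep "Solving for p" log.simpleFoam | tail   # pressure iterations per step at a glance
nvidia-smi dmon -s u -c 10                   # ten samples of GPU utilization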

OOM (VRAM)
Reduce mesh or switch to a larger‑VRAM profile. Check workspace settings in your backend.

“Unknown external solver / missing library”
Library not found. Confirm externalSolverLibs path and that the module was built for your OpenFOAM version.
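
Two checks that catch most cases (library name as in the pattern above):

ls "$FOAM_USER_LIBBIN"/libpetscFoam.so                     # was the module built/installed?
ldd "$FOAM_USER_LIBBIN"/libpetscFoam.so | grep "not found" # unresolved dependencies?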

Unstable/slow convergence
Try different AMG parameters or switch KSP/PC types. Validate vs a CPU baseline.
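
With the PETSc path, swapping KSP/PC types is a one-line change; these option names are standard PETSc:

export PETSC_OPTIONS="-ksp_type bcgs -pc_type gamg"           # BiCGStab + native AMG
# or: export PETSC_OPTIONS="-ksp_type gmres -pc_type hypre"   # GMRES + BoomerAMG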

Methods snippet (copy‑paste)

hardware:
 gpu: "<model> (<VRAM> GB)"
 driver: "<NVIDIA/AMD/Intel driver>"
 cuda_hip_sycl: "<version>"
software:
 openfoam: "<distro + version>"
 backend: "PETSc|AmgX|Ginkgo (<version>)"
case:
 mesh_cells: <...>
 solver: "<simpleFoam | pisoFoam | ...>"
run:
 fvSolution:
   externalSolverLibs: ["libpetscFoam.so"]
   externalSolver: "PETSc"
   options: "-ksp_type cg -pc_type hypre -pc_hypre_type boomeramg"
outputs:
 wall_hours: "<hh:mm>"
 solver_share: "<% time in linear solve>"
 iters_per_step: "<…>"
 notes: "matrix format, precision, any deviations"

Related reading

Scientific modeling on cloud GPUs — what works, what doesn’t 

Try Compute today

Start a GPU instance with a CUDA-ready template (e.g., Ubuntu 24.04 LTS / CUDA 12.6) or your own OpenFOAM image. Enjoy flexible per-second billing with custom templates and the ability to start, stop, and resume your sessions at any time. Unsure about FP64 requirements? Contact support to help you select the ideal hardware profile for your computational needs.
