OpenFOAM on GPUs in 2025: state of play (what works, what doesn’t)

Short and honest: there isn’t a one‑click, end‑to‑end GPU OpenFOAM yet. You can still get real wins by moving the linear solver onto the GPU with mature libraries. This page shows you what’s stable, what’s experimental, and how to try it on GPU computing services without sinking time.

Snapshot

Works today (production‑ish)

  • PETSc4FOAM with a GPU backend (CUDA/HIP/SYCL via PETSc). Swaps OpenFOAM’s built‑ins for PETSc solvers and preconditioners.
  • AmgX via amgx4Foam: offloads pressure/Poisson‑like solves to NVIDIA GPUs.
  • Ginkgo (via wrappers such as OGL): portable sparse linear algebra on NVIDIA/AMD/Intel GPUs.

Active/experimental

  • C++ parallelism / OpenMP target‑offload proofs of concept for selected apps (e.g., laplacianFoam). Promising, not general.

Reality check

  • Gains are best when linear algebra dominates the runtime.
  • There’s overhead: converting OpenFOAM’s LDU matrices to CSR/ELL and host–device transfers.
  • FP64 matters for accuracy; consumer GPUs have weak FP64. Pick hardware to match your tolerance (see the FP64 checklist).

Try it on real GPUs (two practical paths)

Path A · PETSc4FOAM (portable, vendor‑neutral)

  1. Template: pick a CUDA‑ready image (e.g., Ubuntu 24.04 LTS / CUDA 12.6).
  2. Install: build PETSc with your GPU backend, then build petsc4Foam (OpenFOAM external‑solver module).
  3. Select in your case: switch the linear solver in fvSolution to PETSc and choose a GPU‑capable preconditioner.

Sketch

# inside the running container
nvidia-smi
# Build PETSc (double precision, release, CUDA as example)
./configure \
 --with-cuda=1 --with-cudac=nvcc \
 --with-precision=double --with-debugging=0 \
 --download-hypre
make all
# Build the OpenFOAM external-solver (petsc4Foam)
# (follow your OpenFOAM distribution’s module build steps)
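
If you run the openfoam.com line, petsc4Foam lives in the external-solver module. A minimal build sketch; the bashrc path, PETSC_DIR/PETSC_ARCH values, and clone location are assumptions to adapt:

# source your OpenFOAM environment first (path varies by install)
source /usr/lib/openfoam/openfoam2412/etc/bashrc
export PETSC_DIR=$HOME/petsc PETSC_ARCH=arch-linux-cuda-opt
git clone https://develop.openfoam.com/modules/external-solver.git
cd external-solver && ./Allwmake   # installs libpetscFoam into $FOAM_USER_LIBBIN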

system/fvSolution (pattern)

solvers
{
   p
   {
       // Keep your tolerances
       tolerance   1e-7;
       relTol      0.01;

       // Load PETSc external solver
       externalSolverLibs ("libpetscFoam.so");
       externalSolver     PETSc;

       // PETSc options (example — tune for your case)
       // e.g., CG + AMG preconditioner with GPU backend
       // petscOptions "-ksp_type cg -pc_type hypre -pc_hypre_type boomeramg";
   }
}

Exact names/paths differ by OpenFOAM distro and module version. Keep the idea: load the external solver library, select PETSc, and pass PETSc options that use your GPU backend.
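
One concrete way to aim those options at the GPU: PETSc reads the PETSC_OPTIONS environment variable, and the vector/matrix type options below are standard PETSc names. Whether your petsc4Foam build forwards dictionary options or environment options is version-dependent, so treat this as a sketch:

# GPU-resident vectors/matrices via standard PETSc runtime options
export PETSC_OPTIONS="-ksp_type cg -pc_type gamg -vec_type cuda -mat_type aijcusparse"
# AMD/HIP builds: -vec_type hip -mat_type aijhipsparse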

Path B · AmgX via amgx4Foam (NVIDIA‑focused)

  1. Build AmgX and amgx4Foam in your image (build sketch below).
  2. Point your case to AmgX and supply an AmgX JSON config.
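
A minimal build sketch, assuming CMake and the NVIDIA/AMGX GitHub repository; the wrapper's own README governs the last step:

git clone --recursive https://github.com/NVIDIA/AMGX.git   # submodules, if your version has them
cmake -B AMGX/build -S AMGX -DCMAKE_BUILD_TYPE=Release
cmake --build AMGX/build -j
# then: source your OpenFOAM env, point amgx4Foam at the AmgX build,
# and run its Allwmake (variable names differ by wrapper version)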

fvSolution (pattern)

solvers
{
   p
   {
       tolerance   1e-7;
       relTol      0.01;
       externalSolverLibs ("libamgx4Foam.so");
       externalSolver     AmgX;
       amgxConfig         "system/amgx.json";
   }
}

system/amgx.json (minimal idea)

{
 "config_version": 2,
 "determinism_flag": 1,
 "solver": {
   "preconditioner": { "algorithm": "AMG", "max_iters": 2 },
   "solver": "PCG", "max_iters": 100, "convergence": "RELATIVE_RESIDUAL", "tolerance": 1e-7
 }
}

Start conservative; then tune (cycles, smoother, coarsening) on a small mesh.
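
For the sweep itself, a throwaway loop on the coarse mesh keeps comparisons honest; the config file names and solver here are placeholders:

for cfg in conservative aggressive; do
    cp "configs/amgx-$cfg.json" system/amgx.json
    /usr/bin/time -o "time.$cfg" simpleFoam > "log.$cfg" 2>&1   # same case, new config
done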

When GPUs help (and when they don’t)

Good candidates

  • Pressure‑based incompressible flows where pressure Poisson dominates.
  • Large steady/transient cases where linear solves are ≥60–70% of time.
  • Meshes that fit comfortably in GPU VRAM with room for buffers (rough estimate below).
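
A back‑of‑envelope VRAM check for that last point, assuming a scalar CSR matrix with ~7 nonzeros per row (hex mesh), FP64 values with 32‑bit indices, and roughly 3× headroom for the AMG hierarchy and workspace:

# cells × nnz/row × (8 B value + 4 B index) × ~3 for AMG levels + buffers
awk -v cells=50e6 -v nnz=7 'BEGIN {
    printf "~%.1f GB\n", cells * nnz * 12 * 3 / 1e9   # ≈ 12.6 GB for 50M cells
}'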

Poor candidates

  • Small meshes, heavy I/O/post, or models where assembly dominates.
  • Physics/algorithms not mapped to the GPU backend you chose.

Performance & precision notes

  • Matrix conversions (LDU → CSR/ELL) cost time and RAM. Amortize with longer runs or larger solves.
  • Preconditioner choice is everything. AMG often wins; ILU‑like on GPU can be tricky.
  • Precision: most backends support FP64; it’s slower on consumer GPUs. Validate error bands before committing.
  • Multi‑GPU: possible with PETSc/Ginkgo backends. Keep partitions balanced and prefer fast interconnects (run sketch below).
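
The run sketch is the stock OpenFOAM workflow; one MPI rank per GPU is a common starting point, and rank‑to‑GPU mapping is backend‑specific:

decomposePar                      # method/counts in system/decomposeParDict
mpirun -np 4 simpleFoam -parallel > log.simpleFoam 2>&1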

Minimal self‑benchmark (keep it boring)

case: solver, mesh (cells), physics, time step/iterations
backend: PETSc|AmgX|Ginkgo + options
metrics: wall time, solver time %, iterations/step, residual history, peak VRAM
hardware: GPU model/VRAM, driver, CUDA; CPU model/threads
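
One way to capture most of those metrics in a single run (log and case names are assumptions):

nvidia-smi --query-gpu=timestamp,memory.used,utilization.gpu \
           --format=csv -l 5 > gpu.csv &             # sample VRAM/util every 5 s
SMI=$!
mpirun -np 4 simpleFoam -parallel > log.simpleFoam 2>&1
kill $SMI
foamLog log.simpleFoam                               # residuals/iterations into logs/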

Cost per converged case

cost_per_case = price_per_hour × wall_hours
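
Worked example with placeholder numbers:

awk -v price=1.80 -v hours=3.4 'BEGIN { printf "$%.2f per case\n", price * hours }'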

Log the exact PETSc/AmgX options and the library versions in your Methods.

Troubleshooting

GPU idle / no speedup
Linear solve isn’t dominant, or the preconditioner is a poor fit. Profile where time goes and tune the backend.
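
A quick first pass, assuming the usual log layout:

foamLog log.simpleFoam                       # per-field residuals and iteration counts
grep "Solving for p" log.simpleFoam | tail   # pressure iterations per step at a glance
nvidia-smi dmon -s u -c 10                   # ten samples of GPU utilization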

OOM (VRAM)
Reduce mesh or switch to a larger‑VRAM profile. Check workspace settings in your backend.

“Unknown external solver / missing library”
Library not found. Confirm externalSolverLibs path and that the module was built for your OpenFOAM version.
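
Two checks that catch most cases (library name as in the pattern above):

ls "$FOAM_USER_LIBBIN"/libpetscFoam.so                     # was the module built/installed?
ldd "$FOAM_USER_LIBBIN"/libpetscFoam.so | grep "not found" # unresolved dependencies?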

Unstable/slow convergence
Try different AMG parameters or switch KSP/PC types. Validate vs a CPU baseline.
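
With the PETSc path, swapping KSP/PC types is a one-line change; these option names are standard PETSc:

export PETSC_OPTIONS="-ksp_type bcgs -pc_type gamg"           # BiCGStab + native AMG
# or: export PETSC_OPTIONS="-ksp_type gmres -pc_type hypre"   # GMRES + BoomerAMG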

Methods snippet (copy‑paste)

hardware:
 gpu: "<model> (<VRAM> GB)"
 driver: "<NVIDIA/AMD/Intel driver>"
 cuda_hip_sycl: "<version>"
software:
 openfoam: "<distro + version>"
 backend: "PETSc|AmgX|Ginkgo (<version>)"
case:
 mesh_cells: <...>
 solver: "<simpleFoam | pisoFoam | ...>"
run:
 fvSolution:
   externalSolverLibs: ["libpetscFoam.so"]
   externalSolver: "PETSc"
   options: "-ksp_type cg -pc_type hypre -pc_hypre_type boomeramg"
outputs:
 wall_hours: "<hh:mm>"
 solver_share: "<% time in linear solve>"
 iters_per_step: "<…>"
 notes: "matrix format, precision, any deviations"

Related reading

Scientific modeling on cloud GPUs — what works, what doesn’t 

Try Compute today

Start a GPU instance with a CUDA-ready template (e.g., Ubuntu 24.04 LTS / CUDA 12.6) or your own OpenFOAM image. Enjoy flexible per-second billing with custom templates and the ability to start, stop, and resume your sessions at any time. Unsure about FP64 requirements? Contact support to help you select the ideal hardware profile for your computational needs.
