You want a clean run, not a rabbit hole. This guide gives you a copy‑paste path to run a medium MD system on a single RTX 4090, plus a small kit to generate your own numbers you can trust.
What we’ll run
- A solvated protein system around 120k atoms (you can swap in your own).
- A CUDA‑enabled GROMACS image. GPU builds use mixed precision by design (single‑precision compute with double‑precision accumulation where it matters); double‑precision builds do not support GPU acceleration, so mixed precision is the standard choice for GPU runs.
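Not sure which build you have? gmx --version prints the precision and GPU backend of the binary (exact field names vary a little across releases):
# Expect "Precision: mixed" and "GPU support: CUDA" on a GPU build
gmx --version | grep -Ei "precision|gpu support"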
Inputs
- system.tpr (or generate from your own .mdp, conf.gro, topol.top).
Prepare your template for NVIDIA GPUs (UI steps)
On most compute services, the template boots you straight into the container with GPU access and gmx on the PATH; CUDA and GROMACS come preconfigured in the image.
Using pre‑made templates (Ubuntu 24.04 LTS / PyTorch 2.5)
Compute instances are compatible with CUDA 12.6 user‑space (and include JupyterLab). That’s enough to run CUDA apps; the host driver is usually supplied by the computing provider. For GROMACS you still want a CUDA build:
- Fastest: use the GROMACS image approach above and save it as your own template.
- If you pick Ubuntu 24.04 LTS (CUDA 12.6): install Apptainer and run the official GROMACS image with --nv:
sudo apt-get update && sudo apt-get install -y apptainer
apptainer exec --nv docker://gromacs/gromacs:2024.1 gmx --version
apptainer exec --nv -B $PWD:/work docker://gromacs/gromacs:2024.1 \
bash -lc "cd /work && gmx mdrun -deffnm md -nb gpu -pme gpu -update gpu -pin on"
- If you pick PyTorch 2.5 (CUDA 12.6): same Apptainer method works. PyTorch libs don’t conflict with GROMACS in a separate container.
Detailed installation instructions live in the official GROMACS and Apptainer documentation, respectively.
Tip: The CUDA version shown on the card is the toolkit/runtime in the image, not the driver inside your container. Check nvidia-smi to confirm GPU visibility.
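Before committing to a long run, a one‑liner confirms the container actually sees the GPU (same image as above):
apptainer exec --nv docker://gromacs/gromacs:2024.1 nvidia-smi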
Run inside the container
You’re already in the GROMACS container. Everything runs through the gmx wrapper, and gmx mdrun is the simulation engine whose flags decide what runs where. Create a project folder and run:
mkdir -p /work && cd /work
# If you have raw inputs, preprocess to a TPR (example):
# gmx grompp -f md.mdp -c conf.gro -p topol.top -o system.tpr
# Run with explicit GPU offload flags
gmx mdrun -deffnm md \
-nb gpu -pme gpu -update gpu -pin on
mdrun reads the TPR file (molecular topology plus simulation parameters); with -deffnm md it writes md.log (log), md.edr (energies), and the trajectory as md.xtc and/or md.trr, depending on your .mdp output settings.
Check usage
nvidia-smi # utilization and memory
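When the run finishes, mdrun appends a performance summary to the log; with -deffnm md it lands in md.log. A minimal extraction (the tail guards against appended restarts):
grep "Performance:" md.log | tail -n 1   # columns: ns/day, hours/ns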
Running on your own machine? Two options.
Option A · Docker
docker pull gromacs/gromacs:2024.1
docker run --gpus all -it --rm -v $PWD:/work -w /work gromacs/gromacs:2024.1 bash
# then run the same gmx mdrun command inside the container
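If you prefer a one‑shot run over an interactive shell, the same command works directly as the container invocation (a sketch; flags as in the section above):
docker run --gpus all --rm -v $PWD:/work -w /work gromacs/gromacs:2024.1 \
  gmx mdrun -deffnm md -nb gpu -pme gpu -update gpu -pin on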
Option B · Apptainer/Singularity
If your policy requires it:
# Example: convert Docker image
apptainer build gromacs.sif docker://gromacs/gromacs:2024.1
apptainer exec --nv gromacs.sif gmx --version
apptainer exec --nv -B $PWD:/work gromacs.sif bash -lc \
"cd /work && gmx mdrun -deffnm md -nb gpu -pme gpu -update gpu -pin on"
Run‑your‑own benchmark kit
Protocol
- System: ~120k atoms, PME, 2 fs, LINCS, 0.9 nm cutoffs.
- Warmup: 5k steps. Measure: 50k steps.
- Record: GPU, driver, CUDA, container digest, GROMACS version, CPU model/threads, exact flags, .mdp.
These runs let you compare GPU and CPU configurations on equal footing; the script below automates the warmup/measure split.
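A minimal driver for the protocol, leaning on mdrun’s built‑in benchmark flags (-nsteps to cap the run, -resetstep to restart the timers after warmup, -noconfout to skip the final coordinate write); treat it as a sketch to adapt:
#!/usr/bin/env bash
# 5k warmup + 50k measured: timers reset at step 5000, so the
# reported ns/day covers only the measurement window.
gmx mdrun -s system.tpr -deffnm bench \
  -nsteps 55000 -resetstep 5000 -noconfout \
  -nb gpu -pme gpu -update gpu -pin on
# Capture the environment next to the number.
nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
gmx --version | head -n 5
grep "Performance:" bench.log | tail -n 1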
Table (fill with your numbers)

| GPU | Driver | CUDA | GROMACS | CPU threads | Flags | ns/day | $/ns |
|-----|--------|------|---------|-------------|-------|--------|------|
| RTX 4090 (24 GB) |  |  |  |  |  |  |  |
Read your results
- If PME dominates the timing breakdown at the end of the log, try a few more CPU threads, move the PME FFTs to the CPU with -pmefft cpu, or split PME across GPUs on suitable multi‑GPU builds.
- Watch VRAM: 24 GB fits many single‑system MD jobs of this size.
Cost‑per‑result, not just speed
Scientists care about cost per simulated nanosecond more than peak speed.
cost_per_ns = (hourly_price × wall_hours) / (ns_per_day × wall_hours / 24) = 24 × hourly_price / ns_per_day
The wall hours cancel, so only hourly price and throughput matter. Lower is better. If cost per result is poor, change the instance or revisit flags.
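A worked example with hypothetical numbers ($0.50/hour instance sustaining 40 ns/day):
awk -v price=0.50 -v nsday=40 'BEGIN { printf "cost per ns: $%.3f\n", 24 * price / nsday }'
# cost per ns: $0.300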
Validate your results and output files
- Baseline check. Run a short CPU, double‑precision test and compare it against the GPU mixed‑precision run: look at energy drift, RMSD, temperature/pressure stability, or your task‑specific metric (example commands follow this list).
- Determinism. Set a fixed seed where relevant and re‑run a short window. Small stochastic variance is fine; big swings mean configuration issues.
- Offload sanity. Confirm the log prints GPU kernels for nonbonded/PME and that no major step has fallen back to CPU.
- Step size and constraints. Use a stable time step (e.g., 2 fs with LINCS). Check for LINCS warnings or constraint failures.
- Methods block. Fill the Methods snippet (hardware, CUDA/driver, container digest, solver version, flags) and keep it with the results.
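For the stability checks, GROMACS’s own analysis tools cover the basics. A sketch, assuming the run produced md.edr and an md.xtc trajectory; term and group names depend on your system, and the printf pipes answer the interactive prompts:
# Energy, temperature, and pressure time series ("0" ends the selection)
printf "Potential\nTemperature\nPressure\n0\n" | gmx energy -f md.edr -o energy.xvg
# Backbone RMSD vs. the starting structure (fit group, then RMSD group)
printf "Backbone\nBackbone\n" | gmx rms -s system.tpr -f md.xtc -o rmsd.xvg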
Good defaults and small tweaks
- Start with 2–6 CPU threads per GPU and profile from there (see the flag example after this list).
- Keep PME on the GPU for single‑GPU runs.
- Use local NVMe for scratch; write logs less often to avoid I/O stalls.
- Pin CUDA/driver and image digest. Don’t mix.
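In flag form, the thread advice above looks like this (-ntomp sets OpenMP threads per rank; 6 is just the top of the suggested range):
gmx mdrun -deffnm md -nb gpu -pme gpu -update gpu -ntomp 6 -pin on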
Troubleshooting
No GPU offload
Use a CUDA image and check the log. Confirm nvidia-container-toolkit is active.
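The log itself is the ground truth for offload; the wording varies by release, so treat the pattern as a starting point:
grep -iE "gpu|cuda" md.log | head -n 20   # look for nonbonded/PME mapped to the GPU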
Run slows over time
Check thermals and clocks with nvidia-smi. Keep -update gpu and -pin on.
Out of memory
Shrink the system, reduce neighbor list size, trim outputs, or move to a larger‑VRAM profile.
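To watch VRAM headroom while the job runs (refreshing every 5 seconds):
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 5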
License servers for downstream tools
If you move to commercial solvers next, reach the license server over VPN/SSH and pin the FlexNet daemon ports to fixed values so firewall rules stay stable.
Methods snippet (fill and paste)
hardware:
  gpu: "RTX 4090 (24 GB)"
  driver: "<driver>"
  cuda: "12.x"
  cpu: "<model / threads used>"
software:
  container: "docker://gromacs/gromacs:2024.1@sha256:<digest>"
  solver: "gromacs 2024.1 (CUDA build)"
inputs:
  tpr: "system.tpr"
run:
  cmd: "gmx mdrun -deffnm md -nb gpu -pme gpu -update gpu -pin on"
outputs:
  performance: "<ns/day>"
  wallclock: "<HH:MM:SS>"
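To fill the digest and driver fields, two lookups that assume you pulled the image with Docker:
# Image digest as pulled (populated after docker pull)
docker inspect --format '{{index .RepoDigests 0}}' gromacs/gromacs:2024.1
# Exact driver version on the host
nvidia-smi --query-gpu=driver_version --format=csv,noheader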
Related reading
- Scientific modeling on cloud GPUs — what works, what doesn’t
- Use your Ansys/COMSOL/Abaqus licenses on cloud instances → (coming soon)
Try Compute today
Start a GPU instance with a CUDA-ready template (e.g., Ubuntu 24.04 LTS / CUDA 12.6) or your own GROMACS image. Billing is per second, templates are yours to customize, and sessions can be stopped and resumed at any time. Unsure about FP64 requirements? Contact support for help picking a hardware profile.