Docking is throughput work. GPUs make sense when you prepare clean inputs and run many replicas in parallel. This guide shows how to run GPU docking, batch thousands of ligands, and track cost per 10k ligands.
What we’ll cover
- Picking a CUDA‑ready template on your GPU renter and bringing the docking binaries
- Minimal receptor/ligand prep that doesn’t fall apart mid‑run
- A batch launcher that fills the GPU without babysitting
- A self‑benchmark harness and cost math you can copy
- VRAM, file I/O, and common error traps
1) Choose your image
On most GPU renters your job runs inside an image. Two paths that work well:
A) Use a CUDA template + mount docking tools (simple)
- Template: Ubuntu 24.04 LTS (CUDA 12.6)
- Install or mount AutoDock‑GPU / Vina‑compatible GPU binaries into the container at runtime (keep them outside the image).
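If you are prototyping locally with plain Docker rather than a hosted template, the equivalent is roughly as follows (image tag and mount paths are illustrative; requires the NVIDIA container toolkit):

docker run --rm -it --gpus all \
  -v "$PWD/tools:/opt/tools" \
  -v "$PWD/work:/work" \
  nvidia/cuda:12.6.2-runtime-ubuntu24.04 bash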
B) Bring your own image (fastest to reuse)
- Build a private image that contains your docking tools + Python stack for prep.
- Add env vars:
NVIDIA_VISIBLE_DEVICES=all
NVIDIA_DRIVER_CAPABILITIES=compute,utility
- Entry script prints versions, creates /work, and waits (or starts the batch).
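A minimal entry script in that spirit (illustrative; adapt the version checks and the final command to your image):

#!/usr/bin/env bash
# Print driver/tool versions, create the work directory, then idle so you can attach.
set -euo pipefail
nvidia-smi || echo "WARN: no GPU visible"
python3 --version || true
mkdir -p /work
# Keep the container alive for interactive use; swap the last line for
# `bash /work/dock_batch.sh ...` to start the batch immediately.
sleep infinity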
In general you do not need Docker-in-Docker: the provider passes the host GPU driver into your container.
2) Minimal input prep that’s reproducible
Directory layout
/work
├── receptor/
│   ├── protein.pdbqt
│   └── grid/          # maps & config
├── ligands/
│   ├── L000001.pdbqt
│   ├── L000002.pdbqt
│   └── ...
└── jobs/
    └── (outputs here)
Receptor
- Prepare once with your standard pipeline. Save PDBQT and grid box/config files used by your docking tool.
- Keep a README with center/size and any protonation choices.
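One way to keep that record next to the grid files (values are placeholders):

# Write the docking box and prep choices once, alongside the maps/config.
mkdir -p receptor/grid
cat > receptor/grid/README.txt <<'EOF'
center: x=<x>  y=<y>  z=<z>
size:   x=<sx> y=<sy> z=<sz>
protonation: receptor protonated at pH <pH>; waters/cofactors: <note>
EOF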
Ligands
- Convert to PDBQT in a separate prep step (SMILES/SDF → PDBQT). Keep a deterministic script that sets protonation, tautomers, and charges the same way every time (a minimal sketch follows this list).
- Validate by docking a tiny panel first (10–20 ligands) and reviewing poses/scores.
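A minimal prep sketch using Open Babel (assumes obabel is installed; many groups prefer RDKit/Meeko or their own stack — what matters is that the settings are pinned and versioned):

#!/usr/bin/env bash
# Deterministic SMILES -> PDBQT conversion with fixed protonation and charges.
set -euo pipefail
IN=${1:-ligands.smi}     # one SMILES per line (name column optional)
OUT_DIR=${2:-ligands}
mkdir -p "$OUT_DIR"
# --gen3d builds 3D coordinates, -p 7.4 protonates for pH 7.4,
# --partialcharge gasteiger assigns Gasteiger charges, -m writes one file per molecule.
# Note: -m names outputs L1.pdbqt, L2.pdbqt, ...; zero-pad/rename afterwards if you
# want the L000001 style used above.
obabel "$IN" -O "$OUT_DIR/L.pdbqt" --gen3d -p 7.4 --partialcharge gasteiger -m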
3) Batch launcher (single GPU)
This skeleton runs ligands in small chunks, keeps the GPU fed, and writes logs you can parse later. Replace DOCK_CMD with your tool's exact call (AutoDock-GPU, Vina-GPU, Uni-Dock, etc.).
#!/usr/bin/env bash
set -euo pipefail
LIG_DIR=${1:-ligands}
OUT_DIR=${2:-jobs}
N_PAR=${3:-8}   # concurrent docking processes per GPU (tune)
mkdir -p "$OUT_DIR"
shopt -s nullglob   # an empty ligand dir yields zero iterations, not a literal glob
# Edit DOCK_CMD to call your tool. Example shape with Vina-style flags
# (flag names differ between AutoDock-GPU, Vina-GPU, and Uni-Dock; check your tool's --help):
#   vina --receptor receptor/protein.pdbqt --ligand "$1" \
#        --center_x ... --center_y ... --center_z ... \
#        --size_x ... --size_y ... --size_z ... \
#        --exhaustiveness 8 --out "$2/out.pdbqt"
DOCK_CMD() { :; }   # args: $1 = ligand .pdbqt, $2 = output directory
# Loop over the ligand files with a glob. Piping `ls` into `while read` would run the
# loop in a subshell, and the final `wait` would return before the docking jobs finish.
for lig in "$LIG_DIR"/*.pdbqt; do
  base=$(basename "$lig" .pdbqt)
  out="$OUT_DIR/$base"
  mkdir -p "$out"
  (
    DOCK_CMD "$lig" "$out" >"$out/run.log" 2>&1 || echo "FAIL $base" >> "$OUT_DIR/failures.txt"
  ) &
  # simple throttle: keep at most N_PAR jobs in flight (wait -n needs bash >= 4.3)
  if (( $(jobs -rp | wc -l) >= N_PAR )); then wait -n; fi
done
wait
echo "Done. Outputs in $OUT_DIR/"
Tuning N_PAR
- Start with 4–8 processes per GPU, then raise until utilization hovers around ~90% without thrashing VRAM.
- Watch nvidia-smi for memory and utilization.
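A convenient way to watch both while you tune (standard nvidia-smi query flags):

# Print GPU utilization and memory every 5 seconds while the batch runs.
nvidia-smi --query-gpu=timestamp,utilization.gpu,memory.used,memory.total \
           --format=csv -l 5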
4) Multi‑GPU scale‑out
If your instance has N GPUs, split the ligand set into one shard per GPU, launch one batch per shard, and pin each batch to its GPU with CUDA_VISIBLE_DEVICES (passing the same ligands/ directory to every GPU would dock everything N times).
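A round-robin sharding sketch (the shard0/shard1 names are illustrative; symlinks avoid copying files):

# Split the ligand files round-robin into ligands/shard0, ligands/shard1, ...
N_GPUS=2
i=0
for lig in ligands/*.pdbqt; do
  shard="ligands/shard$(( i % N_GPUS ))"
  mkdir -p "$shard"
  ln -sf "$(readlink -f "$lig")" "$shard/"
  i=$(( i + 1 ))
done

Then launch one batch per shard: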
# GPU 0
CUDA_VISIBLE_DEVICES=0 bash dock_batch.sh ligands/shard0 jobs/gpu0 8 &
# GPU 1
CUDA_VISIBLE_DEVICES=1 bash dock_batch.sh ligands/shard1 jobs/gpu1 8 &
wait
Keep outputs separated by GPU for easier auditing.
5) Self‑benchmark & cost math
Record only what matters:
- inputs: N_ligands, receptor, grid (center/size), exhaustiveness/seed
- hardware: GPU model/VRAM, CUDA, driver
- code: docking tool version + flags
- metrics: ligands/hour, failures, wall_hours
Cost per 10k ligands
cost_per_10k = price_per_hour × wall_hours × (10_000 / N_ligands)
Report ligands/hour and cost per 10k per GPU. Keep the full command line in your Methods.
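A quick way to turn a finished run into both numbers (the figures below are placeholders, not measurements):

# Example run numbers — replace with your own.
N_LIGANDS=25000       # ligands completed
WALL_HOURS=5.0        # wall-clock hours for the batch
PRICE_PER_HOUR=1.20   # instance price in $/hour
awk -v n="$N_LIGANDS" -v h="$WALL_HOURS" -v p="$PRICE_PER_HOUR" 'BEGIN {
  printf "ligands_per_hour: %.0f\n", n / h
  printf "cost_per_10k:     $%.2f\n", p * h * (10000 / n)
}'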
6) Practical knobs that move throughput
- Exhaustiveness / search steps: biggest lever. Find the lowest setting that passes your early‑enrichment check.
- Batch size (N_PAR): raise until VRAM or context switches stall you. Small molecules → higher N_PAR.
- I/O: write only what you need. Avoid chatty per‑pose logs when screening millions.
- Seeds: fix seeds for comparability across runs; randomize for the final screen if you prefer.
7) Troubleshooting
GPU OOM
Lower N_PAR, reduce exhaustiveness, or move to a larger‑VRAM GPU.
All jobs fail fast
Bad grid or incompatible flags. Re‑run a single ligand with verbose logging and compare to a known‑good CPU run.
Weird scores or poses
Input prep differences (protonation/tautomers/charges). Normalize your prep and test on a small curated set.
GPU idle
N_PAR too low or tool has long CPU pre/post stages. Parallelize prep or pre‑stage grids.
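For the single-ligand re-run, one low-effort pattern (assuming the dock_batch.sh skeleton above with DOCK_CMD filled in):

# Smoke-test one ligand in isolation, then inspect its log.
mkdir -p ligands_debug
cp ligands/L000001.pdbqt ligands_debug/
bash dock_batch.sh ligands_debug jobs/debug 1
cat jobs/debug/L000001/run.log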
Methods snippet (copy‑paste)
hardware:
  gpu: "<model> (<VRAM> GB)"
  driver: "<NVIDIA driver>"
  cuda: "<CUDA version>"
software:
  image: "<your docking image or CUDA template>"
  tool: "AutoDock-GPU <ver> | Vina-GPU <ver> | Uni-Dock <ver>"
inputs:
  receptor: "protein.pdbqt"
  grid: { center: [<x>,<y>,<z>], size: [<sx>,<sy>,<sz>] }
  ligands: "N=<…> from <path>"
run:
  batch:
    script: "dock_batch.sh"
    N_PAR: <int>
  flags: "<exhaustiveness, seed, etc.>"
outputs:
  ligands_per_hour: "<…>"
  wall_hours: "<…>"
  cost_per_10k: "<…>"
  failures: "<count> (<file path>)"
Try Compute today
Start a GPU instance with a CUDA-ready template (e.g., Ubuntu 24.04 LTS / CUDA 12.6) or your own docking image. Enjoy flexible per-second billing with custom templates and the ability to start, stop, and resume your sessions at any time. Unsure about FP64 requirements? Contact support to help you select the ideal hardware profile for your computational needs.