Docking is throughput work. GPUs make sense when you prepare clean inputs and run many replicas in parallel. This guide shows how to run GPU docking, batch thousands of ligands, and track cost per 10k ligands.
What we’ll cover
- Picking a CUDA‑ready template on your GPU renter and bringing the docking binaries
- Minimal receptor/ligand prep that doesn’t fall apart mid‑run
- A batch launcher that fills the GPU without babysitting
- A self‑benchmark harness and cost math you can copy
- VRAM, file I/O, and common error traps
1) Choose your image
On most GPU renters your job runs inside an image. Two paths that work well:
A) Use a CUDA template + mount docking tools (simple)
- Template: Ubuntu 24.04 LTS (CUDA 12.6)
- Install or mount AutoDock‑GPU / Vina‑compatible GPU binaries into the container at runtime (keep them outside the image).
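If you are prototyping locally with plain Docker rather than a hosted template, the equivalent is roughly as follows (image tag and mount paths are illustrative; requires the NVIDIA container toolkit):

docker run --rm -it --gpus all \
  -v "$PWD/tools:/opt/tools" \
  -v "$PWD/work:/work" \
  nvidia/cuda:12.6.2-runtime-ubuntu24.04 bash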
B) Bring your own image (fastest to reuse)
- Build a private image that contains your docking tools + Python stack for prep.
- Add env vars:
NVIDIA_VISIBLE_DEVICES=all
NVIDIA_DRIVER_CAPABILITIES=compute,utility
- Entry script prints versions, creates /work, and waits (or starts the batch).
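A minimal entry script in that spirit (illustrative; adapt the version checks and the final command to your image):

#!/usr/bin/env bash
# Print driver/tool versions, create the work directory, then idle so you can attach.
set -euo pipefail
nvidia-smi || echo "WARN: no GPU visible"
python3 --version || true
mkdir -p /work
# Keep the container alive for interactive use; swap the last line for
# `bash /work/dock_batch.sh ...` to start the batch immediately.
sleep infinity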
In general you do not need Docker-in-Docker: the provider passes the host GPU driver into your container.
2) Minimal input prep that’s reproducible
Directory layout
/work
├── receptor/
│   ├── protein.pdbqt
│   └── grid/          # maps & config
├── ligands/
│   ├── L000001.pdbqt
│   ├── L000002.pdbqt
│   └── ...
└── jobs/
    └── (outputs here)
Receptor
- Prepare once with your standard pipeline. Save PDBQT and grid box/config files used by your docking tool.
- Keep a README with center/size and any protonation choices.
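One way to keep that record next to the grid files (values are placeholders):

# Write the docking box and prep choices once, alongside the maps/config.
mkdir -p receptor/grid
cat > receptor/grid/README.txt <<'EOF'
center: x=<x>  y=<y>  z=<z>
size:   x=<sx> y=<sy> z=<sz>
protonation: receptor protonated at pH <pH>; waters/cofactors: <note>
EOF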
Ligands
- Convert to PDBQT in a separate prep step (SMILES/SDF → PDBQT). Keep a deterministic script that sets protonation, tautomers, and charges the same way every time (a minimal sketch follows this list).
- Validate by docking a tiny panel first (10–20 ligands) and reviewing poses/scores.
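A minimal prep sketch using Open Babel (assumes obabel is installed; many groups prefer RDKit/Meeko or their own stack — what matters is that the settings are pinned and versioned):

#!/usr/bin/env bash
# Deterministic SMILES -> PDBQT conversion with fixed protonation and charges.
set -euo pipefail
IN=${1:-ligands.smi}     # one SMILES per line (name column optional)
OUT_DIR=${2:-ligands}
mkdir -p "$OUT_DIR"
# --gen3d builds 3D coordinates, -p 7.4 protonates for pH 7.4,
# --partialcharge gasteiger assigns Gasteiger charges, -m writes one file per molecule.
# Note: -m names outputs L1.pdbqt, L2.pdbqt, ...; zero-pad/rename afterwards if you
# want the L000001 style used above.
obabel "$IN" -O "$OUT_DIR/L.pdbqt" --gen3d -p 7.4 --partialcharge gasteiger -m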
3) Batch launcher (single GPU)
This skeleton runs ligands in small chunks, keeps the GPU fed, and writes logs you can parse later. Replace DOCK_CMD with your tool's exact call (AutoDock-GPU, Vina-GPU, Uni-Dock, etc.).
#!/usr/bin/env bash
set -euo pipefail
LIG_DIR=${1:-ligands}
OUT_DIR=${2:-jobs}
N_PAR=${3:-8}   # concurrent docking processes per GPU (tune)
mkdir -p "$OUT_DIR"
shopt -s nullglob   # an empty ligand dir yields zero iterations, not a literal glob
# Edit DOCK_CMD to call your tool. Example shape with Vina-style flags
# (flag names differ between AutoDock-GPU, Vina-GPU, and Uni-Dock; check your tool's --help):
#   vina --receptor receptor/protein.pdbqt --ligand "$1" \
#        --center_x ... --center_y ... --center_z ... \
#        --size_x ... --size_y ... --size_z ... \
#        --exhaustiveness 8 --out "$2/out.pdbqt"
DOCK_CMD() { :; }   # args: $1 = ligand .pdbqt, $2 = output directory
# Loop over the ligand files with a glob. Piping `ls` into `while read` would run the
# loop in a subshell, and the final `wait` would return before the docking jobs finish.
for lig in "$LIG_DIR"/*.pdbqt; do
  base=$(basename "$lig" .pdbqt)
  out="$OUT_DIR/$base"
  mkdir -p "$out"
  (
    DOCK_CMD "$lig" "$out" >"$out/run.log" 2>&1 || echo "FAIL $base" >> "$OUT_DIR/failures.txt"
  ) &
  # simple throttle: keep at most N_PAR jobs in flight (wait -n needs bash >= 4.3)
  if (( $(jobs -rp | wc -l) >= N_PAR )); then wait -n; fi
done
wait
echo "Done. Outputs in $OUT_DIR/"
Tuning N_PAR
- Start with 4–8 processes per GPU, then raise until utilization hovers around ~90% without thrashing VRAM.
- Watch nvidia-smi for memory and utilization.
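A convenient way to watch both while you tune (standard nvidia-smi query flags):

# Print GPU utilization and memory every 5 seconds while the batch runs.
nvidia-smi --query-gpu=timestamp,utilization.gpu,memory.used,memory.total \
           --format=csv -l 5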
4) Multi‑GPU scale‑out
If your instance has N GPUs, split the ligand set into one shard per GPU, launch one batch per shard, and pin each batch to its GPU with CUDA_VISIBLE_DEVICES (passing the same ligands/ directory to every GPU would dock everything N times).
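A round-robin sharding sketch (the shard0/shard1 names are illustrative; symlinks avoid copying files):

# Split the ligand files round-robin into ligands/shard0, ligands/shard1, ...
N_GPUS=2
i=0
for lig in ligands/*.pdbqt; do
  shard="ligands/shard$(( i % N_GPUS ))"
  mkdir -p "$shard"
  ln -sf "$(readlink -f "$lig")" "$shard/"
  i=$(( i + 1 ))
done

Then launch one batch per shard: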
# GPU 0
CUDA_VISIBLE_DEVICES=0 bash dock_batch.sh ligands/shard0 jobs/gpu0 8 &
# GPU 1
CUDA_VISIBLE_DEVICES=1 bash dock_batch.sh ligands/shard1 jobs/gpu1 8 &
wait
Keep outputs separated by GPU for easier auditing.
5) Self‑benchmark & cost math
Record only what matters:
- inputs: N_ligands, receptor, grid (center/size), exhaustiveness/seed
- hardware: GPU model/VRAM, CUDA, driver
- code: docking tool version + flags
- metrics: ligands/hour, failures, wall_hours
Cost per 10k ligands
cost_per_10k = price_per_hour × wall_hours × (10_000 / N_ligands)
Report ligands/hour and cost per 10k per GPU. Keep the full command line in your Methods.
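A quick way to turn a finished run into both numbers (the figures below are placeholders, not measurements):

# Example run numbers — replace with your own.
N_LIGANDS=25000       # ligands completed
WALL_HOURS=5.0        # wall-clock hours for the batch
PRICE_PER_HOUR=1.20   # instance price in $/hour
awk -v n="$N_LIGANDS" -v h="$WALL_HOURS" -v p="$PRICE_PER_HOUR" 'BEGIN {
  printf "ligands_per_hour: %.0f\n", n / h
  printf "cost_per_10k:     $%.2f\n", p * h * (10000 / n)
}'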
6) Practical knobs that move throughput
- Exhaustiveness / search steps: biggest lever. Find the lowest setting that passes your early‑enrichment check.
- Batch size (N_PAR): raise until VRAM or context switches stall you. Small molecules → higher N_PAR.
- I/O: write only what you need. Avoid chatty per‑pose logs when screening millions.
- Seeds: fix seeds for comparability across runs; randomize for the final screen if you prefer.
7) Troubleshooting
GPU OOM
Lower N_PAR, reduce exhaustiveness, or move to a larger‑VRAM GPU.
All jobs fail fast
Bad grid or incompatible flags. Re‑run a single ligand with verbose logging and compare to a known‑good CPU run.
Weird scores or poses
Input prep differences (protonation/tautomers/charges). Normalize your prep and test on a small curated set.
GPU idle
N_PAR too low or tool has long CPU pre/post stages. Parallelize prep or pre‑stage grids.
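For the single-ligand re-run, one low-effort pattern (assuming the dock_batch.sh skeleton above with DOCK_CMD filled in):

# Smoke-test one ligand in isolation, then inspect its log.
mkdir -p ligands_debug
cp ligands/L000001.pdbqt ligands_debug/
bash dock_batch.sh ligands_debug jobs/debug 1
cat jobs/debug/L000001/run.log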
Methods snippet (copy‑paste)
hardware:
  gpu: "<model> (<VRAM> GB)"
  driver: "<NVIDIA driver>"
  cuda: "<CUDA version>"
software:
  image: "<your docking image or CUDA template>"
  tool: "AutoDock-GPU <ver> | Vina-GPU <ver> | Uni-Dock <ver>"
inputs:
  receptor: "protein.pdbqt"
  grid: { center: [<x>,<y>,<z>], size: [<sx>,<sy>,<sz>] }
  ligands: "N=<…> from <path>"
run:
  batch:
    script: "dock_batch.sh"
    N_PAR: <int>
  flags: "<exhaustiveness, seed, etc.>"
outputs:
  ligands_per_hour: "<…>"
  wall_hours: "<…>"
  cost_per_10k: "<…>"
  failures: "<count> (<file path>)"
Try Compute today
Start a GPU instance with a CUDA-ready template (e.g., Ubuntu 24.04 LTS / CUDA 12.6) or your own docking image. Enjoy flexible per-second billing with custom templates and the ability to start, stop, and resume your sessions at any time. Unsure about FP64 requirements? Contact support to help you select the ideal hardware profile for your computational needs.