Agent‑based models thrive on parallelism. FLAME GPU executes agent functions as CUDA kernels, so one workstation‑class GPU can simulate millions of agents in real time—if you structure the model cleanly. This guide shows a practical, GPU‑friendly path from NetLogo/Mesa to FLAME GPU.
What we’ll cover
- Choosing a CUDA‑ready template and getting FLAME GPU built
- Minimal project layout that’s easy to maintain
- A tiny agent function skeleton and messaging pattern
- Validation against your CPU model
- A simple throughput benchmark (agents/s and cost per million agent‑steps)
- Profiling and common bottlenecks
Precision note: most ABMs are fine in FP32. If your model is sensitive, see the FP64 checklist.
1) Pick your image on your GPU service
Your job runs inside a container. Two options that work well:
A) CUDA devel image + build from source (portable)
- Template: Ubuntu 24.04 LTS (CUDA 12.6)
- Use a `devel` base image (the `runtime` variant lacks nvcc, so you can't compile), and add build tools and Python if you’ll use the Python API.
# Dockerfile (sketch)
FROM nvidia/cuda:12.6.0-devel-ubuntu24.04
ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential cmake git python3 python3-pip \
&& rm -rf /var/lib/apt/lists/*
ENV NVIDIA_VISIBLE_DEVICES=all \
NVIDIA_DRIVER_CAPABILITIES=compute,utility
B) Use a maintained FLAME GPU image (fastest)
- Point your template to your lab’s private image that already includes FLAME GPU and dependencies.
Either way, confirm GPU visibility inside the container:
nvidia-smi
2) Project layout
/work
├── CMakeLists.txt
├── src/
│ ├── agents.cu # agent functions
│ ├── model.cu # model description & layers
│ └── main.cu # entry point
├── python/ # optional Python driver
└── data/ # inputs, seeds, checkpoints
Initialize CMake and build out‑of‑tree:
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j
3) Agent function + messaging (skeleton)
FLAME GPU runs agent functions over agent arrays on the GPU. Use messages for local interactions.
// agents.cu
#include <flamegpu/flamegpu.h>
// Layer 1: each agent publishes its position as a spatial message
FLAMEGPU_AGENT_FUNCTION(output_location, flamegpu::MessageNone, flamegpu::MessageSpatial2D) {
    FLAMEGPU->message_out.setLocation(
        FLAMEGPU->getVariable<float>("x"),
        FLAMEGPU->getVariable<float>("y"));
    return flamegpu::ALIVE;
}
// Layer 2: each agent reads messages in its neighborhood and updates velocity
FLAMEGPU_AGENT_FUNCTION(step, flamegpu::MessageSpatial2D, flamegpu::MessageNone) {
    const float x = FLAMEGPU->getVariable<float>("x");
    const float y = FLAMEGPU->getVariable<float>("y");
    float vx = FLAMEGPU->getVariable<float>("vx");
    float vy = FLAMEGPU->getVariable<float>("vy");
    // simple cohesion: steer toward the centroid of neighbors in radius
    float cx = 0.f, cy = 0.f; int n = 0;
    for (const auto &m : FLAMEGPU->message_in(x, y)) {  // spatial iteration
        cx += m.getVariable<float>("x");
        cy += m.getVariable<float>("y");
        ++n;
    }
    if (n) { cx /= n; cy /= n; vx += 0.05f * (cx - x); vy += 0.05f * (cy - y); }
    // write updated velocity and position
    FLAMEGPU->setVariable<float>("vx", vx);
    FLAMEGPU->setVariable<float>("vy", vy);
    FLAMEGPU->setVariable<float>("x", x + vx);
    FLAMEGPU->setVariable<float>("y", y + vy);
    return flamegpu::ALIVE;
}
Model description (pattern)
// model.cu
#include <flamegpu/flamegpu.h>
using namespace flamegpu;

void describeModel(ModelDescription &model) {
    AgentDescription agent = model.newAgent("A");
    agent.newVariable<float>("x");  agent.newVariable<float>("y");
    agent.newVariable<float>("vx"); agent.newVariable<float>("vy");

    MessageSpatial2D::Description msg = model.newMessage<MessageSpatial2D>("location");
    msg.setMin(0, 0);  msg.setMax(100, 100);  // domain bounds
    msg.setRadius(1.0f);                      // interaction radius
    // x and y are built-in variables of spatial messages; no newVariable needed

    AgentFunctionDescription out_fn = agent.newFunction("output_location", output_location);
    out_fn.setMessageOutput("location");
    AgentFunctionDescription step_fn = agent.newFunction("step", step);
    step_fn.setMessageInput("location");

    // Messages written in the first layer are read in the second within a step
    model.newLayer().addAgentFunction(out_fn);
    model.newLayer().addAgentFunction(step_fn);
}
This mirrors common NetLogo patterns (turtles + vision radius) but in a GPU‑friendly, structure‑of‑arrays layout.
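The layout in section 2 also lists src/main.cu, which ties these pieces together. A minimal driver might look like the sketch below. Treat it as an outline under assumptions: the agent/message/layer description elided at the comment must match model.cu, and the population size, domain bounds, and seed are illustrative.

```cpp
// main.cu (sketch) — host entry point: describe, populate, simulate.
#include <flamegpu/flamegpu.h>
#include <random>

int main(int argc, const char **argv) {
    flamegpu::ModelDescription model("abm");
    // ... agent, message, and layer description (see model.cu) ...

    // Initial population: scatter agents uniformly over the domain.
    flamegpu::AgentVector pop(model.Agent("A"), 1000000);
    std::mt19937 rng(42);
    std::uniform_real_distribution<float> u(0.f, 100.f);
    for (flamegpu::AgentVector::Agent a : pop) {
        a.setVariable<float>("x", u(rng));
        a.setVariable<float>("y", u(rng));
        a.setVariable<float>("vx", 0.f);
        a.setVariable<float>("vy", 0.f);
    }

    flamegpu::CUDASimulation sim(model, argc, argv);  // parses FLAME GPU's standard CLI flags
    sim.setPopulationData(pop);
    sim.simulate();
    return 0;
}
```

FLAME GPU's built-in CLI covers step count and RNG seed; model-specific flags like the --agents and --checkpoint-interval options used in the next section would be your own additions on top.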
4) Run, seed, and checkpoint
./build/abm --agents 5000000 --steps 1000 --seed 42 \
--output data/checkpoints --checkpoint-interval 100
- Keep deterministic seeds for validation.
- Write fewer, larger checkpoints to reduce I/O overhead.
5) Validate vs your CPU model
- Pick a small world and identical rules.
- Run CPU reference (NetLogo/Mesa) and GPU for a short horizon.
- Compare aggregate metrics: counts, mean positions, cluster sizes, or domain‑specific stats.
- Expect trajectories to differ in detail (CPU and GPU RNG streams are not identical, even with the same seed); differences in aggregates should sit within run‑to‑run stochastic variance.
If results diverge, check message radius, boundary conditions, and update order.
6) Throughput + cost benchmark
Use numbers that matter for planning.
metrics:
agents: <N>
steps: <T>
wall_seconds: <…>
agent_steps_per_second: N*T / wall_seconds
cost_per_million_agent_steps: (price_per_hour * wall_seconds / 3600) / (N * T) * 1e6
Log GPU model/VRAM, driver, CUDA, FLAME GPU version, and the exact command line.
7) Profiling & bottlenecks
- nvidia-smi: watch utilization and VRAM.
- nsys / ncu: identify kernels with low occupancy or uncoalesced access.
- Messaging: spatial messages scale better than brute‑force all‑pairs; keep radii realistic.
- Host↔Device copies: avoid per‑step transfers; batch outputs.
- Branching: split agents by state into separate functions/layers when branches are hot.
8) Troubleshooting
GPU idle
Too few agents or heavy host‑side work. Raise N, reduce per‑step I/O, or move setup into device code.
Out of memory
Slim agent variables, chunk outputs, or choose a larger‑VRAM profile.
Non‑deterministic results
Fix seeds and avoid unordered host‑side reductions. Document RNG.
Build errors
Match CUDA to your base image and CMake toolchain. Clean and rebuild.
Methods snippet (copy‑paste)
hardware:
gpu: "<model> (<VRAM> GB)"
driver: "<NVIDIA driver>"
cuda: "<CUDA version>"
software:
flamegpu: "<version>"
image: "Ubuntu 24.04 LTS (CUDA 12.6)"
model:
domain: "[0,100]x[0,100]"
agents: <N>
rules: "cohesion-only demo"
run:
cmd: "./build/abm --agents <N> --steps <T> --seed 42"
checkpoints: "every 100 steps"
outputs:
wall_seconds: "<…>"
agent_steps_per_second: "<…>"
cost_per_million_agent_steps: "<…>"
Try Compute today
Start a GPU instance with a CUDA-ready template (e.g., Ubuntu 24.04 LTS / CUDA 12.6) or your own FLAME GPU image. Enjoy flexible per-second billing with custom templates and the ability to start, stop, and resume your sessions at any time. Unsure about FP64 requirements? Contact support to help you select the ideal hardware profile for your computational needs.