Serve AI models faster than they can think
Spin up a GPU in seconds. Keep first-token time low and tokens per second high. Pay only for the time you use.


Why teams choose Compute for inference

Managed inference with vLLM
Start serving in minutes using our vLLM template.

All-inclusive pricing
No egress charges. Per-second billing with prepaid credits and optional auto top-up.

Flexible networking
HTTPS, TCP, or UDP. Expose the ports your service needs.

In-region runs
France and UAE clusters keep traffic close to your users.

How it works

Pick a 4090 or 5090 tier.

Launch from a clean PyTorch or vLLM image.

Enable networking (HTTPS/TCP/UDP) and point your app to the endpoint (see the client sketch after these steps).

Save it as a custom template for next time.
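
Once the endpoint is exposed, any OpenAI-compatible client can talk to it. Below is a minimal sketch, assuming the template runs vLLM's standard OpenAI-compatible server behind the HTTPS port you opened; the base URL, API key, and model name are placeholders to replace with your own.

# Minimal client sketch for a vLLM OpenAI-compatible endpoint.
# The base_url, api_key, and model name are placeholders, not real values.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-endpoint.example/v1",  # the HTTPS endpoint you exposed
    api_key="placeholder",                        # only needed if you configured auth
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",     # whichever model your instance serves
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)

The same endpoint can be called from any language or framework that speaks the OpenAI API.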
Most models sit idle for long stretches. Pay for the minutes you use, not the hours you don’t.

Performance snapshot

From our benchmark: a dual RTX 5090 configuration reached 7,604 tokens/second with ~45 ms time-to-first-token on Llama-3.1-8B.
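
To see what time-to-first-token and throughput look like for your own prompts, the sketch below measures both over a streaming request to any OpenAI-compatible endpoint. It is an illustration, not the harness behind the numbers above; the URL, model, and prompt are placeholders.

# Rough measurement of time-to-first-token and tokens/second over a streaming request.
# Illustrative only: the endpoint, model, and prompt are placeholders.
import time
from openai import OpenAI

client = OpenAI(base_url="https://your-endpoint.example/v1", api_key="placeholder")

start = time.perf_counter()
first_token_at = None
chunks = 0

stream = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Write a 200-word product description."}],
    stream=True,
)
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        chunks += 1  # streamed chunks, a rough proxy for generated tokens

if first_token_at is not None:
    elapsed = time.perf_counter() - start
    print(f"time to first token: {(first_token_at - start) * 1000:.0f} ms")
    print(f"approx tokens/second: {chunks / elapsed:.1f}")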

Pricing at a glance
On-demand GPUs, billed per second via prepaid credits (worked example below)
All-inclusive: compute, storage, and data transfer included
Welcome bonus: up to €250 on first purchase
Available tiers: RTX 5090 and RTX 4090.
GPUs are on-demand today. Spot capacity is coming soon.
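
As a back-of-the-envelope illustration of per-second billing for a bursty workload (the hourly rate below is a made-up placeholder, not our price list):

# Hypothetical cost comparison under per-second billing; the rate is a placeholder.
HOURLY_RATE_EUR = 1.00                       # made-up €/hour for one GPU
PER_SECOND_RATE = HOURLY_RATE_EUR / 3600

busy_seconds_per_day = 3 * 3600              # instance runs ~3 busy hours, stopped when idle
daily_cost_bursty = busy_seconds_per_day * PER_SECOND_RATE
daily_cost_always_on = 24 * HOURLY_RATE_EUR  # same box left running around the clock

print(f"stopped when idle: €{daily_cost_bursty:.2f}/day")
print(f"always on:         €{daily_cost_always_on:.2f}/day")

The point of the comparison is the idle time you stop paying for, not the specific rate.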

What people run on Compute

Conversational AI for support and tutoring

LLM endpoints tuned for apps and APIs

Voice models for real-time transcription or captions

Do you have any questions?
Do you support vLLM?
Yes. Use the vLLM template to serve models quickly.
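For a quick smoke test inside the instance before you expose an endpoint, vLLM's offline Python API also works. This is a generic vLLM example, not something specific to our template, and the model name is a placeholder.

# Generic vLLM offline-inference smoke test (model name is a placeholder).
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=64)

outputs = llm.generate(["Explain time-to-first-token in one sentence."], params)
print(outputs[0].outputs[0].text)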
Can I keep the service behind HTTPS?
Yes. HTTPS is available alongside TCP and UDP.
Can I pause my instance?
Yes. Stop/Start is available without extra fees for a limited time. See details on the blog.
Which regions are live?
France and UAE.
Do you store my inputs or outputs?
No. Logs and data stay in your instance unless you choose to persist them.
Where does my data live?
Runs stay in the region you choose.