People ask “How much does a GPU VM cost?” and hope for a single number. In practice, the cost comes down to two levers you control: the machine you pick, and how long you leave it running.
This article explains how pricing works on Compute in plain terms, what’s included, and the few habits that keep spend predictable. If you’re still deciding whether you even need a VM, start here.
How Compute pricing works
Compute uses prepaid credits. You add credits first, then your balance goes down while an instance is running. The console shows an hourly rate because it’s easy to read, but billing is based on actual runtime, down to the second. Learn more about Billing in Compute.
This also means “one more test” can be cheap if it runs for two minutes, and expensive if you forget the instance overnight. The system isn’t trying to trick you. It’s literal.
What’s included (and what usually isn’t)
When people talk about cloud pricing, they often mean “the hourly number” and forget the rest of the bill. On Compute, the intent is that the price you see covers the basics you need to run: GPU compute, CPU, memory, storage, and network traffic.
Many platforms advertise low GPU rates but charge separately for CPU, RAM, and storage, which can push the real cost well above the headline rate.
If you want the exact, canonical wording, treat the docs and pricing page as the source of truth; that’s what gets updated first when something changes. Hidden fees such as data transfer, storage, and setup charges add up quickly and belong in any total-cost comparison.
The two real cost drivers
- Hardware choice
GPU VM prices mostly track the GPU model and how many GPUs you attach. Pricing varies significantly by model (NVIDIA A100 40GB, A100 80GB, H100, B200, and AMD options) and by provider: current-generation GPUs like the NVIDIA H100 run roughly $2.10 to $15.00 per hour across the market, while older models like the V100 range from $0.14 to $6.25. More GPUs cost more, and more VRAM and system memory also tend to mean a higher price because they come bundled with larger machine sizes.
If you’re sizing for AI/ML, the practical limiter is often VRAM. Demanding training workloads typically call for GPUs with high VRAM and compute throughput, such as the H100, A100, or B200, while general-purpose GPUs suit a broader range of tasks. Beyond the GPU itself, dedicated vs. shared GPU access, data transfer fees, storage, and networking all shape the final bill, and high-bandwidth networking between GPUs matters for large-scale training and can carry additional costs.
GPU clusters are common for large-scale machine learning workloads, and choosing the right configuration matters for both cost and performance. Consider the workload and the range of operations the GPU’s CUDA cores will actually handle when selecting an instance type. This overview helps you choose without getting lost in specs: GPU virtual machine: what it is and who actually needs one.
- Runtime
Runtime is the part people underestimate. If you want to control cost, this is the lever that matters most.
If you run for 12 minutes and 20 seconds, that’s 740 seconds. You pay for 740 seconds of runtime, which is 740/3600 of the hourly rate shown. That’s it. No mystery math.
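That arithmetic is short enough to sketch in a few lines; the $2.00/hour rate here is a made-up example, not a real Compute price:

```python
# Per-second billing: convert runtime to seconds, then to a fraction of an hour
minutes, extra_seconds = 12, 20
seconds = minutes * 60 + extra_seconds   # 740 seconds
hourly_rate = 2.00                       # hypothetical rate, not a real price
cost = hourly_rate * seconds / 3600
print(seconds, round(cost, 4))           # 740 0.4111
```

At $2.00/hour, those 740 seconds cost about 41 cents, exactly the 740/3,600 fraction described above.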
Data transfer: the hidden cost in GPU VM pricing
Cloud GPU pricing gets tricky when you look beyond the hourly rates. Data transfer costs often surprise AI teams working with large datasets or frequent model updates. These fees can double your cloud bill if you're not careful.
Data transfer pricing differs wildly between providers. Some charge per GB moved in or out of their network. Others include free transfer or unlimited movement within their infrastructure. A cheap GPU rate can cost you thousands extra if you're moving terabytes of training data or results. Get the complete cost picture, including the key questions to ask before choosing a distributed compute provider, before you commit.
Here are five ways to control data transfer costs for your AI work:
- Know your provider's transfer model. Some include free data movement within their network. Others charge for every GB. Find out what's included before you move large datasets.
- Use object storage to centralize data. Store your training data, models, and results in one place. This cuts down on repeated transfers and keeps costs predictable.
- Compare total costs, not just instance prices. Look at GPU rates and transfer fees together. A higher hourly rate might save you money if transfer is included or cheaper.
- Use reserved capacity when possible. Predictable workloads qualify for reserved instances and usage discounts. These often include better rates for transfer and storage too.
- Track and adjust your workflow. Monitor your transfer usage monthly. Batch your uploads, compress data when you can, and reuse datasets already stored in the cloud.
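The “compare total costs, not just instance prices” point can be sketched in a few lines. All rates and volumes below are hypothetical, purely to show how paid egress can flip the comparison:

```python
def total_cost(gpu_rate_per_hr, hours, egress_gb, egress_price_per_gb):
    """Instance runtime cost plus data-egress fees (all inputs hypothetical)."""
    return gpu_rate_per_hr * hours + egress_gb * egress_price_per_gb

# A "cheap" GPU with paid egress vs. a pricier one with free egress:
# moving 2 TB of results at $0.09/GB adds $180 to the cheap option
cheap   = total_cost(1.80, 100, 2000, 0.09)
bundled = total_cost(2.50, 100, 2000, 0.00)
print(round(cheap, 2), round(bundled, 2))  # 360.0 250.0
```

In this made-up scenario the higher hourly rate wins once transfer fees are counted, which is the whole point of comparing the complete bill.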
Data transfer costs matter for any team running AI training or inference at scale. Plan ahead and pick the right approach to avoid billing surprises. Evaluate providers on the complete package: pricing, performance, storage, security of certified vs. community compute providers, and the hidden costs that affect your budget.
How to estimate cost quickly (without spreadsheets)
Use the hourly rate shown in the console and convert it based on your runtime.
An hour is 3,600 seconds.
Cost ≈ hourly rate × (seconds running / 3,600)
If the rate shown is per GPU rather than per instance, multiply by the number of GPUs in your instance as well.
If you prefer mental math, convert your run time to a fraction of an hour. Ten minutes is one-sixth of an hour. Thirty minutes is half. The more precise you need to be, the more you’ll end up using the Billing page anyway.
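The estimate above fits in one small function. The optional per-GPU multiplier covers the case where a rate is quoted per GPU; the numbers in the example are hypothetical:

```python
def estimate_cost(hourly_rate, seconds_running, gpu_count=1):
    """Cost ≈ hourly rate × GPUs × (seconds running / 3,600)."""
    return hourly_rate * gpu_count * seconds_running / 3600

# Hypothetical: a 2-GPU instance at $3.00/hour per GPU, run for 30 minutes
print(estimate_cost(3.00, 30 * 60, gpu_count=2))  # 3.0
```

Thirty minutes is half an hour, so two GPUs at $3.00/hour each come out to $3.00, matching the mental-math shortcut.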
The habits that keep spend under control
Stop anything you’re not using. This sounds obvious, but it’s the biggest cost win. If an instance is stopped, compute billing stops. When you’re finished for the day, terminate it. Idle time is the classic trap: a running VM bills the same whether or not it’s actively processing a workload.
Consider serverless GPU platforms. Serverless GPU platforms like Runpod or Cerebrium offer pay-per-execution models that eliminate idle-time costs. They can be a good option if you want to avoid paying for unused resources or prefer cost-effective GPU cloud platforms for AI and ML.
Start small while you’re debugging. A common mistake is paying for a large GPU setup while you’re still fixing basic environment issues. Do your setup and early tests on a smaller size, then scale up when you know the workflow is real. This applies equally to individual developers and to SMBs exploring AI trends they can leverage with cloud GPU computing.
Don’t pay GPU prices for CPU work. A lot of pipelines spend time on downloads, preprocessing, packaging, or serving a lightweight API. If the GPU is idle, you’re paying for a parked sports car. Split CPU-heavy steps onto vCPU instances if that fits your workflow. Bare metal can be more cost-effective for certain high-performance workloads, but virtual machines offer more flexibility for most developers who want scalable access to GPUs in modern computing through distributed cloud platforms.
Treat “stop” as a pause, not storage. Stopping is great for short breaks and quick restarts, but don’t assume a stopped instance is a long-term archive. If you need to keep an environment, back up what matters and plan for rebuilds. This explainer is meant to prevent unpleasant surprises: Does a VM keep my changes? Persistence on Compute explained.
Keep an eye on your balance if you run long jobs. Because credits are prepaid, a long run can end early if your balance can’t cover more runtime. The best fix is simple: top up before you start, or enable auto top-up so you don’t have to babysit it.
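A minimal sketch of that pre-flight check, using hypothetical numbers, shows why topping up first matters:

```python
def hours_of_runtime(balance, hourly_rate):
    """How long the current prepaid balance can keep an instance running."""
    return balance / hourly_rate

# Hypothetical: $20 of credit at $2.50/hour covers 8 hours,
# so a planned 10-hour job would end early without a top-up
print(hours_of_runtime(20.0, 2.50))  # 8.0
```

Running this check before a long job (or enabling auto top-up) is what keeps a multi-hour training run from dying two hours short.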
A few pricing questions people search for
Do you bill per second or per hour?
The UI shows hourly rates, but billing is per-second.
Do I pay when the VM is stopped?
Compute charges apply while the instance is running. If you stop or terminate it, compute charges stop.
Is there a minimum spend?
You typically need enough credit to start the configuration you choose. If you’re low, top up or pick a smaller setup.
What’s the best way to lower cost?
Right-size the hardware and stop the instance the moment you’re not using it. Everything else is second-order.
Try Compute
If you want the simplest “see what it costs” approach, launch a small instance, run a short test, then check the Billing page. You’ll learn more from one real run than from any pricing theory.
