
GPU geospatial with RAPIDS cuSpatial: spatial joins & point‑in‑polygon

Big point sets against big polygons used to mean “come back tomorrow.” With cuSpatial you can run those joins in minutes—if your stack is set up right. This guide shows a clean path, plus code you can paste.

What we’ll cover

  • Picking a CUDA‑ready template (or a RAPIDS image)
  • Getting cuDF/cuSpatial ready without yak‑shaving
  • Two core patterns: point‑in‑polygon and point‑in‑polygon at scale (with spatial filtering)
  • A small self‑benchmark harness
  • VRAM, I/O, and projection pitfalls


1) Choose your image

On GPU rental platforms, your job runs inside a container image. Two good options:

A) Use a RAPIDS image (fastest)

  • Registry (image): docker.io/rapidsai/rapidsai:<24.xx>-cuda12-runtime-ubuntu22.04-py3.10
  • Env vars:
    • NVIDIA_VISIBLE_DEVICES=all
    • NVIDIA_DRIVER_CAPABILITIES=compute,utility
  • CMD: keep your startup script or an interactive shell.

B) Use a CUDA template and install RAPIDS with micromamba

# one‑time in a fresh instance
curl -Ls https://micro.mamba.pm/api/micromamba/linux-64/latest | tar -xvj bin/micromamba
mkdir -p ~/micromamba && ./bin/micromamba shell init -s bash -p ~/micromamba
source ~/.bashrc
micromamba create -y -n rapids -c rapidsai -c conda-forge -c nvidia rapids=24.* python=3.10 cuda-version=12.*
micromamba activate rapids
python -c "import cudf, cuspatial; print(cudf.__version__, cuspatial.__version__)"

Either path gives you cuDF and cuSpatial with the CUDA user‑space libraries in place. The host NVIDIA driver is usually supplied by your cloud provider.

2) Data formats that play well

  • GeoParquet for points and polygons. It’s columnar and friendly to cuDF.
  • Shapefile works, but prefer converting to GeoParquet once.
  • CRS: reproject to a metric CRS (e.g., UTM) before distance/area math, and keep everything in the same CRS (see the prep snippet below).
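
One‑time prep is easiest off‑GPU with GeoPandas. A minimal sketch, assuming shapefile/GeoJSON inputs; the file names and EPSG:32633 (UTM 33N) are placeholders:

import geopandas as gpd

# Reproject once to a metric CRS, then store as GeoParquet for fast future loads.
zones = gpd.read_file("zones.shp")
zones = zones.to_crs(epsg=32633)          # placeholder UTM zone -- pick your own
zones.to_parquet("zones_utm.parquet")

points = gpd.read_file("sensors.geojson")
points = points.to_crs(epsg=32633)
points.to_parquet("sensors_utm.parquet")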

3) Point‑in‑polygon (PIP): the basic pattern

This shows the in‑memory API with a toy polygon. Swap in real data later.

import cudf
import cupy as cp
import cuspatial

# Points (N x 2)
N = 1_000_000
points = cudf.DataFrame({
    "x": cp.random.random(N),
    "y": cp.random.random(N),
})

# One square polygon (closed ring)
poly_offsets   = cudf.Series([0], dtype="int32")
ring_offsets   = cudf.Series([0], dtype="int32")
poly_x         = cudf.Series([0, 1, 1, 0, 0], dtype="float64")
poly_y         = cudf.Series([0, 0, 1, 1, 0], dtype="float64")

mask = cuspatial.point_in_polygon(
    points["x"], points["y"],
    poly_offsets, ring_offsets,
    poly_x, poly_y,
)
# mask is a bool DataFrame (rows=points, cols=polygons)
inside = points[mask.iloc[:, 0]]
print(len(inside))

Notes

  • For multiple polygons, mask has one column per polygon. Use .any(axis=1) if you only care whether a point fell in any polygon (see the two‑polygon sketch below). Older releases of this column‑based API cap the number of polygons per call at 31, so batch your polygons or use the quadtree path beyond that.
  • Use float64 for polygon coordinates if your domain needs it; points can be float32/64.
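
For illustration, here is the same call with two polygons, following the column layout used above (offsets index the ring and vertex arrays; the coordinates are toy data):

# Two single-ring unit squares: polygon 0 at the origin, polygon 1 shifted right by 2.
poly_offsets = cudf.Series([0, 1], dtype="int32")   # index of each polygon's first ring
ring_offsets = cudf.Series([0, 5], dtype="int32")   # index of each ring's first vertex
poly_x = cudf.Series([0, 1, 1, 0, 0, 2, 3, 3, 2, 2], dtype="float64")
poly_y = cudf.Series([0, 0, 1, 1, 0, 0, 0, 1, 1, 0], dtype="float64")

mask = cuspatial.point_in_polygon(
    points["x"], points["y"],
    poly_offsets, ring_offsets, poly_x, poly_y,
)
in_any = points[mask.any(axis=1)]   # points that fell in either square
print(len(in_any))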

4) PIP at scale: filter then test

Real workloads use thousands of polygons and hundreds of millions of points. Test fewer candidates by filtering with bounding boxes first, then call point_in_polygon just on those.

import cudf, cupy as cp, cuspatial

# Assume points and polygons are already loaded as cuDF objects.
# Example polygon structure (multiple polys, rings):
# poly_offsets, ring_offsets, poly_x, poly_y

# 1) Build polygon bounding boxes (min_x, min_y, max_x, max_y)
boxes = cuspatial.polygon_bounding_boxes(
    poly_offsets, ring_offsets, poly_x, poly_y
)
# boxes: DataFrame [min_x, min_y, max_x, max_y]

# 2) Coarse candidate filter: keep points inside the overall extent of all bboxes
cond = (
    (points.x >= boxes.min_x.min()) & (points.x <= boxes.max_x.max()) &
    (points.y >= boxes.min_y.min()) & (points.y <= boxes.max_y.max())
)
pts_cand = points[cond]

# 3) Exact test only on candidates
mask = cuspatial.point_in_polygon(
    pts_cand.x, pts_cand.y,
    poly_offsets, ring_offsets, poly_x, poly_y
)
any_hit = mask.any(axis=1)
joined = pts_cand[any_hit]

Why this helps
Bounding‑box filtering reduces the number of expensive PIP tests. For extreme scales, look at cuSpatial’s quadtree utilities to prune candidates even harder.
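
For reference, a rough sketch of that quadtree path using the column‑based helpers (quadtree_on_points, join_quadtree_and_bounding_boxes, quadtree_point_in_polygon). Signatures have shifted across cuSpatial releases (newer ones take GeoSeries), so treat this as a starting point and check the docs for your installed version; the extent and tuning constants below are placeholders:

# Placeholder extent and tuning constants -- derive these from your data.
x_min, x_max, y_min, y_max = 0.0, 1.0, 0.0, 1.0
max_depth, max_size = 8, 512
scale = max(x_max - x_min, y_max - y_min) / (1 << max_depth)

# 1) Index the points with a quadtree
point_indices, quadtree = cuspatial.quadtree_on_points(
    points["x"], points["y"], x_min, x_max, y_min, y_max, scale, max_depth, max_size
)

# 2) Pair quadtree nodes with polygon bounding boxes
boxes = cuspatial.polygon_bounding_boxes(poly_offsets, ring_offsets, poly_x, poly_y)
pairs = cuspatial.join_quadtree_and_bounding_boxes(
    quadtree, boxes, x_min, x_max, y_min, y_max, scale, max_depth
)

# 3) Exact PIP only where a quadrant touches a polygon's bbox
hits = cuspatial.quadtree_point_in_polygon(
    pairs, quadtree, point_indices, points["x"], points["y"],
    poly_offsets, ring_offsets, poly_x, poly_y
)
print(hits.head())   # pairs of polygon/point indices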

5) Reading real data (GeoParquet)

import cudf

points = cudf.read_parquet("sensors_utm.parquet")  # cols: x, y, id
poly   = cudf.read_parquet("zones_utm.parquet")     # stored as exploded rings

# Build polygon arrays expected by cuSpatial
poly_offsets = poly["poly_offset"].astype("int32")
ring_offsets = poly["ring_offset"].astype("int32")
poly_x       = poly["x"].astype("float64")
poly_y       = poly["y"].astype("float64")

If your polygons live in shapefiles, convert once to GeoParquet (off‑GPU is fine) to speed up future loads.
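
If your zones file is a standard GeoParquet or shapefile with a geometry column, rather than the pre‑exploded layout assumed above, a one‑time CPU‑side helper along these lines can build the offset arrays. This is an illustrative sketch using geopandas/shapely; the file name is a placeholder:

import geopandas as gpd
import cudf

gdf = gpd.read_parquet("zones_geo.parquet")   # placeholder: geometry column, metric CRS

poly_off, ring_off, xs, ys = [], [], [], []
for geom in gdf.geometry:
    parts = geom.geoms if geom.geom_type == "MultiPolygon" else [geom]
    for part in parts:
        poly_off.append(len(ring_off))          # polygon -> index of its first ring
        for ring in [part.exterior, *part.interiors]:
            ring_off.append(len(xs))            # ring -> index of its first vertex
            rx, ry = ring.coords.xy
            xs.extend(rx)
            ys.extend(ry)

poly_offsets = cudf.Series(poly_off, dtype="int32")
ring_offsets = cudf.Series(ring_off, dtype="int32")
poly_x = cudf.Series(xs, dtype="float64")
poly_y = cudf.Series(ys, dtype="float64")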

6) Dask for bigger‑than‑GPU jobs (optional)

For datasets that don’t fit in one GPU, use Dask to partition work. Pattern:

  • Partition points across workers/GPUs.
  • Broadcast polygons (usually smaller) to workers.
  • Filter candidates per partition, then PIP.

The code mirrors the single‑GPU version but wraps cuDF DataFrames in Dask cuDF.
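
A minimal sketch of that pattern with dask_cudf and dask‑cuda. The cluster setup, file name, and helper function are assumptions; depending on versions you may need to pass an explicit meta to map_partitions:

from dask_cuda import LocalCUDACluster
from dask.distributed import Client
import dask_cudf
import cuspatial

cluster = LocalCUDACluster()     # one worker per visible GPU
client = Client(cluster)

dpoints = dask_cudf.read_parquet("sensors_utm.parquet")   # partitioned point table

def pip_partition(df):
    # poly_offsets / ring_offsets / poly_x / poly_y are the small polygon arrays
    # built earlier; captured in the closure, they ship to every worker.
    mask = cuspatial.point_in_polygon(
        df["x"], df["y"], poly_offsets, ring_offsets, poly_x, poly_y
    )
    return df[mask.any(axis=1)]

joined = dpoints.map_partitions(pip_partition, meta=dpoints._meta)
result = joined.compute()        # or joined.to_parquet(...) if too big for one GPU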

7) Self‑benchmark harness

Keep it boring and comparable.

inputs: N_points, N_polygons, polygon vertices, CRS, precision
hardware: GPU model/VRAM, CUDA, driver
code: cuSpatial version, exact code path (with/without filter)
metrics: seconds for load → filter → PIP, peak VRAM

Compute cost per million points:

cost_per_million = (price_per_hour × wall_seconds / 3600) / (N_points / 1e6)
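
A small timing helper makes that bookkeeping painless. Illustrative only; the function names and example numbers are mine:

import time

def timed(label, fn, *args, **kwargs):
    # Run fn, print its wall-clock time, and return its result.
    t0 = time.perf_counter()
    out = fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - t0:.2f} s")
    return out

def cost_per_million(price_per_hour, wall_seconds, n_points):
    # Cost of the run normalized to one million points.
    return (price_per_hour * wall_seconds / 3600) / (n_points / 1e6)

# Example usage (bbox_filter and pip_join stand in for your own functions):
# points = timed("load",   cudf.read_parquet, "sensors_utm.parquet")
# cand   = timed("filter", bbox_filter, points)
# joined = timed("pip",    pip_join, cand)
# print(cost_per_million(price_per_hour=2.00, wall_seconds=300, n_points=len(points)))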

8) VRAM & performance tips

  • VRAM: keep an eye on nvidia-smi. If you're close to full, chunk the point table (see the sketch after this list).
  • CRS: reproject once to a metric CRS before distance/area ops; store that in GeoParquet.
  • I/O: read Parquet from local NVMe. For cloud buckets, use larger row groups and snappy/zstd.
  • Precision: float32 is faster and smaller. Use float64 only if validation requires it.
  • Logging: time each stage (load, filter, PIP) separately; it helps you see bottlenecks.
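
A minimal chunking sketch for the VRAM tip above, reusing the column layout from earlier; the chunk size is a placeholder to tune against nvidia-smi:

import cudf
import cuspatial

chunk = 50_000_000          # placeholder; size so each batch fits comfortably in VRAM
parts = []
for start in range(0, len(points), chunk):
    sub = points.iloc[start:start + chunk]
    mask = cuspatial.point_in_polygon(
        sub["x"], sub["y"], poly_offsets, ring_offsets, poly_x, poly_y
    )
    parts.append(sub[mask.any(axis=1)])

joined = cudf.concat(parts, ignore_index=True)
print(len(joined))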

9) Troubleshooting

CUDA error / no GPU
Check nvidia-smi inside the container. Make sure your image is CUDA‑ready and you set the NVIDIA env vars.

MemoryError or OOM
Chunk the point table; filter early with bboxes; reduce column set.

Wrong results
Mismatched CRS. Reproject and retest. Confirm ring orientation and closed polygons.

Slow loads
Move data to local NVMe; switch to GeoParquet; increase Parquet row‑group size.

Methods snippet (copy‑paste)

hardware:
  gpu: "<model> (<VRAM> GB)"
  driver: "<NVIDIA driver>"
  cuda: "<CUDA version>"
software:
  image: "rapidsai/rapidsai:<24.xx>-cuda12-runtime-ubuntu22.04-py3.10"
  python: "3.10"
  libs:
    - cudf: "<version>"
    - cuspatial: "<version>"
inputs:
  points: "s3://…/sensors_utm.parquet (N=<…>)"
  polygons: "s3://…/zones_utm.parquet (polys=<…>)"
run:
  script: "pip_join.py"
  notes: "bbox filter → PIP; CRS=EPSG:32633"
outputs:
  wall_seconds: "<…>"
  cost_per_million: "<…>"

Try Compute today

Start a GPU instance with a CUDA-ready template (e.g., Ubuntu 24.04 LTS / CUDA 12.6) or your own RAPIDS image. Enjoy flexible per-second billing with custom templates and the ability to start, stop, and resume your sessions at any time. Unsure about FP64 requirements? Contact support to help you select the ideal hardware profile for your computational needs.
