The old device plugin model sees a GPU as a number. It cannot tell you whether that GPU is an A100 with 40 GB or an H100 with 80 GB. It cannot split the GPU between two jobs. It cannot mark one card for maintenance without draining the whole node. If you're paying for GPU time, that's real money going to waste.
Kubernetes 1.36 (released April 22, 2026) is the release where the full Dynamic Resource Allocation stack stabilizes. DRA core graduated to GA in 1.34. What 1.36 adds is the satellite features that make GPU scheduling actually usable in production: AdminAccess, Prioritized Alternatives, and PodResources Extension all reach GA. Device taints and partitionable devices land as Beta with default-on gates.
Here's what that means for a platform team running GPU workloads.
What device plugins get wrong
Device plugins register resources as simple counts: `nvidia.com/gpu: 4`. The scheduler sees four GPUs. It knows nothing else.
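For contrast, this is the entire request surface the old model gives you. A minimal sketch; the pod and image names are placeholders:

```yaml
# Device plugin model: an opaque count. Nothing about model, memory, or health.
apiVersion: v1
kind: Pod
metadata:
  name: legacy-gpu-pod
spec:
  containers:
  - name: worker
    image: example.com/inference:latest   # placeholder image
    resources:
      limits:
        nvidia.com/gpu: 1
```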
That model breaks in five concrete ways:
- No sharing: one pod, one GPU, always. An inference job needing 2 GB gets a 40 GB card to itself.
- No attribute matching: you cannot request "an Ampere card with at least 16 GB." You get whatever the count says.
- No fallback: if H100s are full, the job queues. It does not fall back to A100s automatically.
- No per-device health: a degraded card is either schedulable or it isn't. There's no middle state.
- No per-device maintenance: marking one GPU for service means draining the entire node.
DRA replaces all of this with a proper API layer.
The four API objects
DeviceClass
The cluster administrator creates DeviceClasses. They work like StorageClasses: define a category of hardware with optional CEL-based filters. Workload authors reference them by name.
```yaml
apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: gpu-high-memory
spec:
  selectors:
  - cel:
      expression: >
        device.driver == 'gpu.nvidia.com' &&
        device.capacity["gpu.nvidia.com"].memory >= quantity("16Gi")
```

ResourceSlice
The DRA driver publishes a ResourceSlice per node. It describes every device with full attributes and updates in real time as health status changes.
```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceSlice
metadata:
  name: node1-nvidia-gpu-0
spec:
  nodeName: node1
  driver: gpu.nvidia.com
  pool:
    name: node1-gpus
    generation: 1
    resourceSliceCount: 1
  devices:
  - name: gpu-0
    attributes:
      model:
        string: "A100-SXM4-40GB"
      architecture:
        string: "Ampere"
      cudaComputeCapability:
        string: "8.0"
      driverVersion:
        string: "525.85.12"
    capacity:
      memory:
        value: "40Gi"
      multiprocessors:
        value: "108"
```

This is what DRA knows that device plugins never did. The scheduler can match against any of these fields.
ResourceClaim
Workload authors write ResourceClaims to request specific hardware. The request uses CEL expressions against the ResourceSlice attributes.
```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: my-gpu-claim
spec:
  devices:
    requests:
    - name: primary-gpu
      exactly:
        deviceClassName: gpu-high-memory
        selectors:
        - cel:
            expression: >
              device.attributes["gpu.nvidia.com"].architecture == "Ampere" &&
              device.capacity["gpu.nvidia.com"].memory >= quantity("16Gi")
```

ResourceClaimTemplate
For Jobs and Deployments where each pod needs its own dedicated resource, use a ResourceClaimTemplate. Kubernetes creates one ResourceClaim per pod automatically; the lifecycle follows the pod.
```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: per-pod-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        exactly:
          deviceClassName: gpu-high-memory
```

Wiring a ResourceClaim into a pod spec:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ml-training-job
spec:
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: per-pod-gpu
  containers:
  - name: trainer
    image: nvcr.io/nvidia/pytorch:24.03-py3
    command: ["python", "train.py"]
    resources:
      claims:
      - name: gpu
```

Device plugins vs DRA
| Dimension | Device plugins | DRA |
|---|---|---|
| Resource model | Simple count (N devices) | Full objects with attributes |
| Sharing | No | Yes (partitionable devices) |
| Attribute-based selection | No | Yes (CEL expressions) |
| Fallback / priority | No | Yes (Prioritized Alternatives) |
| Per-device maintenance | No (requires node drain) | Yes (device taints) |
| Health status | Limited | Real-time via ResourceSlice |
| Observability | Limited | PodResources Extension gRPC API |
| API stability | Stable | GA since 1.34, complete in 1.36 |
| Driver complexity | Simple daemonset | DRA-compatible driver (more complex) |
What's GA in 1.36
| Feature | Status | What it gives you |
|---|---|---|
| DRA core (ResourceClaim, DeviceClass, ResourceSlice) | GA | Stable for production use |
| AdminAccess | GA | Privileged device access for admin tooling and debug |
| Prioritized Alternatives | GA | Fallback chains: H100 then A100 then T4 |
| PodResources Extension | GA | gRPC API exposing exactly which hardware the pod got |
| Partitionable Devices | Beta, default on | GPU sharing via MIG |
| Device Taints | Beta, default on | Per-device maintenance marking |
| Downward API | Alpha | Expose allocated device info to pod env vars |
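Of these, Prioritized Alternatives changes claim authoring the most. Here's a sketch of a fallback chain; the `firstAvailable` field follows the upstream design, but the device class names (`gpu-h100`, `gpu-a100`, `gpu-t4`) are placeholders you would define yourself:

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: training-gpu-fallback
spec:
  devices:
    requests:
    - name: accelerator
      # Subrequests are tried in order; the first one the scheduler can satisfy wins.
      firstAvailable:
      - name: h100
        deviceClassName: gpu-h100
      - name: a100
        deviceClassName: gpu-a100
      - name: t4
        deviceClassName: gpu-t4
```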
The two relevant feature gates in 1.36 are DRADeviceTaints (Beta, default: true) and DRAPartitionableDevices (Beta, default: true). DynamicResourceAllocation itself is stable and locked since 1.35; it is not a configurable gate in 1.36 clusters.
There is also a second gate to know about: DRADeviceTaintRules (Beta, default: false) covers a separate capability, admin-defined taint rules applied across devices. It requires explicit activation and is distinct from DRADeviceTaints.
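Activating it means flipping the gate on the kubelet and the control plane. A minimal kubelet-side sketch; pass the same gate to kube-apiserver, kube-scheduler, and kube-controller-manager with `--feature-gates=DRADeviceTaintRules=true`:

```yaml
# KubeletConfiguration fragment; all other fields omitted.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  DRADeviceTaintRules: true
```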
GPU partitioning with MIG
NVIDIA MIG (Multi-Instance GPU) is the reference implementation for DRAPartitionableDevices. An A100 with 40 GB running the all-balanced profile can be split into partitions like:
- 2x `1g.5gb` for light inference
- 1x `2g.10gb` for mid-size workloads
- 1x `3g.20gb` for heavier training
On an 8-GPU node, that's 32 independently schedulable slices with hardware isolation: dedicated compute units, memory controllers, and cache per slice.
Each MIG partition is published as a separate device in ResourceSlice with its own attributes. The scheduler matches ResourceClaims against partitions, not physical GPUs. Allocation happens at scheduling time, not at runtime. That eliminates the race conditions that device plugins deal with via mutex locks and retry loops.
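A claim against a partition looks just like a claim against a whole card; only the selector changes. This is a sketch: the `mig.nvidia.com` device class and the `profile` attribute are assumptions about what the NVIDIA DRA driver publishes, so check the ResourceSlices in your own cluster before copying it:

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: light-inference-slice
spec:
  devices:
    requests:
    - name: mig-slice
      exactly:
        deviceClassName: mig.nvidia.com   # assumed class name published by the DRA driver
        selectors:
        - cel:
            # assumed attribute name; match it against what your ResourceSlices show
            expression: device.attributes["gpu.nvidia.com"].profile == "1g.5gb"
```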
One practical catch: if you run the all-balanced profile (heterogeneous partitions), GPU Operator must be configured with `mig.strategy: "mixed"`. Without it, GPU Feature Discovery marks the devices as invalid and they don't appear to the scheduler.
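In GPU Operator terms that's a single setting, shown here as a ClusterPolicy fragment with everything else omitted (the Helm equivalent is `--set mig.strategy=mixed`):

```yaml
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: cluster-policy
spec:
  mig:
    strategy: mixed   # "single" only works when every partition on a node uses the same profile
```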
On AKS, MIG via DRA is confirmed working as of March 2026. On EKS, DRA was enabled with Kubernetes 1.33 (May 2025). Both managed services use NVIDIA's k8s-dra-driver component for MIG support, which ships with GPU Operator 25.x integration. This is separate from the device plugin shipped with earlier GPU Operator versions; do not assume DRA-level MIG support from GPU Operator 24.x.
Device taints
Device taints work like node taints but at the device level. A DRA driver can put a taint on a single GPU without touching the node. Pods without a matching toleration won't schedule to that device.
Practical use: rolling hardware maintenance on a live GPU cluster. Mark `gpu-0` with `maintenance=true:NoSchedule`. Running workloads drain gradually as they complete. Service the card. Remove the taint. The node stays up throughout.
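Here's a sketch of both halves: an admin-defined DeviceTaintRule that taints the single device, and a claim that tolerates it so diagnostic tooling can still land there. Treat the field names and the alpha API version as assumptions to verify against the version your cluster serves; the pool name is a placeholder:

```yaml
# Taint one specific device; the node and its other GPUs are untouched.
apiVersion: resource.k8s.io/v1alpha3
kind: DeviceTaintRule
metadata:
  name: gpu-0-maintenance
spec:
  deviceSelector:
    driver: gpu.nvidia.com
    pool: node1-gpus        # placeholder pool name
    device: gpu-0
  taint:
    key: maintenance
    value: "true"
    effect: NoSchedule
---
# A claim that should still schedule onto the tainted device carries a toleration.
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: maintenance-diagnostics
spec:
  devices:
    requests:
    - name: gpu
      exactly:
        deviceClassName: gpu-high-memory
        tolerations:
        - key: maintenance
          operator: Equal
          value: "true"
          effect: NoSchedule
```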
For distributed training jobs, a node drain is expensive. Aborting a multi-day training run costs real money and time. Per-device taints make planned maintenance possible without that cost.
What breaks before you ship this to prod
ResourceClaims are immutable. Once created, you cannot modify a claim. If your workload's hardware requirements change, you create a new claim. This is intentional (allocation integrity) but means your job manifests need to be rebuilt, not patched.
Devices must be co-located with the pod. There is no cross-node device sharing. If your job needs two GPUs on different nodes, DRA cannot help with that topology. Plan your node sizing accordingly.
DRA drivers are more complex than device plugins. Writing a DRA-compatible driver means implementing the full ResourceSlice publication loop, health reporting, and a CEL attribute schema. Expect third-party hardware support to lag while the ecosystem matures. AMD's ROCm DRA driver is under active development but not production-ready as of Q2 2026.
Managed service support is not uniform. AKS and EKS have DRA with MIG. GKE's GA status with DRA is not confirmed at the same level. Check your specific provider version and their DRA documentation before building on this assumption.
NVIDIA's own documentation still flags caveats. As of late 2025, NVIDIA's published documentation described DRA as "not yet supported for production use" in some sections. That status is evolving rapidly with GPU Operator 25.x, but verify against current NVIDIA documentation before committing to DRA in a production GPU cluster.
The mechanics are solid. The ecosystem around the mechanics is still catching up.