
Frame Graph: Production Engines

10 min read
Frame Graph series · Part IV of IV: Production Engines

Part III showed how the compiler can go further: async compute and split barriers. But our MVP still lives in a vacuum: one thread, one queue, resources that vanish between frames.

Production renderers operate at a different scale entirely. They run 700+ passes, record commands across a thread pool, pool heaps that persist for the lifetime of the application, and integrate graph-managed code alongside rendering paths that exist outside the graph.

This article cracks open UE5’s RDG and Frostbite’s FrameGraph to see how they bridge that gap, then maps out the concrete steps from MVP to production.


Declare: Pass & Resource Registration

Every engine starts the same way: passes declare what they read and write, resources are requested by description, and the graph accumulates edges. The differences are in how that declaration happens.
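To make the shared pattern concrete, here is a minimal sketch of a declaration API in the spirit of our MVP from Part II. All names (GraphBuilder, addPass, buildEdges) are hypothetical, not UE5 or Frostbite API:

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical MVP-style declaration: passes record read/write
// handles during setup; the graph turns them into DAG edges.
struct ResourceHandle { uint32_t id; };

struct PassNode {
    std::string name;
    std::vector<uint32_t> reads;   // resource ids this pass consumes
    std::vector<uint32_t> writes;  // resource ids this pass produces
};

class GraphBuilder {
public:
    ResourceHandle createTexture(const std::string& /*desc*/) {
        return ResourceHandle{nextResource++};
    }
    uint32_t addPass(const std::string& name,
                     std::vector<ResourceHandle> reads,
                     std::vector<ResourceHandle> writes) {
        PassNode node{name, {}, {}};
        for (auto r : reads)  node.reads.push_back(r.id);
        for (auto w : writes) node.writes.push_back(w.id);
        passes.push_back(std::move(node));
        return static_cast<uint32_t>(passes.size() - 1);
    }
    // Edge rule: if pass B reads a resource that pass A wrote, A -> B.
    std::vector<std::pair<uint32_t, uint32_t>> buildEdges() const {
        std::vector<std::pair<uint32_t, uint32_t>> edges;
        for (uint32_t a = 0; a < passes.size(); ++a)
            for (uint32_t b = 0; b < passes.size(); ++b)
                for (uint32_t w : passes[a].writes)
                    for (uint32_t r : passes[b].reads)
                        if (w == r && a != b) edges.emplace_back(a, b);
        return edges;
    }
    std::vector<PassNode> passes;
private:
    uint32_t nextResource = 0;
};
```

The edge rule is the whole trick: writer-to-reader pairs on the same resource become the DAG that every later compile step operates on, regardless of whether the declaration arrived via explicit calls, macros, or a setup lambda.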

UE5 RDG

Each AddPass takes a parameter struct + execute lambda. The struct is the setup phase: macros generate metadata, RDG extracts dependency edges:

```cpp
BEGIN_SHADER_PARAMETER_STRUCT(...)
  SHADER_PARAMETER_RDG_TEXTURE(Input)   // read edge
  RENDER_TARGET_BINDING_SLOT(Output)    // write edge
END_SHADER_PARAMETER_STRUCT()
```

The macro generates metadata on the struct, and RDG extracts the dependency edges into the DAG. No separate setup lambda is needed.

Pass flags control queue and behavior: ERDGPassFlags::Raster, ::Compute, ::AsyncCompute, ::NeverCull, ::Copy. Resources are either transient (CreateTexture: graph-owned, eligible for aliasing) or imported (RegisterExternalTexture: externally owned, barriers tracked but no aliasing).

Pass flags:

- ERDGPassFlags::Raster: graphics queue, render targets
- ERDGPassFlags::Compute: graphics queue, compute dispatch
- ERDGPassFlags::AsyncCompute: async compute queue
- ERDGPassFlags::NeverCull: exempt from dead-pass culling
- ERDGPassFlags::Copy: copy queue operations
- ERDGPassFlags::SkipRenderPass: raster pass that manages its own render pass

Resource types:

- FRDGTexture / FRDGTextureRef: render targets, SRVs, UAVs
- FRDGBuffer / FRDGBufferRef: structured, vertex/index, indirect args
- FRDGUniformBuffer: uniform/constant buffer references

Created via CreateTexture() (transient) or RegisterExternalTexture() (imported).

Frostbite

Frostbite’s GDC 2017 talk described a similar split: a setup lambda declares reads and writes, and an execute lambda records GPU commands. The exact current implementation isn’t public.

What’s different from our MVP

| Declaration aspect | Our MVP | Production engines |
|---|---|---|
| Edge declaration | Explicit read() / write() / readWrite() calls in the setup lambda | UE5: macro-generated metadata. Frostbite: lambda-based, similar to our MVP |
| Resource creation | Transient (CreateResource) + imported (ImportResource); imported tracked for barriers but not aliased | Same distinction, plus cross-frame heap pooling, placed sub-allocation, and size bucketing |
| Queue assignment | Single queue | Per-pass flags: graphics, compute, async compute, copy |
| Rebuild | Full rebuild every frame | UE5: hybrid (cached topology, invalidated on change or explicit engine events). Others: dynamic rebuild |

Compile: The Graph Compiler at Scale

This is where production engines diverge most from our MVP. The compile phase runs entirely on the CPU, between declaration and execution. Our MVP does five things here: topo-sort, cull, scan lifetimes, alias, and compute barriers. Production engines do the same five, plus async compute scheduling, split barrier placement, and barrier batching.

```
MVP compile                 Production compile
├ topo-sort                 ├ topo-sort
├ cull dead passes          ├ cull dead passes
├ scan lifetimes            ├ scan lifetimes
├ alias memory              ├ alias memory + cross-frame pooling
└ compute barriers          ├ schedule async compute
                            ├ compute barriers + split begin/end
                            └ batch barriers
```

Every step below is a compile-time operation. No GPU work, no command recording. The compiler sees the full DAG and makes optimal decisions the pass author never has to think about.

Pass culling

Same algorithm as our MVP (backward reachability from the output), but at larger scale. UE5 uses refcount-based culling and skips allocation entirely for culled passes (saves transient allocator work). Culled passes never execute, never allocate resources, never emit barriers. They vanish as if they were never declared.
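The culling walk itself is small. A hedged sketch using a plain adjacency list rather than any engine's actual pass representation:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical dead-pass culling: flood backward from the passes that
// produce the frame's output; anything unreached is culled.
// deps[i] lists the producer passes that pass i depends on.
std::vector<bool> cullPasses(const std::vector<std::vector<uint32_t>>& deps,
                             const std::vector<uint32_t>& outputPasses) {
    std::vector<bool> alive(deps.size(), false);
    std::vector<uint32_t> stack(outputPasses.begin(), outputPasses.end());
    while (!stack.empty()) {
        uint32_t p = stack.back(); stack.pop_back();
        if (alive[p]) continue;
        alive[p] = true;
        for (uint32_t d : deps[p]) stack.push_back(d);  // walk toward producers
    }
    return alive;  // alive[i] == false -> pass i is culled
}
```

Refcount-based culling, as UE5 uses, reaches the same fixed point incrementally; the backward flood is just the simplest way to state it.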

Memory aliasing

Both engines use the same core algorithm from Part II: lifetime scanning + free-list allocation. The production refinements:

| Refinement | UE5 RDG | Frostbite (GDC talk) |
|---|---|---|
| Placed resources | Transient allocator (r.RDG.TransientAllocator) binds into ID3D12Heap offsets | Heap sub-allocation |
| Size bucketing | Power-of-two in the transient allocator | Custom bin sizes |
| Cross-frame pooling | Persistent pool, peak-N-frames sizing | Heaps persist across frames, reallocated only when peak demand grows (the same high-water-mark strategy most engines use) |
| Imported aliasing | Transient only | Described as supported for resources whose lifetimes are fully known within the frame |

Our MVP allocates fresh each frame. Production engines pool across frames: once a heap is allocated, it persists and gets reused. UE5’s transient allocator (controlled via r.RDG.TransientAllocator) tracks peak usage over several frames and only grows the pool when needed. Frostbite described the same pattern at GDC 2017: heaps survive across frames, the allocator remembers the high-water mark, and unused blocks are released only after several frames of lower demand, avoiding the alloc/free churn that would otherwise dominate the CPU cost of transient resources. This amortizes allocation cost to near zero in steady state.
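The high-water-mark policy both engines describe can be sketched in a few lines. The shrink delay and the single-heap model are simplifying assumptions, not either engine's real allocator:

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>

// Hypothetical cross-frame heap pool: persist the heap across frames,
// grow immediately when peak demand grows, shrink only after several
// consecutive frames of lower demand.
class TransientHeapPool {
public:
    explicit TransientHeapPool(size_t shrinkDelay = 8) : delay(shrinkDelay) {}

    // Called once per frame with that frame's peak transient demand.
    // Returns the heap size to keep allocated for the next frame.
    size_t update(size_t frameDemand) {
        history.push_back(frameDemand);
        if (history.size() > delay) history.pop_front();
        size_t recentPeak = *std::max_element(history.begin(), history.end());
        if (recentPeak > heapSize) {
            heapSize = recentPeak;   // grow immediately on a new peak
        } else if (history.size() == delay) {
            heapSize = recentPeak;   // shrink once the window is quiet
        }
        return heapSize;
    }
    size_t size() const { return heapSize; }
private:
    size_t heapSize = 0;
    size_t delay;
    std::deque<size_t> history;
};
```

In steady state update() never reallocates, which is exactly the amortization the article describes: the alloc/free churn collapses to a comparison per frame.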

Async compute scheduling

Async compute lets the GPU overlap independent work on separate hardware queues: compute shaders running alongside rasterization. The compiler must identify which passes can safely run async, insert cross-queue fences, and manage resource ownership transfers.

| Engine | Approach | Discovery |
|---|---|---|
| UE5 | Opt-in via ERDGPassFlags::AsyncCompute per pass | Manual: the compiler trusts the flag, handles fence insertion + cross-queue sync |
| Frostbite | Described as automatic in the GDC talk | Reachability analysis in the compiler |

Hardware reality: NVIDIA uses separate async engines, AMD exposes more independent CUs, and some GPUs just time-slice, so always profile to confirm real overlap. Cross-queue sync is explicit: Vulkan requires queue-family ownership transfers, D3D12 uses ID3D12Fence. Both are expensive, so only go async when the overlap wins exceed the synchronization cost.
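The reachability analysis Frostbite described can be approximated with plain DAG traversal. This sketch assumes an adjacency list of successor passes and ignores queue capabilities and fence cost:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical overlap test: a compute pass can run async alongside a
// graphics pass only if neither can reach the other through the DAG.
static void reach(const std::vector<std::vector<uint32_t>>& succ,
                  uint32_t from, std::vector<bool>& seen) {
    if (seen[from]) return;
    seen[from] = true;
    for (uint32_t s : succ[from]) reach(succ, s, seen);
}

bool canOverlap(const std::vector<std::vector<uint32_t>>& succ,
                uint32_t a, uint32_t b) {
    std::vector<bool> fromA(succ.size(), false), fromB(succ.size(), false);
    reach(succ, a, fromA);
    reach(succ, b, fromB);
    return !fromA[b] && !fromB[a];  // no dependency path in either direction
}
```

A real scheduler would additionally weigh whether the overlap is worth the fences it forces, which is exactly why UE5 leaves the decision to the pass author.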

Barrier batching & split barriers

Our MVP inserts one barrier at a time. Production engines batch multiple transitions into a single API call and split barriers across pass gaps for better GPU pipelining.

UE5 batches multiple resource transitions into a single API call rather than issuing one barrier per resource. Split barriers place the “begin” transition as early as possible and the “end” just before the resource is needed, giving the GPU time to pipeline the transition.
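The batching idea reduces to buffering transitions and flushing them together. A minimal sketch with string stand-ins for the real layout/state enums (not any engine's API):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical barrier batcher: queue transitions during compile, then
// flush them as one submission instead of one API call per resource.
struct Transition {
    uint32_t resource;
    std::string before, after;  // stand-ins for real state enums
};

class BarrierBatcher {
public:
    void queue(uint32_t res, std::string before, std::string after) {
        pending.push_back({res, std::move(before), std::move(after)});
    }
    // The returned batch is what would feed a single
    // vkCmdPipelineBarrier / ID3D12GraphicsCommandList::ResourceBarrier call.
    std::vector<Transition> flush() {
        std::vector<Transition> batch = std::move(pending);
        pending.clear();
        ++flushCount;
        return batch;
    }
    size_t flushes() const { return flushCount; }
private:
    std::vector<Transition> pending;
    size_t flushCount = 0;
};
```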

Diminishing returns on desktop. Modern drivers hide barrier latency internally. Biggest wins on expensive layout transitions (depth to shader-read) and console GPUs with more exposed pipeline control. Add last, and only if profiling shows barrier stalls.


Execute: Recording & Submission

After the compiler finishes, every decision has been made: pass order, memory layout, barrier placement, physical resource bindings. The execute phase just walks the plan and records GPU commands. No allocation happens here. That’s all done during compile, which makes execute safe to parallelize and the compiled plan cacheable across frames. Here’s where production engines scale beyond our MVP.

Parallel command recording

Our MVP records on a single thread. Production engines split the sorted pass list into groups and record each group on a separate thread using secondary command buffers (Vulkan) or command lists (D3D12), then merge at submit.

UE5 creates parallel FRHICommandList instances (one per pass group) and joins them before queue submission. This is where the bulk of CPU frame time goes in a graph-based renderer, so parallelizing it matters.
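The group-and-merge scheme can be sketched with std::thread. Real engines use job systems and native command lists; this only models the split/record/merge structure:

```cpp
#include <algorithm>
#include <string>
#include <thread>
#include <vector>

// Hypothetical parallel recording: split the sorted pass list into
// groups, record each group on its own thread into a private command
// list, then merge the lists in pass order at submit time.
std::vector<std::string> recordParallel(const std::vector<std::string>& passes,
                                        unsigned groups) {
    std::vector<std::vector<std::string>> lists(groups);
    std::vector<std::thread> workers;
    size_t chunk = (passes.size() + groups - 1) / groups;
    for (unsigned g = 0; g < groups; ++g) {
        workers.emplace_back([&, g] {
            size_t begin = g * chunk;
            size_t end = std::min(passes.size(), begin + chunk);
            for (size_t i = begin; i < end; ++i)
                lists[g].push_back("cmd:" + passes[i]);  // "record" the pass
        });
    }
    for (auto& w : workers) w.join();
    std::vector<std::string> merged;  // submission order = compiled pass order
    for (auto& l : lists) merged.insert(merged.end(), l.begin(), l.end());
    return merged;
}
```

Because the compile phase already fixed pass order and barrier placement, each thread records without coordination; the only serial point is the final ordered merge.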

Bindless & the frame graph

Traditional bound descriptors require the executor to know each pass’s binding layout (which slots expect textures, which expect UAVs) and to set up descriptor sets or root signatures before every dispatch. Bindless flips that: one global descriptor heap holds every resource, and shaders index into it directly (ResourceDescriptorHeap[idx]). This changes the execute loop significantly.

| Concern | Bound descriptors | Bindless |
|---|---|---|
| Execute loop | Executor builds a per-pass descriptor set matching the pass's root signature layout | One root signature for all passes; the executor just passes integer indices |
| Descriptor lifetime | Managed by the API: bind, draw, release | Frame graph manages heap slots: allocate on resource creation, free when the resource is culled or the frame ends |
| Aliased resources | Descriptors implicitly invalidated on rebind | Two aliased resources share memory but have different descriptor indices; the graph must invalidate or recycle the old slot to prevent stale access |
| Validation | API validates state at bind time | No API-level safety: the graph's read()/write() declarations become the only correctness check |

Bindless doesn’t change the DAG or the compile phase: sorting, culling, aliasing, and barriers work identically. What it simplifies is the execution side: the executor becomes a thin loop that sets a few root constants and dispatches, because every resource is already visible in the global heap. The cost is that you lose API-level validation. A missed read() declaration won’t trigger a binding error, it’ll silently access stale data.

The RDG–legacy boundary (UE5)

The biggest practical consideration with RDG is the seam between RDG-managed passes and legacy FRHICommandList code. At this boundary:

  • Barriers must be inserted manually (RDG can’t see what the legacy code does)
  • Resources must be extracted from RDG before legacy code can use them
  • Re-importing back into RDG requires RegisterExternalTexture() with correct state tracking

This boundary is shrinking every release as Epic migrates more passes to RDG, but in practice you’ll still hit it when integrating third-party plugins or older rendering features.

Debug & visualization

RDG Insights. Enable via the Unreal editor to visualize the full pass graph, resource lifetimes, and barrier placement. Use r.RDG.Debug CVars for validation: r.RDG.Debug.FlushGPU serializes execution for debugging, r.RDG.Debug.ExtendResourceLifetimes disables aliasing to isolate corruption bugs. The frame is data. Export it, diff it, analyze offline.

Navigating UE5 RDG

1. FRDGBuilder: the graph object. AddPass(), CreateTexture(), Execute() are all here. Start by searching for this class in the RenderCore module.
2. FRDGPass: stores the parameter struct, execute lambda, and pass flags (ERDGPassFlags). The BEGIN_SHADER_PARAMETER_STRUCT macro-generated metadata lives on the parameter struct.
3. FRDGTexture / FRDGBuffer: the virtual resource handles, plus their SRV/UAV views. They track current state for barrier emission and become physical RHI resources during execution.
4. The compile phase (topological sort, pass culling via r.RDG.CullPasses, barrier batching, async compute fence insertion) is internal to the builder. Enable the r.RDG.Debug CVars to inspect it at runtime.

UE5 RDG: current state & roadmap

- Ongoing migration: the legacy FRHICommandList ↔ RDG boundary requires manual barriers. Epic is actively moving more passes into the graph each release.
- Macro-based parameter declaration: BEGIN_SHADER_PARAMETER_STRUCT trades debuggability and dynamic composition for compile-time safety and code generation.
- Transient-only aliasing: imported resources are never aliased, even when their lifetime is fully known within the frame; a deliberate simplification that may evolve.
- Async compute is opt-in: manual ERDGPassFlags::AsyncCompute tagging. The compiler handles fence insertion but doesn't discover async opportunities automatically.

Closing

A render graph is not always the right answer. If your project has a fixed pipeline with 3–4 passes that will never change, the overhead of a graph compiler is wasted complexity. But the moment your renderer needs to grow (new passes, new platforms, new debug tools), the graph pays for itself in the first week.

Across these four articles, we covered the full arc. Part I laid out the core theory: the declare/compile/execute lifecycle, sorting, culling, barriers, and aliasing. Part II turned that theory into working C++. Part III pushed further with async compute and split barriers. And this article mapped those ideas onto what ships in UE5 and Frostbite, showing how production engines implement the same concepts at scale.

You can now open FRDGBuilder in UE5 and read it, not reverse-engineer it. You know what AddPass builds, how the transient allocator aliases memory, why ERDGPassFlags::AsyncCompute exists, and how the RDG boundary with legacy code works in practice.

The point isn’t that every project needs a render graph. The point is that if you understand how they work, you’ll make a better decision about whether yours does.

