
Frame Graph — Production Engines

📖 Rendering Architecture series — Part III of III: Theory · Build It · Production Engines

Part II left us with a working frame graph — automatic barriers, pass culling, and memory aliasing in ~300 lines of C++. That’s a solid MVP, but production engines face problems we didn’t: parallel command recording, subpass merging, async compute scheduling, and managing thousands of passes across legacy codebases. This article examines how UE5 and Frostbite solved those problems, then maps out the path from MVP to production.


① Declare — Pass & Resource Registration

Every engine starts the same way: passes declare what they read and write, resources are requested by description, and the graph accumulates edges. The differences are in how that declaration happens.

🎮 UE5 RDG

Each AddPass takes a parameter struct + execute lambda. The struct is the setup phase — macros generate metadata, RDG extracts dependency edges:

```cpp
BEGIN_SHADER_PARAMETER_STRUCT(FMyPassParameters, )
    SHADER_PARAMETER_RDG_TEXTURE(Texture2D, Input)   // read edge
    RENDER_TARGET_BINDING_SLOTS()                    // write edge(s)
END_SHADER_PARAMETER_STRUCT()
```

The macro generates metadata, and RDG extracts the dependency edges — reads and render-target writes — straight into the DAG. No separate setup lambda needed.
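Putting the pieces together, a minimal pass declaration looks roughly like this — FMyPassParameters, SceneColor, and ViewExtent are illustrative names, with signatures following the publicly documented UE5 RDG API:

```cpp
// Sketch of a full RDG pass declaration (illustrative names, public UE5 API).
FRDGTextureDesc Desc = FRDGTextureDesc::Create2D(
    ViewExtent, PF_FloatRGBA, FClearValueBinding::Black,
    TexCreate_RenderTargetable | TexCreate_ShaderResource);
FRDGTextureRef Output = GraphBuilder.CreateTexture(Desc, TEXT("MyPass.Output")); // transient

FMyPassParameters* Params = GraphBuilder.AllocParameters<FMyPassParameters>();
Params->Input = SceneColor;                                        // read edge
Params->RenderTargets[0] =
    FRenderTargetBinding(Output, ERenderTargetLoadAction::EClear); // write edge

GraphBuilder.AddPass(
    RDG_EVENT_NAME("MyPass"),
    Params,
    ERDGPassFlags::Raster,
    [Params](FRHICommandList& RHICmdList)
    {
        // Execute lambda: runs after compile, once barriers are placed
        // and physical memory is bound. Record draws here.
    });
```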

Pass flags control queue and behavior — ERDGPassFlags::Raster, ::Compute, ::AsyncCompute, ::NeverCull, ::Copy. Resources are either transient (CreateTexture — graph-owned, eligible for aliasing) or imported (RegisterExternalTexture — externally owned, barriers tracked but no aliasing).

Pass Flags
  • ERDGPassFlags::Raster — Graphics queue, render targets
  • ERDGPassFlags::Compute — Graphics queue, compute dispatch
  • ERDGPassFlags::AsyncCompute — Async compute queue
  • ERDGPassFlags::NeverCull — Exempt from dead-pass culling
  • ERDGPassFlags::Copy — Copy queue operations
  • ERDGPassFlags::SkipRenderPass — Raster pass that manages its own render pass

Resource Types
  • FRDGTexture / FRDGTextureRef — Render targets, SRVs, UAVs
  • FRDGBuffer / FRDGBufferRef — Structured, vertex/index, indirect args
  • FRDGUniformBuffer — Uniform/constant buffer references

Created via CreateTexture() (transient) or RegisterExternalTexture() (imported).

❄️ Frostbite

Frostbite’s GDC 2017 talk described a similar lambda-based declaration — setup lambda declares reads/writes, execute lambda records GPU commands. The exact current implementation isn’t public.

🔀 What’s different from our MVP

| Declaration aspect | Our MVP | Production engines |
|---|---|---|
| Edge declaration | Explicit read() / write() calls in setup lambda | UE5: macro-generated metadata. Frostbite: lambda-based, similar to MVP. |
| Resource creation | All transient, created by description | Transient + imported distinction. Imported resources track barriers but aren't aliased in UE5. |
| Queue assignment | Single queue | Per-pass flags: graphics, compute, async compute, copy |
| Rebuild | Full rebuild every frame | UE5: hybrid (cached topology, invalidated on change). Others: dynamic rebuild. |

② Compile — The Graph Compiler at Scale

This is where production engines diverge most from our MVP. The compile phase runs entirely on the CPU, between declaration and execution. Our MVP does five things here: topo-sort, cull, scan lifetimes, alias, and compute barriers. Production engines do the same five — plus pass merging, async compute scheduling, split barrier placement, and barrier batching.

MVP compile
├ topo-sort
├ cull dead passes
├ scan lifetimes
├ alias memory
└ compute barriers

Production compile
├ topo-sort
├ cull dead passes
├ scan lifetimes
├ alias memory + cross-frame pooling
├ merge passes (subpass optimization)
├ schedule async compute
├ compute barriers + split begin/end
└ batch barriers

Every step below is a compile-time operation — no GPU work, no command recording. The compiler sees the full DAG and makes optimal decisions the pass author never has to think about.

✂️ Pass culling

Same algorithm as our MVP — backward reachability from the output — but at larger scale. UE5 uses refcount-based culling and skips allocation entirely for culled passes (saves transient allocator work). Culled passes never execute, never allocate resources, never emit barriers — they vanish as if they were never declared.
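For concreteness, here is a minimal sketch of the refcount flavor of culling — a flood-fill from unreferenced resources backward through their producers. All names are invented, not any engine's actual code:

```cpp
#include <stack>
#include <vector>

struct Pass;
struct Resource {
    Pass* producer = nullptr;
    int   refCount = 0;          // # of passes reading this (+1 if it's the frame output)
};
struct Pass {
    std::vector<Resource*> reads;
    int  refCount  = 0;          // # of still-referenced resources this pass writes
    bool neverCull = false;
    bool culled    = false;
};

// Flood-fill: start from resources nobody reads, and cull backward.
void CullPasses(std::vector<Resource>& resources) {
    std::stack<Resource*> dead;
    for (Resource& r : resources)
        if (r.refCount == 0) dead.push(&r);

    while (!dead.empty()) {
        Resource* r = dead.top(); dead.pop();
        Pass* p = r->producer;
        if (p && --p->refCount == 0 && !p->neverCull) {
            p->culled = true;              // never executes, allocates, or emits barriers
            for (Resource* read : p->reads)
                if (--read->refCount == 0) // releasing reads may orphan more resources
                    dead.push(read);
        }
    }
}
```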

💾 Memory aliasing

Both engines use the same core algorithm from Part II — lifetime scanning + free-list allocation. The production refinements:

| Refinement | UE5 RDG | Frostbite (GDC talk) |
|---|---|---|
| Placed resources | FRDGTransientResourceAllocator binds into ID3D12Heap offsets | Heap sub-allocation |
| Size bucketing | Power-of-two in transient allocator | Custom bin sizes |
| Cross-frame pooling | Persistent pool, peak-N-frames sizing | Pooling described in talk |
| Imported aliasing | Transient only | Described as supported |

Our MVP allocates fresh each frame. Production engines pool across frames — once a heap is allocated, it persists and gets reused. UE5’s FRDGTransientResourceAllocator tracks peak usage over several frames and only grows the pool when needed. This amortizes allocation cost to near zero in steady state.
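A minimal sketch of peak-over-N-frames pool sizing, with invented names (not UE5's actual allocator code):

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>

// Sketch: size a persistent transient heap from peak usage over the last N frames.
class TransientHeapPool {
    static constexpr size_t kHistoryFrames = 8;
    std::deque<size_t> peaks;   // per-frame peak transient bytes
    size_t heapSize = 0;        // current backing heap size

public:
    // Called once per frame with that frame's peak aliased working set.
    void EndFrame(size_t framePeakBytes) {
        peaks.push_back(framePeakBytes);
        if (peaks.size() > kHistoryFrames) peaks.pop_front();

        size_t required = *std::max_element(peaks.begin(), peaks.end());
        if (required > heapSize)
            heapSize = required;  // grow: reallocate the backing ID3D12Heap / VkDeviceMemory
        // Optional: shrink if 'required' stays well below heapSize for many frames.
    }
};
```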

🔗 Pass merging

Pass merging is a compile-time optimization: the compiler identifies adjacent passes that share render targets and fuses them into a single render pass. On consoles with fixed-function hardware and on PC with D3D12 Render Pass Tier 2, this lets the GPU keep data on-chip between fused subpasses, avoiding expensive DRAM round-trips.

How each engine handles it:

  • UE5 RDG delegates to the RHI layer. The graph compiler doesn’t merge passes itself — pass authors never see subpasses, and the graph has no subpass concept.
  • Frostbite’s GDC talk described automatic merging in the graph compiler as a first-class feature.
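As a sketch, the merge test a compiler runs over adjacent passes might look like this — the criteria are assumed from the GDC description, and all names are invented:

```cpp
#include <vector>

// Minimal shapes for the sketch (invented, not engine code).
struct Pass;
struct Resource { Pass* producer = nullptr; };
struct Pass {
    bool isRaster = false;
    std::vector<Resource*> renderTargets;  // attachments this pass renders to
    std::vector<Resource*> reads;          // resources sampled as textures
};

// Is 'r' one of 'a's attachments (as opposed to a sampled texture)?
static bool IsAttachmentOf(const Pass& a, const Resource& r) {
    for (const Resource* rt : a.renderTargets)
        if (rt == &r) return true;
    return false;
}

// Sketch: can pass 'b' be fused into the same render pass as 'a'?
// Assumed criteria: both raster, identical attachment sets, and 'b' must not
// sample anything 'a' wrote — that would force a DRAM round-trip.
bool CanMerge(const Pass& a, const Pass& b) {
    if (!a.isRaster || !b.isRaster) return false;
    if (a.renderTargets != b.renderTargets) return false;  // must share attachments
    for (const Resource* r : b.reads)
        if (r->producer == &a && !IsAttachmentOf(a, *r))
            return false;
    return true;  // fuse: pixel data stays on-chip between the two subpasses
}
```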

⚡ Async compute scheduling

Async compute lets the GPU overlap independent work on separate hardware queues — compute shaders running alongside rasterization. The compiler must identify which passes can safely run async, insert cross-queue fences, and manage resource ownership transfers.

| Engine | Approach | Discovery |
|---|---|---|
| UE5 | Opt-in via ERDGPassFlags::AsyncCompute per pass | Manual — compiler trusts the flag, handles fence insertion + cross-queue sync |
| Frostbite | Described as automatic in GDC talk | Reachability analysis in the compiler |

Hardware reality: AMD exposes dedicated async compute engines (ACEs); NVIDIA has overlapped compute and graphics with dynamic load balancing since Pascal; some GPUs just time-slice — always profile to confirm real overlap. For cross-queue sync, Vulkan requires explicit queue family ownership transfers; D3D12 uses ID3D12Fence. Both are expensive — only worth it if the overlap wins exceed the transfer cost.
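On D3D12 the fence mechanics reduce to a signal on one queue and a wait on the other — a minimal sketch, assuming the queues, fence, and lastFenceValue already exist:

```cpp
// Sketch: cross-queue sync in D3D12. Signal/Wait are real ID3D12CommandQueue
// methods; the queues, fence, and lastFenceValue are assumed to exist.
const UINT64 fenceValue = ++lastFenceValue;

// 1. The async compute queue signals once its dispatches are submitted.
computeQueue->Signal(fence, fenceValue);

// 2. The graphics queue waits GPU-side (the CPU never blocks) before consuming.
graphicsQueue->Wait(fence, fenceValue);

// Work submitted to graphicsQueue after the Wait is ordered behind the
// compute work — this pair is what the compiler emits per cross-queue edge.
```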

🚧 Barrier batching & split barriers

Our MVP inserts one barrier at a time. Production engines batch multiple transitions into a single API call and split barriers across pass gaps for better GPU pipelining.

UE5 batches transitions via FRDGBarrierBatchBegin/FRDGBarrierBatchEnd — multiple resource transitions coalesced into one API call. Split barriers place the “begin” transition as early as possible and the “end” just before the resource is needed, giving the GPU time to pipeline the transition.
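On D3D12 a split barrier maps to the BEGIN_ONLY/END_ONLY flags — a sketch with real flags, assuming a depth texture moving to shader read:

```cpp
// Sketch: a split barrier in D3D12 (flags are real; resources assumed to exist).
D3D12_RESOURCE_BARRIER barrier = {};
barrier.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
barrier.Transition.pResource   = depthTexture;
barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_DEPTH_WRITE;
barrier.Transition.StateAfter  = D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE;
barrier.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;

// Begin as early as possible — right after the last depth write:
barrier.Flags = D3D12_RESOURCE_BARRIER_FLAG_BEGIN_ONLY;
cmdList->ResourceBarrier(1, &barrier);

// ... unrelated passes execute; the GPU can overlap the transition ...

// End just before the first shader read:
barrier.Flags = D3D12_RESOURCE_BARRIER_FLAG_END_ONLY;
cmdList->ResourceBarrier(1, &barrier);

// Batching is the same call with more elements: ResourceBarrier(N, barriers).
```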

Diminishing returns on desktop — modern drivers hide barrier latency internally. Biggest wins on expensive layout transitions (depth → shader-read) and console GPUs with more exposed pipeline control. Add last, and only if profiling shows barrier stalls.


③ Execute — Recording & Submission

After the compiler finishes, every decision has been made — pass order, memory layout, barrier placement, physical resource bindings. The execute phase just walks the plan and records GPU commands. No allocation happens here — that’s all done during compile, which makes execute safe to parallelize and the compiled plan cacheable across frames. Here’s where production engines scale beyond our MVP.

🧵 Parallel command recording

Our MVP records on a single thread. Production engines split the sorted pass list into groups and record each group on a separate thread using secondary command buffers (Vulkan) or command lists (D3D12), then merge at submit.

UE5 creates parallel FRHICommandList instances — one per pass group — and joins them before queue submission. This is where the bulk of CPU frame time goes in a graph-based renderer, so parallelizing it matters.
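A minimal sketch of the fan-out/join shape, assuming a compiled plan and per-thread command list creation — placeholder types, not engine code:

```cpp
#include <future>
#include <vector>

// Placeholder RHI types — stand-ins for VkCommandBuffer / ID3D12GraphicsCommandList.
struct CommandList { void Close() {} };
struct Pass        { void Execute(CommandList&) {} };
struct Device      { CommandList CreateCommandList() { return {}; } };

// Sketch: record pass groups in parallel, then join in plan order for submit.
std::vector<CommandList> RecordFrame(Device& device,
                                     const std::vector<std::vector<Pass*>>& groups) {
    std::vector<std::future<CommandList>> jobs;
    jobs.reserve(groups.size());

    for (const auto& group : groups)
        jobs.push_back(std::async(std::launch::async, [&device, &group] {
            CommandList cmd = device.CreateCommandList(); // one list per group/thread
            for (Pass* pass : group)
                pass->Execute(cmd);  // barriers were pre-placed at compile time
            cmd.Close();
            return cmd;
        }));

    // Join in group order so submission order matches the compiled plan.
    std::vector<CommandList> lists;
    lists.reserve(jobs.size());
    for (auto& job : jobs)
        lists.push_back(job.get());
    return lists;  // submit in one ExecuteCommandLists-style call
}
```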

🔗 The RDG–legacy boundary (UE5)

The biggest practical consideration with RDG is the seam between RDG-managed passes and legacy FRHICommandList code. At this boundary:

  • Barriers must be inserted manually (RDG can’t see what the legacy code does)
  • Resources must be “extracted” from RDG via ConvertToExternalTexture() before legacy code can use them
  • Re-importing back into RDG requires RegisterExternalTexture() with correct state tracking

This boundary is shrinking every release as Epic migrates more passes to RDG, but in practice you’ll still hit it when integrating third-party plugins or older rendering features.
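In code, the round trip looks roughly like this — QueueTextureExtraction, RegisterExternalTexture, and ConvertToExternalTexture are real RDG entry points, but the surrounding flow is an illustrative sketch:

```cpp
// Sketch: crossing the RDG ↔ legacy boundary (illustrative flow).

// 1. Extract — RDG hands ownership out when the graph executes.
TRefCountPtr<IPooledRenderTarget> Extracted;
GraphBuilder.QueueTextureExtraction(MyRDGTexture, &Extracted);
GraphBuilder.Execute();

// 2. Legacy code works on the raw RHI resource. RDG can't see this,
//    so transitions here are your responsibility.
//    RHICmdList.Transition(...); legacy draws/dispatches ...

// 3. Re-import into a later graph builder; RDG resumes state tracking.
FRDGTextureRef Reimported = NextGraphBuilder.RegisterExternalTexture(Extracted);
```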

🔍 Debug & visualization

RDG Insights: enable it via the Unreal editor to visualize the full pass graph, resource lifetimes, and barrier placement. Use the r.RDG.Debug CVars for validation: r.RDG.Debug.FlushGPU serializes execution for debugging, and r.RDG.Debug.ExtendResourceLifetimes disables aliasing to isolate corruption bugs. The frame is data — export it, diff it, analyze it offline.

🗺️ Navigating the UE5 RDG source

1. RenderGraphBuilder.h — FRDGBuilder is the graph object. AddPass(), CreateTexture(), and Execute() all live here. Start reading with this file.
2. RenderGraphPass.h — FRDGPass stores the parameter struct, execute lambda, and pass flags. The macro-generated metadata lives on the parameter struct.
3. RenderGraphResources.h — FRDGTexture, FRDGBuffer, and their SRV/UAV views. Tracks current state for barrier emission. Check FRDGResource::GetRHI() to see when virtual becomes physical.
4. RenderGraphPrivate.h — The compile phase: topological sort, pass culling, barrier batching, async compute fence insertion. The core algorithms live here.

📝 UE5 RDG — current state & roadmap

RDG's current engineering trade-offs:
  • Ongoing migration — the legacy FRHICommandList ↔ RDG boundary requires manual barriers; Epic is actively moving more passes into the graph each release
  • Macro-based parameter declaration — BEGIN_SHADER_PARAMETER_STRUCT trades debuggability and dynamic composition for compile-time safety and code generation
  • Transient-only aliasing — imported resources are not aliased, even when their lifetime is fully known within the frame; a deliberate simplification that may evolve
  • No automatic subpass merging — delegated to the RHI layer; the graph compiler doesn't optimize render pass structure directly
  • Async compute is opt-in — manual ERDGPassFlags::AsyncCompute tagging; the compiler handles fence insertion but doesn't discover async opportunities automatically

🏁 Closing

A render graph is not always the right answer. If your project has a fixed pipeline with 3–4 passes that will never change, the overhead of a graph compiler is wasted complexity. But the moment your renderer needs to grow — new passes, new platforms, new debug tools — the graph pays for itself in the first week.

Across these three articles, we covered the full arc: Part I laid out all the theory — the declare/compile/execute lifecycle, pass merging, async compute, and split barriers. Part II turned the core into working C++ — automatic barriers, pass culling, and memory aliasing. And this article mapped those ideas onto what ships in UE5 and Frostbite, showing how production engines implement the same concepts at scale.

You can now open RenderGraphBuilder.h in UE5 and read it, not reverse-engineer it. You know what FRDGBuilder::AddPass builds, how the transient allocator aliases memory, why ERDGPassFlags::AsyncCompute exists, and how the RDG boundary with legacy code works in practice.

The point isn’t that every project needs a render graph. The point is that if you understand how they work, you’ll make a better decision about whether yours does.


📚 Resources

  • FrameGraph: Extensible Rendering Architecture in Frostbite — Yuriy O'Donnell, GDC 2017
  • Render Dependency Graph — official Unreal Engine documentation