
Frame Graph — Production Engines

📖 Rendering Architecture series — Part III of III: Theory · Build It · Production Engines

Part II left us with a working frame graph — automatic barriers, pass culling, and memory aliasing in ~300 lines of C++. That’s a solid MVP, but production engines face problems we didn’t: parallel command recording, subpass merging, async compute scheduling, and managing thousands of passes across legacy codebases. This article examines how UE5 and Frostbite solved those problems, then maps out the path from MVP to production.


① Declare — Pass & Resource Registration

Every engine starts the same way: passes declare what they read and write, resources are requested by description, and the graph accumulates edges. The differences are in how that declaration happens.

🎮 UE5 RDG

Each AddPass takes a parameter struct + execute lambda. The struct is the setup phase — macros generate metadata, RDG extracts dependency edges:

```cpp
BEGIN_SHADER_PARAMETER_STRUCT(FMyPassParameters, )
    SHADER_PARAMETER_RDG_TEXTURE(Texture2D, Input)   // read edge
    RENDER_TARGET_BINDING_SLOTS()                    // write edge(s)
END_SHADER_PARAMETER_STRUCT()
```

The macro generates metadata, and RDG extracts the dependency edges — reads and render-target writes — straight into the DAG. No separate setup lambda needed.
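Putting the pieces together, a minimal pass declaration looks roughly like this — FMyPassParameters, SceneColor, and ViewExtent are illustrative names, with signatures following the publicly documented UE5 RDG API:

```cpp
// Sketch of a full RDG pass declaration (illustrative names, public UE5 API).
FRDGTextureDesc Desc = FRDGTextureDesc::Create2D(
    ViewExtent, PF_FloatRGBA, FClearValueBinding::Black,
    TexCreate_RenderTargetable | TexCreate_ShaderResource);
FRDGTextureRef Output = GraphBuilder.CreateTexture(Desc, TEXT("MyPass.Output")); // transient

FMyPassParameters* Params = GraphBuilder.AllocParameters<FMyPassParameters>();
Params->Input = SceneColor;                                        // read edge
Params->RenderTargets[0] =
    FRenderTargetBinding(Output, ERenderTargetLoadAction::EClear); // write edge

GraphBuilder.AddPass(
    RDG_EVENT_NAME("MyPass"),
    Params,
    ERDGPassFlags::Raster,
    [Params](FRHICommandList& RHICmdList)
    {
        // Execute lambda: runs after compile, once barriers are placed
        // and physical memory is bound. Record draws here.
    });
```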

Pass flags control queue and behavior — ERDGPassFlags::Raster, ::Compute, ::AsyncCompute, ::NeverCull, ::Copy. Resources are either transient (CreateTexture — graph-owned, eligible for aliasing) or imported (RegisterExternalTexture — externally owned, barriers tracked but no aliasing).

Pass Flags
  • ERDGPassFlags::Raster — Graphics queue, render targets
  • ERDGPassFlags::Compute — Graphics queue, compute dispatch
  • ERDGPassFlags::AsyncCompute — Async compute queue
  • ERDGPassFlags::NeverCull — Exempt from dead-pass culling
  • ERDGPassFlags::Copy — Copy queue operations
  • ERDGPassFlags::SkipRenderPass — Raster pass that manages its own render pass

Resource Types
  • FRDGTexture / FRDGTextureRef — Render targets, SRVs, UAVs
  • FRDGBuffer / FRDGBufferRef — Structured, vertex/index, indirect args
  • FRDGUniformBuffer — Uniform/constant buffer references

Created via CreateTexture() (transient) or RegisterExternalTexture() (imported).

❄️ Frostbite

Frostbite’s GDC 2017 talk described a similar lambda-based declaration — setup lambda declares reads/writes, execute lambda records GPU commands. The exact current implementation isn’t public.

🔀 What’s different from our MVP

| Declaration aspect | Our MVP | Production engines |
|---|---|---|
| Edge declaration | Explicit read() / write() calls in setup lambda | UE5: macro-generated metadata. Frostbite: lambda-based, similar to MVP. |
| Resource creation | All transient, created by description | Transient + imported distinction. Imported resources track barriers but aren't aliased in UE5. |
| Queue assignment | Single queue | Per-pass flags: graphics, compute, async compute, copy |
| Rebuild | Full rebuild every frame | UE5: hybrid (cached topology, invalidated on change). Others: dynamic rebuild. |

② Compile — The Graph Compiler at Scale

This is where production engines diverge most from our MVP. The compile phase runs entirely on the CPU, between declaration and execution. Our MVP does five things here: topo-sort, cull, scan lifetimes, alias, and compute barriers. Production engines do the same five — plus pass merging, async compute scheduling, split barrier placement, and barrier batching.

MVP compile
├ topo-sort
├ cull dead passes
├ scan lifetimes
├ alias memory
└ compute barriers

Production compile
├ topo-sort
├ cull dead passes
├ scan lifetimes
├ alias memory + cross-frame pooling
├ merge passes (subpass optimization)
├ schedule async compute
├ compute barriers + split begin/end
└ batch barriers

Every step below is a compile-time operation — no GPU work, no command recording. The compiler sees the full DAG and makes optimal decisions the pass author never has to think about.

✂️ Pass culling

Same algorithm as our MVP — backward reachability from the output — but at larger scale. UE5 uses refcount-based culling and skips allocation entirely for culled passes (saves transient allocator work). Culled passes never execute, never allocate resources, never emit barriers — they vanish as if they were never declared.
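For concreteness, here is a minimal sketch of the refcount flavor of culling — a flood-fill from unreferenced resources backward through their producers. All names are invented, not any engine's actual code:

```cpp
#include <stack>
#include <vector>

struct Pass;
struct Resource {
    Pass* producer = nullptr;
    int   refCount = 0;          // # of passes reading this (+1 if it's the frame output)
};
struct Pass {
    std::vector<Resource*> reads;
    int  refCount  = 0;          // # of still-referenced resources this pass writes
    bool neverCull = false;
    bool culled    = false;
};

// Flood-fill: start from resources nobody reads, and cull backward.
void CullPasses(std::vector<Resource>& resources) {
    std::stack<Resource*> dead;
    for (Resource& r : resources)
        if (r.refCount == 0) dead.push(&r);

    while (!dead.empty()) {
        Resource* r = dead.top(); dead.pop();
        Pass* p = r->producer;
        if (p && --p->refCount == 0 && !p->neverCull) {
            p->culled = true;              // never executes, allocates, or emits barriers
            for (Resource* read : p->reads)
                if (--read->refCount == 0) // releasing reads may orphan more resources
                    dead.push(read);
        }
    }
}
```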

💾 Memory aliasing

Both engines use the same core algorithm from Part II — lifetime scanning + free-list allocation. The production refinements:

| Refinement | UE5 RDG | Frostbite (GDC talk) |
|---|---|---|
| Placed resources | FRDGTransientResourceAllocator binds into ID3D12Heap offsets | Heap sub-allocation |
| Size bucketing | Power-of-two in transient allocator | Custom bin sizes |
| Cross-frame pooling | Persistent pool, peak-N-frames sizing | Pooling described in talk |
| Imported aliasing | Transient only | Described as supported |

Our MVP allocates fresh each frame. Production engines pool across frames — once a heap is allocated, it persists and gets reused. UE5’s FRDGTransientResourceAllocator tracks peak usage over several frames and only grows the pool when needed. This amortizes allocation cost to near zero in steady state.
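A minimal sketch of peak-over-N-frames pool sizing, with invented names (not UE5's actual allocator code):

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>

// Sketch: size a persistent transient heap from peak usage over the last N frames.
class TransientHeapPool {
    static constexpr size_t kHistoryFrames = 8;
    std::deque<size_t> peaks;   // per-frame peak transient bytes
    size_t heapSize = 0;        // current backing heap size

public:
    // Called once per frame with that frame's peak aliased working set.
    void EndFrame(size_t framePeakBytes) {
        peaks.push_back(framePeakBytes);
        if (peaks.size() > kHistoryFrames) peaks.pop_front();

        size_t required = *std::max_element(peaks.begin(), peaks.end());
        if (required > heapSize)
            heapSize = required;  // grow: reallocate the backing ID3D12Heap / VkDeviceMemory
        // Optional: shrink if 'required' stays well below heapSize for many frames.
    }
};
```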

🔗 Pass merging

Pass merging is a compile-time optimization: the compiler identifies adjacent passes that share render targets and fuses them into a single render pass. On consoles with fixed-function hardware and on PC with D3D12 Render Pass Tier 2, this lets the GPU keep data on-chip between fused subpasses, avoiding expensive DRAM round-trips.

How each engine handles it:

  • UE5 RDG delegates to the RHI layer. The graph compiler doesn’t merge passes itself — pass authors never see subpasses, and the graph has no subpass concept.
  • Frostbite’s GDC talk described automatic merging in the graph compiler as a first-class feature.
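As a sketch, the merge test a compiler runs over adjacent passes might look like this — the criteria are assumed from the GDC description, and all names are invented:

```cpp
#include <vector>

// Minimal shapes for the sketch (invented, not engine code).
struct Pass;
struct Resource { Pass* producer = nullptr; };
struct Pass {
    bool isRaster = false;
    std::vector<Resource*> renderTargets;  // attachments this pass renders to
    std::vector<Resource*> reads;          // resources sampled as textures
};

// Is 'r' one of 'a's attachments (as opposed to a sampled texture)?
static bool IsAttachmentOf(const Pass& a, const Resource& r) {
    for (const Resource* rt : a.renderTargets)
        if (rt == &r) return true;
    return false;
}

// Sketch: can pass 'b' be fused into the same render pass as 'a'?
// Assumed criteria: both raster, identical attachment sets, and 'b' must not
// sample anything 'a' wrote — that would force a DRAM round-trip.
bool CanMerge(const Pass& a, const Pass& b) {
    if (!a.isRaster || !b.isRaster) return false;
    if (a.renderTargets != b.renderTargets) return false;  // must share attachments
    for (const Resource* r : b.reads)
        if (r->producer == &a && !IsAttachmentOf(a, *r))
            return false;
    return true;  // fuse: pixel data stays on-chip between the two subpasses
}
```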

⚡ Async compute scheduling

Async compute lets the GPU overlap independent work on separate hardware queues — compute shaders running alongside rasterization. The compiler must identify which passes can safely run async, insert cross-queue fences, and manage resource ownership transfers.

| Engine | Approach | Discovery |
|---|---|---|
| UE5 | Opt-in via ERDGPassFlags::AsyncCompute per pass | Manual — compiler trusts the flag, handles fence insertion + cross-queue sync |
| Frostbite | Described as automatic in GDC talk | Reachability analysis in the compiler |

Hardware reality: AMD exposes dedicated async compute engines (ACEs); NVIDIA has overlapped compute and graphics with dynamic load balancing since Pascal; some GPUs just time-slice — always profile to confirm real overlap. For cross-queue sync, Vulkan requires explicit queue family ownership transfers; D3D12 uses ID3D12Fence. Both are expensive — only worth it if the overlap wins exceed the transfer cost.
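On D3D12 the fence mechanics reduce to a signal on one queue and a wait on the other — a minimal sketch, assuming the queues, fence, and lastFenceValue already exist:

```cpp
// Sketch: cross-queue sync in D3D12. Signal/Wait are real ID3D12CommandQueue
// methods; the queues, fence, and lastFenceValue are assumed to exist.
const UINT64 fenceValue = ++lastFenceValue;

// 1. The async compute queue signals once its dispatches are submitted.
computeQueue->Signal(fence, fenceValue);

// 2. The graphics queue waits GPU-side (the CPU never blocks) before consuming.
graphicsQueue->Wait(fence, fenceValue);

// Work submitted to graphicsQueue after the Wait is ordered behind the
// compute work — this pair is what the compiler emits per cross-queue edge.
```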

🚧 Barrier batching & split barriers

Our MVP inserts one barrier at a time. Production engines batch multiple transitions into a single API call and split barriers across pass gaps for better GPU pipelining.

UE5 batches transitions via FRDGBarrierBatchBegin/FRDGBarrierBatchEnd — multiple resource transitions coalesced into one API call. Split barriers place the “begin” transition as early as possible and the “end” just before the resource is needed, giving the GPU time to pipeline the transition.
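On D3D12 a split barrier maps to the BEGIN_ONLY/END_ONLY flags — a sketch with real flags, assuming a depth texture moving to shader read:

```cpp
// Sketch: a split barrier in D3D12 (flags are real; resources assumed to exist).
D3D12_RESOURCE_BARRIER barrier = {};
barrier.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
barrier.Transition.pResource   = depthTexture;
barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_DEPTH_WRITE;
barrier.Transition.StateAfter  = D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE;
barrier.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;

// Begin as early as possible — right after the last depth write:
barrier.Flags = D3D12_RESOURCE_BARRIER_FLAG_BEGIN_ONLY;
cmdList->ResourceBarrier(1, &barrier);

// ... unrelated passes execute; the GPU can overlap the transition ...

// End just before the first shader read:
barrier.Flags = D3D12_RESOURCE_BARRIER_FLAG_END_ONLY;
cmdList->ResourceBarrier(1, &barrier);

// Batching is the same call with more elements: ResourceBarrier(N, barriers).
```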

Diminishing returns on desktop — modern drivers hide barrier latency internally. Biggest wins on expensive layout transitions (depth → shader-read) and console GPUs with more exposed pipeline control. Add last, and only if profiling shows barrier stalls.


③ Execute — Recording & Submission

After the compiler finishes, every decision has been made — pass order, memory layout, barrier placement, physical resource bindings. The execute phase just walks the plan and records GPU commands. No allocation happens here — that’s all done during compile, which makes execute safe to parallelize and the compiled plan cacheable across frames. Here’s where production engines scale beyond our MVP.

🧵 Parallel command recording

Our MVP records on a single thread. Production engines split the sorted pass list into groups and record each group on a separate thread using secondary command buffers (Vulkan) or command lists (D3D12), then merge at submit.

UE5 creates parallel FRHICommandList instances — one per pass group — and joins them before queue submission. This is where the bulk of CPU frame time goes in a graph-based renderer, so parallelizing it matters.
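A minimal sketch of the fan-out/join shape, assuming a compiled plan and per-thread command list creation — placeholder types, not engine code:

```cpp
#include <future>
#include <vector>

// Placeholder RHI types — stand-ins for VkCommandBuffer / ID3D12GraphicsCommandList.
struct CommandList { void Close() {} };
struct Pass        { void Execute(CommandList&) {} };
struct Device      { CommandList CreateCommandList() { return {}; } };

// Sketch: record pass groups in parallel, then join in plan order for submit.
std::vector<CommandList> RecordFrame(Device& device,
                                     const std::vector<std::vector<Pass*>>& groups) {
    std::vector<std::future<CommandList>> jobs;
    jobs.reserve(groups.size());

    for (const auto& group : groups)
        jobs.push_back(std::async(std::launch::async, [&device, &group] {
            CommandList cmd = device.CreateCommandList(); // one list per group/thread
            for (Pass* pass : group)
                pass->Execute(cmd);  // barriers were pre-placed at compile time
            cmd.Close();
            return cmd;
        }));

    // Join in group order so submission order matches the compiled plan.
    std::vector<CommandList> lists;
    lists.reserve(jobs.size());
    for (auto& job : jobs)
        lists.push_back(job.get());
    return lists;  // submit in one ExecuteCommandLists-style call
}
```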

🔗 The RDG–legacy boundary (UE5)

The biggest practical consideration with RDG is the seam between RDG-managed passes and legacy FRHICommandList code. At this boundary:

  • Barriers must be inserted manually (RDG can’t see what the legacy code does)
  • Resources must be “extracted” from RDG via ConvertToExternalTexture() before legacy code can use them
  • Re-importing back into RDG requires RegisterExternalTexture() with correct state tracking

This boundary is shrinking every release as Epic migrates more passes to RDG, but in practice you’ll still hit it when integrating third-party plugins or older rendering features.
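In code, the round trip looks roughly like this — QueueTextureExtraction, RegisterExternalTexture, and ConvertToExternalTexture are real RDG entry points, but the surrounding flow is an illustrative sketch:

```cpp
// Sketch: crossing the RDG ↔ legacy boundary (illustrative flow).

// 1. Extract — RDG hands ownership out when the graph executes.
TRefCountPtr<IPooledRenderTarget> Extracted;
GraphBuilder.QueueTextureExtraction(MyRDGTexture, &Extracted);
GraphBuilder.Execute();

// 2. Legacy code works on the raw RHI resource. RDG can't see this,
//    so transitions here are your responsibility.
//    RHICmdList.Transition(...); legacy draws/dispatches ...

// 3. Re-import into a later graph builder; RDG resumes state tracking.
FRDGTextureRef Reimported = NextGraphBuilder.RegisterExternalTexture(Extracted);
```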

🔍 Debug & visualization

RDG Insights: enable it via the Unreal editor to visualize the full pass graph, resource lifetimes, and barrier placement. Use the r.RDG.Debug CVars for validation: r.RDG.Debug.FlushGPU serializes execution for debugging, and r.RDG.Debug.ExtendResourceLifetimes disables aliasing to isolate corruption bugs. The frame is data — export it, diff it, analyze it offline.

🗺️ Navigating the UE5 RDG source

1. RenderGraphBuilder.h — FRDGBuilder is the graph object. AddPass(), CreateTexture(), and Execute() all live here. Start reading with this file.
2. RenderGraphPass.h — FRDGPass stores the parameter struct, execute lambda, and pass flags. The macro-generated metadata lives on the parameter struct.
3. RenderGraphResources.h — FRDGTexture, FRDGBuffer, and their SRV/UAV views. Tracks current state for barrier emission. Check FRDGResource::GetRHI() to see when virtual becomes physical.
4. RenderGraphPrivate.h — The compile phase: topological sort, pass culling, barrier batching, async compute fence insertion. The core algorithms live here.

📝 UE5 RDG — current state & roadmap

RDG's current engineering trade-offs:
  • Ongoing migration — the legacy FRHICommandList ↔ RDG boundary requires manual barriers; Epic is actively moving more passes into the graph each release
  • Macro-based parameter declaration — BEGIN_SHADER_PARAMETER_STRUCT trades debuggability and dynamic composition for compile-time safety and code generation
  • Transient-only aliasing — imported resources are not aliased, even when their lifetime is fully known within the frame; a deliberate simplification that may evolve
  • No automatic subpass merging — delegated to the RHI layer; the graph compiler doesn't optimize render pass structure directly
  • Async compute is opt-in — manual ERDGPassFlags::AsyncCompute tagging; the compiler handles fence insertion but doesn't discover async opportunities automatically

🏁 Closing

A render graph is not always the right answer. If your project has a fixed pipeline with 3–4 passes that will never change, the overhead of a graph compiler is wasted complexity. But the moment your renderer needs to grow — new passes, new platforms, new debug tools — the graph pays for itself in the first week.

Across these three articles, we covered the full arc: Part I laid out all the theory — the declare/compile/execute lifecycle, pass merging, async compute, and split barriers. Part II turned the core into working C++ — automatic barriers, pass culling, and memory aliasing. And this article mapped those ideas onto what ships in UE5 and Frostbite, showing how production engines implement the same concepts at scale.

You can now open RenderGraphBuilder.h in UE5 and read it, not reverse-engineer it. You know what FRDGBuilder::AddPass builds, how the transient allocator aliases memory, why ERDGPassFlags::AsyncCompute exists, and how the RDG boundary with legacy code works in practice.

The point isn’t that every project needs a render graph. The point is that if you understand how they work, you’ll make a better decision about whether yours does.


📚 Resources

  • FrameGraph: Extensible Rendering Architecture in Frostbite — Yuriy O'Donnell, GDC 2017
  • Render Dependency Graph — official Unreal Engine documentation