Three iterations from blank file to working frame graph with automatic barriers and memory aliasing. Each version builds on the last — by the end you’ll have something you can drop into a real renderer.
Part I covered what a frame graph is — the three-phase lifecycle (declare → compile → execute), the DAG, and why every major engine uses one. Now we implement it. Three C++ iterations, each adding a layer: v1 is scaffolding, v2 adds the dependency graph with automatic barriers and pass culling, v3 adds lifetime analysis and memory aliasing.
🏗️ API Design#
We start from the API you want to write — a minimal FrameGraph that declares a depth prepass, GBuffer pass, and lighting pass in ~20 lines of C++.
🎯 Design principles#
Execute — runs later. Records GPU commands into a fully resolved environment.
{1920, 1080, RGBA8}), not GPU handle. Virtual until the compiler maps them to memory.These three ideas produce a natural pipeline — declare your intent, let the compiler optimize, then execute:
addPass(setup, execute)├ setup lambda runs
• declare reads / writes
• request resources
└ no GPU work, no allocation
├ cull — remove dead passes
├ alias — map virtual → physical
└ barrier — emit transitions
├ insert automatic barriers
└ call execute lambda
→ draw / dispatch / copy
🧩 Putting it together#
Here’s how the final API reads — three passes, ~20 lines:
FrameGraph fg;
auto depth = fg.createResource({1920, 1080, Format::D32F});
auto gbufA = fg.createResource({1920, 1080, Format::RGBA8});
auto gbufN = fg.createResource({1920, 1080, Format::RGBA8});
auto hdr = fg.createResource({1920, 1080, Format::RGBA16F});
fg.addPass("DepthPrepass",
[&]() { fg.write(0, depth); },
[&](/*cmd*/) { /* draw scene depth-only */ });
fg.addPass("GBuffer",
[&]() { fg.read(1, depth); fg.write(1, gbufA); fg.write(1, gbufN); },
[&](/*cmd*/) { /* draw scene to GBuffer MRTs */ });
fg.addPass("Lighting",
[&]() { fg.read(2, gbufA); fg.read(2, gbufN); fg.write(2, hdr); },
[&](/*cmd*/) { /* fullscreen lighting pass */ });
fg.execute(); // → topo-sort, cull, alias, barrier, run
Three passes, declared as lambdas. The graph handles the rest — ordering, barriers, memory. We build this step by step below.
🧱 MVP v1 — Declare & Execute#
Data structures:
• name
• setup() ← build the DAG
• execute() ← record GPU cmds
• width, height, format
• virtual — no GPU handle yet
(UE5: FRDGTextureRef / FRDGBufferRef)
Flow: Declare passes in order → execute in order. No dependency tracking yet. Resources are created eagerly.
#pragma once
// Frame Graph MVP v1 — Declare & Execute
// No dependency tracking, no barriers, no aliasing.
// Passes execute in declaration order.
//
// Compile: any C++17 compiler (header-only, no GPU backend needed)
// g++ -std=c++17 -c frame_graph_v1.h
// or just #include it from your example/test file.
#include <cstdint>
#include <functional>
#include <string>
#include <vector>
// ── Resource description (virtual until compile) ──────────────
enum class Format { RGBA8, RGBA16F, R8, D32F };
struct ResourceDesc {
uint32_t width = 0;
uint32_t height = 0;
Format format = Format::RGBA8;
};
// Handle = typed index into the graph's resource array.
// No GPU memory behind it yet — just a number.
struct ResourceHandle {
uint32_t index = UINT32_MAX;
bool isValid() const { return index != UINT32_MAX; }
};
// ── Render pass ───────────────────────────────────────────────
struct RenderPass {
std::string name;
std::function<void()> setup; // build the DAG (v1: unused)
std::function<void(/*cmd list*/)> execute; // record GPU commands
};
// ── Frame graph ───────────────────────────────────────────────
class FrameGraph {
public:
// Create a virtual resource — returns a handle, not GPU memory.
ResourceHandle createResource(const ResourceDesc& desc) {
resources_.push_back(desc);
return { static_cast<uint32_t>(resources_.size() - 1) };
}
// Register a pass. Setup runs now; execute is stored for later.
template <typename SetupFn, typename ExecFn>
void addPass(const std::string& name, SetupFn&& setup, ExecFn&& exec) {
passes_.push_back({ name, std::forward<SetupFn>(setup),
std::forward<ExecFn>(exec) });
passes_.back().setup(); // run setup immediately
}
// Compile + execute. v1 is trivial — just run in declaration order.
void execute() {
// In v1 we skip compile entirely — no sorting, no culling,
// no barriers. Just run every pass in the order it was added.
for (auto& pass : passes_) {
// Here you'd bind resources, begin render pass, etc.
pass.execute(/* &cmdList */);
}
// Frame over — clear everything for next frame.
passes_.clear();
resources_.clear();
}
private:
std::vector<RenderPass> passes_;
std::vector<ResourceDesc> resources_;
};
// Frame Graph MVP v1 -- Usage Example
// Compile: g++ -std=c++17 -o example_v1 example_v1.cpp
#include "frame_graph_v1.h"
#include <cstdio>
int main() {
printf("=== Frame Graph v1: Declare & Execute ===\n");
printf("No dependencies, no barriers, no aliasing.\n");
printf("Passes execute in declaration order.\n\n");
FrameGraph fg;
auto depth = fg.createResource({1920, 1080, Format::D32F});
auto gbufA = fg.createResource({1920, 1080, Format::RGBA8});
auto gbufN = fg.createResource({1920, 1080, Format::RGBA8});
auto hdr = fg.createResource({1920, 1080, Format::RGBA16F});
printf("Resources created:\n");
printf(" [0] depth 1920x1080 D32F\n");
printf(" [1] gbufA 1920x1080 RGBA8\n");
printf(" [2] gbufN 1920x1080 RGBA8\n");
printf(" [3] hdr 1920x1080 RGBA16F\n\n");
fg.addPass("DepthPrepass",
[&]() { printf(" setup: DepthPrepass registered\n"); },
[&](/*cmd*/) { printf(" [pass 0] DepthPrepass -> writes depth\n"); });
fg.addPass("GBuffer",
[&]() { printf(" setup: GBuffer registered\n"); },
[&](/*cmd*/) { printf(" [pass 1] GBuffer -> reads depth, writes gbufA, gbufN\n"); });
fg.addPass("Lighting",
[&]() { printf(" setup: Lighting registered\n"); },
[&](/*cmd*/) { printf(" [pass 2] Lighting -> reads gbufA, gbufN, writes hdr\n"); });
printf("\nPasses added (setup phase):\n");
printf("\nExecuting (declaration order -- no sorting):\n");
fg.execute();
printf("\n[OK] v1 complete. Passes ran in order, but:\n");
printf(" - No dependency tracking -- order is manual\n");
printf(" - No barrier insertion -- GPU hazards possible\n");
printf(" - No dead-pass culling\n");
printf(" - No memory aliasing\n");
printf(" -> v2 will fix the first three.\n");
return 0;
}
Compiles and runs — the execute lambdas are stubs, but the scaffolding is real. Every piece we add in v2 and v3 goes into this same FrameGraph class.
The lambda-based pass declaration pattern works. You can already compose passes without manual barrier calls (even though barriers are no-ops here).
Executes passes in declaration order, creates every resource upfront. Correct but wasteful. Version 2 adds the graph.
🔗 MVP v2 — Dependencies & Barriers#
Four pieces, each feeding the next:
🔀 Resource versioning & the dependency graph#
Multiple passes can read the same resource without conflict — but when a pass writes to it, every later reader needs to know which write they depend on. The solution: each write bumps the resource’s version number. Readers attach to the version that existed when they were declared, so dependency edges stay precise even when the same resource is written multiple times per frame.
📊 Topological sort (Kahn’s algorithm)#
Count incoming edges per pass. Any pass with zero incoming edges has all dependencies satisfied — emit it, decrement its neighbors’ counts, repeat until the queue is empty. If the output is shorter than the pass count, the graph has a cycle.
✂️ Pass culling#
#ifdef, no flag.
⏱️
Cost: O(V + E) — one linear walk over the graph.Toggle edges in the DAG to see it live — disconnect a pass and the compiler removes it along with its resources. No #ifdef, no feature flag — just a missing edge.
🚧 Barrier insertion#
A GPU resource can’t be a render target and a shader input at the same time — the hardware needs to flush caches, change memory layout, and switch access modes between those uses. That transition is a barrier.
The graph already knows the sorted pass order and what each pass reads or writes. So for every resource handoff — GBuffer goes from “being written by pass A” to “being read by pass B” — it inserts the correct barrier automatically. Here’s every type of barrier a real frame needs:
🧩 Putting it together — v1 → v2 diff#
We need five new pieces: (1) resource versioning with read/write tracking, (2) adjacency list for the DAG, (3) topological sort, (4) pass culling, and (5) barrier insertion. Additions marked with // NEW v2 in the source:
Full updated source:
#pragma once
// Frame Graph MVP v2 — Dependencies & Barriers
// Adds: resource versioning, DAG with adjacency list, Kahn's topo-sort,
// pass culling, and automatic barrier insertion.
//
// Compile: any C++17 compiler (header-only, no GPU backend needed)
// g++ -std=c++17 -c frame_graph_v2.h
// or just #include it from your example/test file.
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <functional>
#include <queue>
#include <string>
#include <unordered_set>
#include <vector>
// ── Resource description (virtual until compile) ──────────────
enum class Format { RGBA8, RGBA16F, R8, D32F };
struct ResourceDesc {
uint32_t width = 0;
uint32_t height = 0;
Format format = Format::RGBA8;
};
struct ResourceHandle {
uint32_t index = UINT32_MAX;
bool isValid() const { return index != UINT32_MAX; }
};
// ── Resource state tracking (NEW v2) ──────────────────────────
enum class ResourceState { Undefined, ColorAttachment, DepthAttachment,
ShaderRead, Present };
inline const char* stateName(ResourceState s) {
switch (s) {
case ResourceState::Undefined: return "Undefined";
case ResourceState::ColorAttachment: return "ColorAttachment";
case ResourceState::DepthAttachment: return "DepthAttachment";
case ResourceState::ShaderRead: return "ShaderRead";
case ResourceState::Present: return "Present";
default: return "?";
}
}
struct ResourceVersion { // NEW v2
uint32_t writerPass = UINT32_MAX; // which pass wrote this version
std::vector<uint32_t> readerPasses; // which passes read it
};
// Extend ResourceDesc with tracking:
struct ResourceEntry {
ResourceDesc desc;
std::vector<ResourceVersion> versions; // version 0, 1, 2...
ResourceState currentState = ResourceState::Undefined;
};
// ── Updated render pass ───────────────────────────────────────
struct RenderPass {
std::string name;
std::function<void()> setup;
std::function<void(/*cmd list*/)> execute;
std::vector<ResourceHandle> reads; // NEW v2
std::vector<ResourceHandle> writes; // NEW v2
std::vector<uint32_t> dependsOn; // NEW v2 — passes this pass depends on
std::vector<uint32_t> successors; // NEW v2 — passes that depend on this pass
uint32_t inDegree = 0; // NEW v2 — for Kahn's
bool alive = false; // NEW v2 — for culling
};
// ── Updated FrameGraph ────────────────────────────────────────
class FrameGraph {
public:
ResourceHandle createResource(const ResourceDesc& desc) {
entries_.push_back({ desc, {{}}, ResourceState::Undefined });
return { static_cast<uint32_t>(entries_.size() - 1) };
}
// Declare a read — links this pass to the resource's current version.
void read(uint32_t passIdx, ResourceHandle h) { // NEW v2
auto& ver = entries_[h.index].versions.back();
if (ver.writerPass != UINT32_MAX) {
passes_[passIdx].dependsOn.push_back(ver.writerPass);
}
ver.readerPasses.push_back(passIdx);
passes_[passIdx].reads.push_back(h);
}
// Declare a write — creates a new version of the resource.
void write(uint32_t passIdx, ResourceHandle h) { // NEW v2
entries_[h.index].versions.push_back({});
entries_[h.index].versions.back().writerPass = passIdx;
passes_[passIdx].writes.push_back(h);
}
template <typename SetupFn, typename ExecFn>
void addPass(const std::string& name, SetupFn&& setup, ExecFn&& exec) {
uint32_t idx = static_cast<uint32_t>(passes_.size());
passes_.push_back({ name, std::forward<SetupFn>(setup),
std::forward<ExecFn>(exec) });
currentPass_ = idx; // NEW v2 — so setup can call read()/write()
passes_.back().setup();
}
void execute() {
printf("\n[1] Building dependency edges...\n");
buildEdges(); // NEW v2
printf("[2] Topological sort...\n");
auto sorted = topoSort(); // NEW v2
printf("[3] Culling dead passes...\n");
cull(sorted); // NEW v2
printf("[4] Executing (with automatic barriers):\n");
for (uint32_t idx : sorted) {
if (!passes_[idx].alive) {
printf(" -- skip: %s (CULLED)\n", passes_[idx].name.c_str());
continue;
}
insertBarriers(idx); // NEW v2
passes_[idx].execute(/* &cmdList */);
}
passes_.clear();
entries_.clear();
}
private:
uint32_t currentPass_ = 0;
std::vector<RenderPass> passes_;
std::vector<ResourceEntry> entries_;
// ── Build dependency edges ────────────────────────────────
void buildEdges() { // NEW v2
for (uint32_t i = 0; i < passes_.size(); i++) {
// Deduplicate dependency edges and build successor list.
std::unordered_set<uint32_t> seen;
for (uint32_t dep : passes_[i].dependsOn) {
if (seen.insert(dep).second) {
passes_[dep].successors.push_back(i);
passes_[i].inDegree++;
}
}
}
}
// ── Kahn's topological sort — O(V + E) ────────────────────
std::vector<uint32_t> topoSort() { // NEW v2
std::queue<uint32_t> q;
std::vector<uint32_t> inDeg(passes_.size());
for (uint32_t i = 0; i < passes_.size(); i++) {
inDeg[i] = passes_[i].inDegree;
if (inDeg[i] == 0) q.push(i);
}
std::vector<uint32_t> order;
while (!q.empty()) {
uint32_t cur = q.front(); q.pop();
order.push_back(cur);
// Walk the adjacency list — O(E) total across all nodes.
for (uint32_t succ : passes_[cur].successors) {
if (--inDeg[succ] == 0)
q.push(succ);
}
}
assert(order.size() == passes_.size() && "Cycle detected!");
printf(" Topological order: ");
for (uint32_t i = 0; i < order.size(); i++) {
printf("%s%s", passes_[order[i]].name.c_str(),
i + 1 < order.size() ? " -> " : "\n");
}
return order;
}
// ── Cull dead passes (backward walk from output) ──────────
void cull(const std::vector<uint32_t>& sorted) { // NEW v2
// Mark the last pass (present) as alive, then walk backward.
if (sorted.empty()) return;
passes_[sorted.back()].alive = true;
for (int i = static_cast<int>(sorted.size()) - 1; i >= 0; i--) {
if (!passes_[sorted[i]].alive) continue;
for (uint32_t dep : passes_[sorted[i]].dependsOn)
passes_[dep].alive = true;
}
printf(" Culling result: ");
for (uint32_t i = 0; i < passes_.size(); i++) {
printf("%s=%s%s", passes_[i].name.c_str(),
passes_[i].alive ? "ALIVE" : "DEAD",
i + 1 < passes_.size() ? ", " : "\n");
}
}
// ── Insert barriers where resource state changes ──────────
void insertBarriers(uint32_t passIdx) { // NEW v2
auto stateForUsage = [](bool isWrite, Format fmt) {
if (isWrite)
return (fmt == Format::D32F) ? ResourceState::DepthAttachment
: ResourceState::ColorAttachment;
return ResourceState::ShaderRead;
};
for (auto& h : passes_[passIdx].reads) {
ResourceState needed = ResourceState::ShaderRead;
if (entries_[h.index].currentState != needed) {
printf(" barrier: resource[%u] %s -> %s\n",
h.index,
stateName(entries_[h.index].currentState),
stateName(needed));
entries_[h.index].currentState = needed;
}
}
for (auto& h : passes_[passIdx].writes) {
ResourceState needed = stateForUsage(true, entries_[h.index].desc.format);
if (entries_[h.index].currentState != needed) {
printf(" barrier: resource[%u] %s -> %s\n",
h.index,
stateName(entries_[h.index].currentState),
stateName(needed));
entries_[h.index].currentState = needed;
}
}
}
};
// Frame Graph MVP v2 -- Usage Example
// Compile: g++ -std=c++17 -o example_v2 example_v2.cpp
#include "frame_graph_v2.h"
#include <cstdio>
int main() {
printf("=== Frame Graph v2: Dependencies & Barriers ===\n");
printf("Adds: dependency DAG, topo-sort, pass culling, auto barriers.\n\n");
FrameGraph fg;
auto depth = fg.createResource({1920, 1080, Format::D32F});
auto gbufA = fg.createResource({1920, 1080, Format::RGBA8});
auto hdr = fg.createResource({1920, 1080, Format::RGBA16F});
auto debug = fg.createResource({1920, 1080, Format::RGBA8});
printf("Resources created:\n");
printf(" [0] depth 1920x1080 D32F\n");
printf(" [1] gbufA 1920x1080 RGBA8\n");
printf(" [2] hdr 1920x1080 RGBA16F\n");
printf(" [3] debug 1920x1080 RGBA8 (unused -- should be culled)\n\n");
fg.addPass("DepthPrepass",
[&]() { fg.write(0, depth); },
[&](/*cmd*/) { printf(" >> exec: DepthPrepass (writes depth)\n"); });
fg.addPass("GBuffer",
[&]() { fg.read(1, depth); fg.write(1, gbufA); },
[&](/*cmd*/) { printf(" >> exec: GBuffer (reads depth, writes gbufA)\n"); });
fg.addPass("Lighting",
[&]() { fg.read(2, gbufA); fg.write(2, hdr); },
[&](/*cmd*/) { printf(" >> exec: Lighting (reads gbufA, writes hdr)\n"); });
// This pass writes debug but nothing reads it -- the graph will cull it.
fg.addPass("DebugOverlay",
[&]() { fg.write(3, debug); },
[&](/*cmd*/) { printf(" >> exec: DebugOverlay (writes debug)\n"); });
printf("Passes added: DepthPrepass, GBuffer, Lighting, DebugOverlay\n");
printf("Dependencies: GBuffer->DepthPrepass, Lighting->GBuffer\n");
printf("DebugOverlay has no consumers -- it's a dead pass.\n\n");
printf("Compiling graph...\n");
fg.execute();
printf("\n[OK] v2 complete. Automatic dependency sorting, dead-pass culling,\n");
printf(" and barrier insertion -- no manual work needed.\n");
printf(" But each resource gets its own GPU allocation (no reuse).\n");
printf(" -> v3 will add lifetime analysis and memory aliasing.\n");
return 0;
}
That’s three of the four intro promises delivered — automatic ordering, barrier insertion, and dead-pass culling. The only piece missing: resources still live for the entire frame. Version 3 fixes that with lifetime analysis and memory aliasing.
UE5’s RDG does the same thing. When you call FRDGBuilder::AddPass, RDG builds the dependency graph from your declared reads/writes, topologically sorts it, culls dead passes, and inserts barriers — all before recording a single GPU command.
💾 MVP v3 — Lifetimes & Aliasing#
V2 gives us ordering, culling, and barriers — but every transient resource lives for the entire frame. A 1080p deferred pipeline allocates ~52 MB of transient textures that are each used for only 2–3 passes. If their lifetimes don’t overlap, they can share physical memory. That’s aliasing, and it typically saves 30–50% VRAM.
The algorithm has two steps. First, scan lifetimes: walk the sorted pass list and record each transient resource’s firstUse and lastUse pass indices (imported resources are excluded — they’re externally owned). Second, free-list scan: sort resources by first-use, then greedily try to fit each one into an existing physical block that’s compatible (same memory type, large enough, and whose last user finished before this resource’s first use). Fit → reuse. No fit → allocate a new block. This is greedy interval-coloring.
Without aliasing, every transient resource is a committed allocation — its own chunk of VRAM from creation to end of frame, even if it’s only used for 2–3 passes. Here’s what that looks like for six transient resources at 1080p:
Most of that memory sits idle. The colored bars show when each resource is actually used — everything else is waste. The graph knows every lifetime, so it can do better. Resources whose lifetimes don’t overlap can share the same physical memory:
31% saved — in complex pipelines: 40–50%.
This requires placed resources at the API level — GPU memory allocated from a heap, with resources bound to offsets within it. In D3D12, that means ID3D12Heap + CreatePlacedResource. In Vulkan, VkDeviceMemory + vkBindImageMemory at different offsets. Without placed resources (i.e., CreateCommittedResource or Vulkan dedicated allocations), each resource gets its own memory and aliasing is impossible — which is why the graph’s allocator works with heaps.
Drag the interactive timeline below to see how resources share physical blocks as their lifetimes end:
🧩 Putting it together — v2 → v3 diff#
Two additions to the FrameGraph class: (1) a lifetime scan that records each transient resource’s first and last use in the sorted pass order, and (2) a greedy free-list allocator that reuses physical blocks when lifetimes don’t overlap.
Complete v3 source — all v2 code plus lifetime analysis and aliasing:
#pragma once
// Frame Graph MVP v3 — Lifetimes & Aliasing
// Adds: lifetime analysis, greedy free-list memory aliasing.
// Builds on v2 (dependencies, topo-sort, culling, barriers).
//
// Compile: any C++17 compiler (header-only, no GPU backend needed)
// g++ -std=c++17 -c frame_graph_v3.h
// or just #include it from your example/test file.
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <functional>
#include <numeric>
#include <queue>
#include <string>
#include <unordered_set>
#include <vector>
// ── Resource description (virtual until compile) ──────────────
enum class Format { RGBA8, RGBA16F, R8, D32F };
struct ResourceDesc {
uint32_t width = 0;
uint32_t height = 0;
Format format = Format::RGBA8;
};
struct ResourceHandle {
uint32_t index = UINT32_MAX;
bool isValid() const { return index != UINT32_MAX; }
};
// ── Resource state tracking ───────────────────────────────────
enum class ResourceState { Undefined, ColorAttachment, DepthAttachment,
ShaderRead, Present };
inline const char* stateName(ResourceState s) {
switch (s) {
case ResourceState::Undefined: return "Undefined";
case ResourceState::ColorAttachment: return "ColorAttachment";
case ResourceState::DepthAttachment: return "DepthAttachment";
case ResourceState::ShaderRead: return "ShaderRead";
case ResourceState::Present: return "Present";
default: return "?";
}
}
struct ResourceVersion {
uint32_t writerPass = UINT32_MAX;
std::vector<uint32_t> readerPasses;
};
struct ResourceEntry {
ResourceDesc desc;
std::vector<ResourceVersion> versions;
ResourceState currentState = ResourceState::Undefined;
};
// ── Physical memory block (NEW v3) ────────────────────────────
struct PhysicalBlock {
uint32_t sizeBytes = 0;
Format format = Format::RGBA8;
uint32_t availAfter = 0; // pass index after which this block is free
};
// ── Bytes-per-pixel helper (NEW v3) ───────────────────────────
inline uint32_t bytesPerPixel(Format fmt) {
switch (fmt) {
case Format::R8: return 1;
case Format::RGBA8: return 4;
case Format::D32F: return 4;
case Format::RGBA16F: return 8;
default: return 4;
}
}
// ── Lifetime info per resource (NEW v3) ───────────────────────
struct Lifetime {
uint32_t firstUse = UINT32_MAX;
uint32_t lastUse = 0;
bool isTransient = true;
};
// ── Render pass ───────────────────────────────────────────────
struct RenderPass {
std::string name;
std::function<void()> setup;
std::function<void(/*cmd list*/)> execute;
std::vector<ResourceHandle> reads;
std::vector<ResourceHandle> writes;
std::vector<uint32_t> dependsOn;
std::vector<uint32_t> successors;
uint32_t inDegree = 0;
bool alive = false;
};
// ── Frame graph (v3: full MVP) ────────────────────────────────
class FrameGraph {
public:
ResourceHandle createResource(const ResourceDesc& desc) {
entries_.push_back({ desc, {{}}, ResourceState::Undefined });
return { static_cast<uint32_t>(entries_.size() - 1) };
}
void read(uint32_t passIdx, ResourceHandle h) {
auto& ver = entries_[h.index].versions.back();
if (ver.writerPass != UINT32_MAX) {
passes_[passIdx].dependsOn.push_back(ver.writerPass);
}
ver.readerPasses.push_back(passIdx);
passes_[passIdx].reads.push_back(h);
}
void write(uint32_t passIdx, ResourceHandle h) {
entries_[h.index].versions.push_back({});
entries_[h.index].versions.back().writerPass = passIdx;
passes_[passIdx].writes.push_back(h);
}
template <typename SetupFn, typename ExecFn>
void addPass(const std::string& name, SetupFn&& setup, ExecFn&& exec) {
uint32_t idx = static_cast<uint32_t>(passes_.size());
passes_.push_back({ name, std::forward<SetupFn>(setup),
std::forward<ExecFn>(exec) });
currentPass_ = idx;
passes_.back().setup();
}
// ── v3: compile — builds the execution plan + allocates memory ──
struct CompiledPlan {
std::vector<uint32_t> sorted;
std::vector<uint32_t> mapping; // mapping[virtualIdx] → physicalBlock
};
CompiledPlan compile() {
printf("\n[1] Building dependency edges...\n");
buildEdges();
printf("[2] Topological sort...\n");
auto sorted = topoSort();
printf("[3] Culling dead passes...\n");
cull(sorted);
printf("[4] Scanning resource lifetimes...\n");
auto lifetimes = scanLifetimes(sorted); // NEW v3
printf("[5] Aliasing resources (greedy free-list)...\n");
auto mapping = aliasResources(lifetimes); // NEW v3
// Physical bindings are now decided — execute can't change them.
// This makes the compiled plan cacheable and thread-safe.
return { std::move(sorted), std::move(mapping) };
}
// ── v3: execute — just records GPU commands ───────────────
void execute() {
auto plan = compile();
// In a real renderer: bind physical memory blocks using plan.mapping here.
// Each virtual handle maps to an aliased physical slot decided at compile time.
// Our MVP prints the mapping but skips actual GPU binding.
printf("[6] Executing (with automatic barriers):\n");
for (uint32_t idx : plan.sorted) {
if (!passes_[idx].alive) {
printf(" -- skip: %s (CULLED)\n", passes_[idx].name.c_str());
continue;
}
insertBarriers(idx);
passes_[idx].execute(/* &cmdList */);
}
passes_.clear();
entries_.clear();
}
private:
uint32_t currentPass_ = 0;
std::vector<RenderPass> passes_;
std::vector<ResourceEntry> entries_;
// ── Build dependency edges ────────────────────────────────
void buildEdges() {
for (uint32_t i = 0; i < passes_.size(); i++) {
std::unordered_set<uint32_t> seen;
for (uint32_t dep : passes_[i].dependsOn) {
if (seen.insert(dep).second) {
passes_[dep].successors.push_back(i);
passes_[i].inDegree++;
}
}
}
}
// ── Kahn's topological sort — O(V + E) ────────────────────
std::vector<uint32_t> topoSort() {
std::queue<uint32_t> q;
std::vector<uint32_t> inDeg(passes_.size());
for (uint32_t i = 0; i < passes_.size(); i++) {
inDeg[i] = passes_[i].inDegree;
if (inDeg[i] == 0) q.push(i);
}
std::vector<uint32_t> order;
while (!q.empty()) {
uint32_t cur = q.front(); q.pop();
order.push_back(cur);
for (uint32_t succ : passes_[cur].successors) {
if (--inDeg[succ] == 0)
q.push(succ);
}
}
assert(order.size() == passes_.size() && "Cycle detected!");
printf(" Topological order: ");
for (uint32_t i = 0; i < order.size(); i++) {
printf("%s%s", passes_[order[i]].name.c_str(),
i + 1 < order.size() ? " -> " : "\n");
}
return order;
}
// ── Cull dead passes ──────────────────────────────────────
void cull(const std::vector<uint32_t>& sorted) {
if (sorted.empty()) return;
passes_[sorted.back()].alive = true;
for (int i = static_cast<int>(sorted.size()) - 1; i >= 0; i--) {
if (!passes_[sorted[i]].alive) continue;
for (uint32_t dep : passes_[sorted[i]].dependsOn)
passes_[dep].alive = true;
}
printf(" Culling result: ");
for (uint32_t i = 0; i < passes_.size(); i++) {
printf("%s=%s%s", passes_[i].name.c_str(),
passes_[i].alive ? "ALIVE" : "DEAD",
i + 1 < passes_.size() ? ", " : "\n");
}
}
// ── Insert barriers ───────────────────────────────────────
void insertBarriers(uint32_t passIdx) {
auto stateForUsage = [](bool isWrite, Format fmt) {
if (isWrite)
return (fmt == Format::D32F) ? ResourceState::DepthAttachment
: ResourceState::ColorAttachment;
return ResourceState::ShaderRead;
};
for (auto& h : passes_[passIdx].reads) {
ResourceState needed = ResourceState::ShaderRead;
if (entries_[h.index].currentState != needed) {
printf(" barrier: resource[%u] %s -> %s\n",
h.index,
stateName(entries_[h.index].currentState),
stateName(needed));
entries_[h.index].currentState = needed;
}
}
for (auto& h : passes_[passIdx].writes) {
ResourceState needed = stateForUsage(true, entries_[h.index].desc.format);
if (entries_[h.index].currentState != needed) {
printf(" barrier: resource[%u] %s -> %s\n",
h.index,
stateName(entries_[h.index].currentState),
stateName(needed));
entries_[h.index].currentState = needed;
}
}
}
// ── Scan lifetimes (NEW v3) ───────────────────────────────
std::vector<Lifetime> scanLifetimes(const std::vector<uint32_t>& sorted) {
std::vector<Lifetime> life(entries_.size());
for (uint32_t order = 0; order < sorted.size(); order++) {
uint32_t passIdx = sorted[order];
if (!passes_[passIdx].alive) continue;
for (auto& h : passes_[passIdx].reads) {
life[h.index].firstUse = std::min(life[h.index].firstUse, order);
life[h.index].lastUse = std::max(life[h.index].lastUse, order);
}
for (auto& h : passes_[passIdx].writes) {
life[h.index].firstUse = std::min(life[h.index].firstUse, order);
life[h.index].lastUse = std::max(life[h.index].lastUse, order);
}
}
printf(" Lifetimes (in sorted pass order):\n");
for (uint32_t i = 0; i < life.size(); i++) {
if (life[i].firstUse == UINT32_MAX) {
printf(" resource[%u] unused (dead)\n", i);
} else {
printf(" resource[%u] alive [pass %u .. pass %u]\n",
i, life[i].firstUse, life[i].lastUse);
}
}
return life;
}
// ── Greedy free-list aliasing (NEW v3) ────────────────────
std::vector<uint32_t> aliasResources(const std::vector<Lifetime>& lifetimes) {
std::vector<PhysicalBlock> freeList;
std::vector<uint32_t> mapping(entries_.size(), UINT32_MAX);
uint32_t totalWithout = 0;
std::vector<uint32_t> indices(entries_.size());
std::iota(indices.begin(), indices.end(), 0);
std::sort(indices.begin(), indices.end(), [&](uint32_t a, uint32_t b) {
return lifetimes[a].firstUse < lifetimes[b].firstUse;
});
printf(" Aliasing:\n");
for (uint32_t resIdx : indices) {
if (!lifetimes[resIdx].isTransient) continue;
if (lifetimes[resIdx].firstUse == UINT32_MAX) continue;
uint32_t needed = entries_[resIdx].desc.width
* entries_[resIdx].desc.height
* bytesPerPixel(entries_[resIdx].desc.format);
totalWithout += needed;
bool reused = false;
for (uint32_t b = 0; b < freeList.size(); b++) {
if (freeList[b].availAfter < lifetimes[resIdx].firstUse
&& freeList[b].sizeBytes >= needed) {
mapping[resIdx] = b;
freeList[b].availAfter = lifetimes[resIdx].lastUse;
reused = true;
printf(" resource[%u] -> reuse physical block %u "
"(%.1f MB, lifetime [%u..%u])\n",
resIdx, b, needed / (1024.0f * 1024.0f),
lifetimes[resIdx].firstUse,
lifetimes[resIdx].lastUse);
break;
}
}
if (!reused) {
mapping[resIdx] = static_cast<uint32_t>(freeList.size());
printf(" resource[%u] -> NEW physical block %u "
"(%.1f MB, lifetime [%u..%u])\n",
resIdx, static_cast<uint32_t>(freeList.size()),
needed / (1024.0f * 1024.0f),
lifetimes[resIdx].firstUse,
lifetimes[resIdx].lastUse);
freeList.push_back({ needed, entries_[resIdx].desc.format,
lifetimes[resIdx].lastUse });
}
}
uint32_t totalWith = 0;
for (auto& blk : freeList) totalWith += blk.sizeBytes;
printf(" Memory: %u physical blocks for %u virtual resources\n",
static_cast<uint32_t>(freeList.size()),
static_cast<uint32_t>(entries_.size()));
printf(" Without aliasing: %.1f MB\n",
totalWithout / (1024.0f * 1024.0f));
printf(" With aliasing: %.1f MB (saved %.1f MB, %.0f%%)\n",
totalWith / (1024.0f * 1024.0f),
(totalWithout - totalWith) / (1024.0f * 1024.0f),
totalWithout > 0 ? 100.0f * (totalWithout - totalWith) / totalWithout : 0.0f);
return mapping;
}
};
// Frame Graph MVP v3 -- Usage Example
// Compile: g++ -std=c++17 -o example_v3 example_v3.cpp
#include "frame_graph_v3.h"
#include <cstdio>
int main() {
printf("=== Frame Graph v3: Lifetimes & Memory Aliasing ===\n");
printf("Adds: lifetime analysis, greedy free-list aliasing.\n");
printf("Resources that don't overlap in time share GPU memory.\n\n");
FrameGraph fg;
auto depth = fg.createResource({1920, 1080, Format::D32F});
auto gbufA = fg.createResource({1920, 1080, Format::RGBA8});
auto gbufN = fg.createResource({1920, 1080, Format::RGBA8});
auto hdr = fg.createResource({1920, 1080, Format::RGBA16F});
auto bloom = fg.createResource({960, 540, Format::RGBA16F});
auto debug = fg.createResource({1920, 1080, Format::RGBA8});
printf("Resources created:\n");
printf(" [0] depth 1920x1080 D32F (%.1f MB)\n", 1920*1080*4 / (1024.0f*1024.0f));
printf(" [1] gbufA 1920x1080 RGBA8 (%.1f MB)\n", 1920*1080*4 / (1024.0f*1024.0f));
printf(" [2] gbufN 1920x1080 RGBA8 (%.1f MB)\n", 1920*1080*4 / (1024.0f*1024.0f));
printf(" [3] hdr 1920x1080 RGBA16F (%.1f MB)\n", 1920*1080*8 / (1024.0f*1024.0f));
printf(" [4] bloom 960x540 RGBA16F (%.1f MB)\n", 960*540*8 / (1024.0f*1024.0f));
printf(" [5] debug 1920x1080 RGBA8 (unused -- should be culled)\n\n");
fg.addPass("DepthPrepass",
[&]() { fg.write(0, depth); },
[&](/*cmd*/) { printf(" >> exec: DepthPrepass (writes depth)\n"); });
fg.addPass("GBuffer",
[&]() { fg.read(1, depth); fg.write(1, gbufA); fg.write(1, gbufN); },
[&](/*cmd*/) { printf(" >> exec: GBuffer (reads depth, writes gbufA, gbufN)\n"); });
fg.addPass("Lighting",
[&]() { fg.read(2, gbufA); fg.read(2, gbufN); fg.write(2, hdr); },
[&](/*cmd*/) { printf(" >> exec: Lighting (reads gbufA+gbufN, writes hdr)\n"); });
fg.addPass("Bloom",
[&]() { fg.read(3, hdr); fg.write(3, bloom); },
[&](/*cmd*/) { printf(" >> exec: Bloom (reads hdr, writes bloom)\n"); });
fg.addPass("Tonemap",
[&]() { fg.read(4, bloom); fg.write(4, hdr); },
[&](/*cmd*/) { printf(" >> exec: Tonemap (reads bloom, writes hdr -> final)\n"); });
// Dead pass -- nothing reads debug, so the graph will cull it.
fg.addPass("DebugOverlay",
[&]() { fg.write(5, debug); },
[&](/*cmd*/) { printf(" >> exec: DebugOverlay (writes debug)\n"); });
printf("Passes: DepthPrepass, GBuffer, Lighting, Bloom, Tonemap, DebugOverlay\n");
printf("Note: gbufA and gbufN are dead after Lighting -- their memory\n");
printf(" can be reused by bloom/hdr in later passes.\n\n");
printf("Compiling graph...\n");
fg.execute();
printf("\n[OK] v3 complete -- full MVP:\n");
printf(" - Automatic dependency sorting\n");
printf(" - Dead-pass culling (DebugOverlay removed)\n");
printf(" - Automatic barrier insertion\n");
printf(" - Lifetime analysis + memory aliasing\n");
printf(" - Feature-equivalent to Frostbite 2017 GDC demo\n");
return 0;
}
~70 new lines on top of v2. Aliasing runs once per frame in O(R log R) — sort, then linear scan of the free list. Sub-microsecond for 15 transient resources.
That’s the full value prop — automatic memory aliasing and automatic barriers from a single FrameGraph class. UE5’s transient resource allocator does the same thing: any FRDGTexture created through FRDGBuilder::CreateTexture (vs RegisterExternalTexture) is transient and eligible for aliasing, using the same lifetime analysis and free-list scan we just built.
✅ What the MVP delivers#
Three iterations produced a single FrameGraph class. Here’s what it does every frame, broken down by phase — the same declare → compile → execute lifecycle from Part I:
addPass runs its setup lambda:• declare reads & writes
• request virtual resources
• version tracking builds edges
• sort — topo order (Kahn's)
• cull — kill dead passes
• scan lifetimes — first/last use
• alias — free-list reuse
• compute barriers
• insert automatic barriers
• call execute lambda
• resources already aliased & bound
Compile cost by step:
| Compile step | Complexity | Algorithm |
|---|---|---|
| Topological sort | O(V + E) | Kahn's — passes + edges |
| Pass culling | O(V + E) | Backward reachability from output |
| Lifetime scan | O(V + E) | Walk sorted passes and their read/write edges |
| Aliasing | O(R log R) | Sort by first-use, greedy free-list scan |
| Barrier computation | O(V + E) | Walk passes and their read/write edges with state lookup |
The graph doesn’t care about your rendering strategy. It cares about your dependencies. Deferred or forward, the same FrameGraph class handles both — different topology, same automatic barriers and aliasing. That’s the whole point.
