Part I laid out the theory: declare, compile, execute. Now we turn that blueprint into code. Three iterations, each one building on the last: v1 lays the scaffold, v2 adds dependency-driven execution order (topological sort, pass culling, and automatic barriers), and v3 introduces lifetime analysis so non-overlapping resources can share the same heap. Let’s build it.
Architecture & API Decisions
We start from the API you want to write and build toward it: bare scaffolding first, automatic barriers and memory aliasing last.
classDiagram
direction TB
class FrameGraph{
+ResourceHandle CreateResource(desc)
+ResourceHandle ImportResource(desc, state)
+AddPass(name, setup, execute)
+Read(passIdx, handle)
+Write(passIdx, handle)
+ReadWrite(passIdx, handle)
+CompiledPlan Compile()
+Execute(plan)
-BuildEdges()
-PassIndex[] TopoSort()
-Cull(sortedPasses)
-Lifetime[] ScanLifetimes(sortedPasses)
-BlockIndex[] AliasResources(lifetimes)
-ResourceState StateForUsage(passIdx, handle, isWrite)
-Barrier[][] ComputeBarriers(sortedPasses, mapping)
-EmitBarriers(barriers)
}
class ResourceHandle{
+ResourceIndex index
+bool IsValid()
}
class ResourceDesc{
+uint32 width
+uint32 height
+Format format
}
class Format{
RGBA8
RGBA16F
R8
D32F
}
class RenderPass{
+string name
+function Setup
+function Execute
+ResourceHandle[] reads
+ResourceHandle[] writes
+ResourceHandle[] readWrites
+PassIndex[] dependsOn
+PassIndex[] successors
+uint32 inDegree
+bool alive
}
class ResourceEntry{
+ResourceDesc desc
+ResourceVersion[] versions
+ResourceState currentState
+bool imported
}
class ResourceVersion{
+PassIndex writerPass
+PassIndex[] readerPasses
}
class ResourceState{
Undefined
ColorAttachment
DepthAttachment
ShaderRead
UnorderedAccess
Present
}
class CompiledPlan{
+PassIndex[] sorted
+BlockIndex[] mapping
+Barrier[][] barriers
}
class Barrier{
+ResourceIndex resourceIndex
+ResourceState oldState
+ResourceState newState
+bool isAliasing
+ResourceIndex aliasBefore
}
class Lifetime{
+PassIndex firstUse
+PassIndex lastUse
+bool isTransient
}
class PhysicalBlock{
+uint32 sizeBytes
+PassIndex availAfter
}
FrameGraph *-- RenderPass : owns passes
FrameGraph *-- ResourceEntry : owns resources
FrameGraph ..> CompiledPlan : produces
FrameGraph ..> ResourceHandle : returns
FrameGraph ..> Lifetime : computes per resource
FrameGraph ..> PhysicalBlock : allocates from free-list
RenderPass --> ResourceHandle : reads/writes
ResourceEntry *-- ResourceDesc : describes
ResourceEntry *-- ResourceVersion : tracks per version
ResourceEntry --> ResourceState : current state
ResourceDesc --> Format : pixel format
CompiledPlan *-- Barrier : pre-pass transitions
Barrier --> ResourceState : old/new state
style ResourceHandle stroke:#d97706,stroke-width:2.5px
style ResourceDesc stroke:#d97706,stroke-width:2.5px
style Format stroke:#d97706,stroke-width:2.5px
style RenderPass stroke:#d97706,stroke-width:2.5px
style ResourceEntry stroke:#6366f1,stroke-width:2.5px
style ResourceVersion stroke:#6366f1,stroke-width:2.5px
style ResourceState stroke:#6366f1,stroke-width:2.5px
style Barrier stroke:#6366f1,stroke-width:2.5px
style CompiledPlan stroke:#6366f1,stroke-width:2.5px
style Lifetime stroke:#16a34a,stroke-width:2.5px
style PhysicalBlock stroke:#16a34a,stroke-width:2.5px
Design choices
The three-phase model from Part I forces eight API decisions. Every choice is driven by the same question: what does the graph compiler need, and what’s the cheapest way to give it?
| # | Question | Our pick | Why | Alternative |
|---|---|---|---|---|
| DECLARE: how passes and resources enter the graph | | | | |
| ① | How does setup talk to execute? | Lambda captures | Zero boilerplate: handles live in scope, both lambdas capture them directly. | Type-erased pass data: AddPass<PassData>(setup, exec). Decouples setup/execute across TUs. |
| ② | Where do DAG edges come from? | Explicit fg.Read/Write(pass, h) | Every edge is an explicit call, easy to grep and debug. | Scoped builder: builder.Read/Write(h) auto-binds to the current pass. Prevents mis-wiring at scale. |
| ③ | What is a resource handle? | Plain uint32_t index | One integer, trivially copyable. No templates, no overhead. | Typed wrappers: FRDGTextureRef / FRDGBufferRef. Compile-time safety for 700+ passes (UE5). |
| COMPILE: what the graph analyser decides | | | | |
| ④ | Is compile explicit? | Yes: Compile()→Execute(plan) | Returned plan struct lets you log, validate, and visualise the DAG. Invaluable while learning. | Implicit: Execute() compiles internally. Simpler call site, less ceremony. |
| ⑤ | How does culling find the root? | Last sorted pass | Zero config: Present is naturally last in topo order. Breaks with multiple output roots; add a NeverCull flag when you need them. | Write-to-imported heuristic + NeverCull flags. Supports multiple output roots. |
| ⑥ | Queue model? | Single graphics queue | Keeps barrier logic to plain resource state transitions. No cross-queue barriers. Multi-queue is a compiler feature layered on top; clean upgrade path. | Multi-queue + async compute. 10–30% GPU uplift but needs fences & cross-queue barriers. Part III. |
| ⑦ | Rebuild frequency? | Full rebuild every frame | You need a significantly more complex frame before this becomes visibly heavy. For an MVP, full rebuild is fine. | Cached topology: re-compile only on structural change. Near-zero steady-state cost but complex invalidation logic. |
| EXECUTE: how the compiled plan becomes GPU commands | | | | |
| ⑧ | Recording strategy? | Single command list | Sequential walk: trivial to implement and debug. CPU cost is noise at ~25 passes. Swap to parallel deferred command lists when pass count exceeds ~60. | Parallel command lists: one per pass group, recorded across threads. Scales to 100+ passes (UE5). |
The Target API
With those choices made, here’s where we’re headed: the complete API.
api_demo.cpp
// Frame Graph — API demo (4 passes, imported backbuffer)
#include "frame_graph_v3.h"
int main()
{
FrameGraph fg;
// [1] Declare — describe resources and register passes
// Import the swapchain backbuffer — externally owned, not aliased.
auto backbuffer = fg.ImportResource({1920, 1080, Format::RGBA8}, ResourceState::Present);
// Transient resources — graph-owned, eligible for aliasing.
auto depth = fg.CreateResource({1920, 1080, Format::D32F});
auto gbufA = fg.CreateResource({1920, 1080, Format::RGBA8});
auto gbufN = fg.CreateResource({1920, 1080, Format::RGBA8});
auto hdr = fg.CreateResource({1920, 1080, Format::RGBA16F});
fg.AddPass(
"DepthPrepass",
[&]()
{
fg.Write(0, depth);
},
[&](/*cmd*/) { /* draw scene depth-only */ });
fg.AddPass(
"GBuffer",
[&]()
{
fg.Read(1, depth);
fg.Write(1, gbufA);
fg.Write(1, gbufN);
},
[&](/*cmd*/) { /* draw scene to GBuffer MRTs */ });
fg.AddPass(
"Lighting",
[&]()
{
fg.Read(2, gbufA);
fg.Read(2, gbufN);
fg.Write(2, hdr);
},
[&](/*cmd*/) { /* fullscreen lighting pass */ });
fg.AddPass(
"Present",
[&]()
{
fg.Read(3, hdr);
fg.Write(3, backbuffer);
},
[&](/*cmd*/) { /* copy to backbuffer, present */ });
// [2] Compile — topo-sort, cull, alias memory, compute barriers
auto plan = fg.Compile();
// [3] Execute — replay precomputed barriers + run each pass
fg.Execute(plan);
}
v1: The Scaffold
Three types are all we need to start: a ResourceDesc (width, height, format, no GPU handle yet), a ResourceHandle that’s just an index, and a RenderPass with setup + execute lambdas. The FrameGraph class owns arrays of both and runs passes in declaration order. No dependency tracking, no barriers. Just the foundation that v2 and v3 build on.
v1, Resource types (frame_graph_v1.h)
```diff
@@ frame_graph_v1.h, Format, ResourceDesc, ResourceHandle @@
+enum class Format
+{
+    RGBA8,
+    RGBA16F,
+    R8,
+    D32F
+};
+
+struct ResourceDesc
+{
+    uint32_t width  = 0;
+    uint32_t height = 0;
+    Format format   = Format::RGBA8;
+};
+
+using ResourceIndex = uint32_t; // readable alias for resource array indices
+
+// Lightweight handle: index into FrameGraph's resource array, no GPU memory involved.
+struct ResourceHandle
+{
+    ResourceIndex index = UINT32_MAX;
+    bool IsValid() const { return index != UINT32_MAX; }
+};
```
A pass is two lambdas: setup (runs now, wires the DAG) and execute (stored for later, records GPU commands). v1 doesn’t use setup yet, but the slot is there for v2:
v1, RenderPass struct (frame_graph_v1.h)
```diff
@@ frame_graph_v1.h, RenderPass struct @@
+// A render pass: Setup wires the DAG (declares reads/writes), Execute records GPU commands.
+struct RenderPass
+{
+    std::string name;
+    std::function<void()> Setup;               // build the DAG (v1: unused)
+    std::function<void(/*cmd list*/)> Execute; // record GPU commands
+};
```
The FrameGraph class owns two arrays: passes and resources. AddPass runs setup immediately, so the DAG is built during declaration, not lazily. Execute() in v1 just walks passes in order:
v1, FrameGraph class (frame_graph_v1.h)
```diff
@@ frame_graph_v1.h, FrameGraph class @@
+class FrameGraph
+{
+  public:
+    ResourceHandle CreateResource(const ResourceDesc& desc); // transient, graph owns lifetime
+    ResourceHandle ImportResource(const ResourceDesc& desc); // external, e.g. swapchain backbuffer
+
+    // AddPass: runs setup immediately (wires DAG), stores execute for later.
+    template <typename SetupFn, typename ExecFn>
+    void AddPass(const std::string& name, SetupFn&& setup, ExecFn&& exec)
+    {
+        passes.push_back({name, std::forward<SetupFn>(setup), std::forward<ExecFn>(exec)});
+        passes.back().Setup(); // run setup immediately, DAG is built here
+    }
+
+    void Execute(); // v1: just run in declaration order
+
+  private:
+    std::vector<RenderPass> passes;
+    std::vector<ResourceDesc> resources;
+};
```
CreateResource and ImportResource both push a descriptor into the resources array and return a handle. No GPU memory yet. That happens at execute time. In v1 they’re identical, and v2 will differentiate imported resources:
v1, CreateResource / ImportResource
```diff
@@ frame_graph_v1.cpp, CreateResource / ImportResource @@
+// No GPU memory is allocated yet, that happens at execute time.
+ResourceHandle FrameGraph::CreateResource(const ResourceDesc& desc)
+{
+    resources.push_back(desc);
+    return {static_cast<ResourceIndex>(resources.size() - 1)};
+}
+
+ResourceHandle FrameGraph::ImportResource(const ResourceDesc& desc)
+{
+    resources.push_back(desc); // v1: same as create (no aliasing yet)
+    return {static_cast<ResourceIndex>(resources.size() - 1)};
+}
```
Execute() is the simplest possible loop: walk passes in declaration order, call each callback, clear everything for the next frame. No compile step, no reordering, just playback:
v1, Execute()
```diff
@@ frame_graph_v1.cpp, Execute() @@
+// v1 execute: just run passes in the order they were declared.
+void FrameGraph::Execute()
+{
+    printf("\n[1] Executing (declaration order -- no compile step):\n");
+    for (auto& pass : passes)
+    {
+        printf("  >> exec: %s\n", pass.name.c_str());
+        pass.Execute(/* &cmdList */);
+    }
+    passes.clear(); // reset for next frame
+    resources.clear();
+}
```
Full source and runnable example:
#pragma once
// Frame Graph MVP v1 -- Declare & Execute
// No dependency tracking, no barriers, no aliasing.
// Passes execute in declaration order.
//
// Compile: clang++ -std=c++17 -o example_v1 example_v1.cpp
// or: g++ -std=c++17 -o example_v1 example_v1.cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>
// == Resource description (virtual until compile) ==============
enum class Format
{
RGBA8,
RGBA16F,
R8,
D32F
};
struct ResourceDesc
{
uint32_t width = 0;
uint32_t height = 0;
Format format = Format::RGBA8;
};
using ResourceIndex = uint32_t;
// Handle = typed index into the graph's resource array.
// No GPU memory behind it yet -- just a number.
struct ResourceHandle
{
ResourceIndex index = UINT32_MAX;
bool IsValid() const { return index != UINT32_MAX; }
};
// == Render pass ===============================================
struct RenderPass
{
std::string name;
std::function<void()> Setup; // build the DAG (v1: unused)
std::function<void(/*cmd list*/)> Execute; // record GPU commands
};
// == Frame graph ===============================================
class FrameGraph
{
public:
// Create a virtual resource -- returns a handle, not GPU memory.
ResourceHandle CreateResource(const ResourceDesc& desc);
// Import an external resource (e.g. swapchain backbuffer).
// Barriers are tracked, but the graph does not own its memory.
ResourceHandle ImportResource(const ResourceDesc& desc);
// Register a pass. Setup runs now; execute is stored for later.
template <typename SetupFn, typename ExecFn> void AddPass(const std::string& name, SetupFn&& setup, ExecFn&& exec)
{
passes.push_back({name, std::forward<SetupFn>(setup), std::forward<ExecFn>(exec)});
passes.back().Setup(); // run setup immediately
}
// Compile + execute. v1 is trivial -- just run in declaration order.
void Execute();
private:
std::vector<RenderPass> passes;
std::vector<ResourceDesc> resources;
};
#include "frame_graph_v1.h"
#include <cstdio>
// == FrameGraph implementation =================================
ResourceHandle FrameGraph::CreateResource(const ResourceDesc& desc)
{
resources.push_back(desc);
return {static_cast<ResourceIndex>(resources.size() - 1)};
}
ResourceHandle FrameGraph::ImportResource(const ResourceDesc& desc)
{
resources.push_back(desc); // v1: same as create (no aliasing yet)
return {static_cast<ResourceIndex>(resources.size() - 1)};
}
void FrameGraph::Execute()
{
// v1: no compile step -- no sorting, no culling, no barriers.
// Just run every pass in the order it was added.
printf("\n[1] Executing (declaration order -- no compile step):\n");
for (auto& pass : passes)
{
printf(" >> exec: %s\n", pass.name.c_str());
pass.Execute(/* &cmdList */);
}
// Frame over -- clear everything for next frame.
passes.clear();
resources.clear();
}
// Frame Graph MVP v1 -- Usage Example
// Compile: clang++ -std=c++17 -o example_v1 example_v1.cpp
#include "frame_graph_v1.h"
#include "frame_graph_v1.cpp" // single-TU build (Godbolt)
#include <cstdio>
int main()
{
printf("=== Frame Graph v1: Declare & Execute ===\n");
FrameGraph fg;
// Import the swapchain backbuffer — the graph tracks barriers
// but does not own its memory (it lives outside the frame).
auto backbuffer = fg.ImportResource({1920, 1080, Format::RGBA8});
auto depth = fg.CreateResource({1920, 1080, Format::D32F});
auto gbufA = fg.CreateResource({1920, 1080, Format::RGBA8});
auto gbufN = fg.CreateResource({1920, 1080, Format::RGBA8});
auto hdr = fg.CreateResource({1920, 1080, Format::RGBA16F});
fg.AddPass(
"DepthPrepass",
[&]() { /* setup — v1 doesn't use this */ },
[&](/*cmd*/)
{
printf(" draw scene depth-only\n");
});
fg.AddPass(
"GBuffer",
[&]() { /* setup */ },
[&](/*cmd*/)
{
printf(" draw scene -> GBuffer MRTs\n");
});
fg.AddPass(
"Lighting",
[&]() { /* setup */ },
[&](/*cmd*/)
{
printf(" fullscreen lighting pass\n");
});
fg.AddPass(
"Present",
[&]() { /* setup */ },
[&](/*cmd*/)
{
printf(" copy HDR -> backbuffer\n");
});
fg.Execute();
return 0;
}
Compiles and runs: the execute lambdas are stubs, but the scaffolding is real. Every piece we add in v2 and v3 goes into this same FrameGraph class.
What works: the lambda-based pass declaration pattern. You can already compose passes without manual barrier calls (even though barriers are no-ops here).
What's missing: v1 executes passes in declaration order and creates every resource up front. Correct but wasteful. Version 2 adds the graph.
v2: Dependencies & Barriers
Four steps in strict order, each one's output feeding the next: resource versioning builds the dependency edges, topological sort turns them into an execution order, culling drops passes nothing consumes, and barrier insertion precomputes every state transition.
Resource versioning: the data structure
Every write bumps a version number, and every read attaches to the current version. That’s enough to produce precise dependency edges (theory refresher).
The key data structure: each resource entry tracks its current version (incremented on write) and a writer pass index per version. When a pass calls Read(h), the graph looks up the current version’s writer and adds a dependency edge from that writer to the reading pass.
From v1, the core change is in resource tracking. The ResourceDesc array becomes ResourceEntry, each entry carrying a version list and an imported flag. ResourceVersion tracks which pass wrote each version and which passes read it. This is the data Read/Write use to build edges:
v1 → v2, ResourceVersion & ResourceEntry
```diff
@@ frame_graph_v2.h, PassIndex alias, ResourceVersion, ResourceEntry @@
+using PassIndex = uint32_t; // readable alias for pass array indices
+
+struct ResourceVersion
+{
+    PassIndex writerPass = UINT32_MAX;   // which pass wrote this version; each Read() links to it → RAW edge
+    std::vector<PassIndex> readerPasses; // which passes read this version; each Write() links to them → WAR edges
+    bool HasWriter() const { return writerPass != UINT32_MAX; }
+};
+
+// Replaces raw ResourceDesc, now tracks version history per resource.
+struct ResourceEntry
+{
+    ResourceDesc desc;
+    std::vector<ResourceVersion> versions; // version 0, 1, 2...
+    bool imported = false;                 // imported = externally owned (e.g. swapchain)
+};
```
RenderPass gains reads, writes, readWrites (UAV), and dependsOn vectors. The FrameGraph class adds three new methods (Read(), Write(), ReadWrite()) and the internal storage switches from a flat ResourceDesc array to ResourceEntry:
v1 → v2, RenderPass & FrameGraph API changes
```diff
@@ frame_graph_v2.h, RenderPass gets reads/writes/dependsOn @@
 struct RenderPass
 {
     std::string name;
     std::function<void()> Setup;
     std::function<void(/*cmd list*/)> Execute;
+    std::vector<ResourceHandle> reads;
+    std::vector<ResourceHandle> writes;
+    std::vector<ResourceHandle> readWrites; // UAV (explicit)
+    std::vector<PassIndex> dependsOn;
 };
@@ frame_graph_v2.h, FrameGraph adds Read(), Write(), ReadWrite() @@
+    void Read(PassIndex passIdx, ResourceHandle h);
+    void Write(PassIndex passIdx, ResourceHandle h);
+    void ReadWrite(PassIndex passIdx, ResourceHandle h); // UAV access
@@ frame_graph_v2.h, ResourceDesc[] becomes ResourceEntry[] @@
-    std::vector<ResourceDesc> resources;
+    std::vector<ResourceEntry> entries; // now with versioning
```
CreateResource / ImportResource now build ResourceEntry objects (with an empty initial version). The real work is in the three access methods:
v1 → v2, CreateResource / ImportResource updated
```diff
@@ frame_graph_v2.cpp, CreateResource / ImportResource now use ResourceEntry @@
 ResourceHandle FrameGraph::CreateResource(const ResourceDesc& desc)
 {
-    resources.push_back(desc);
-    return {static_cast<ResourceIndex>(resources.size() - 1)};
+    entries.push_back({desc, {{}}});
+    return {static_cast<ResourceIndex>(entries.size() - 1)};
 }
 
 ResourceHandle FrameGraph::ImportResource(const ResourceDesc& desc)
 {
-    resources.push_back(desc);
-    return {static_cast<ResourceIndex>(resources.size() - 1)};
+    entries.push_back({desc, {{}}, /*imported=*/true});
+    return {static_cast<ResourceIndex>(entries.size() - 1)};
 }
```
Read() looks up the current version’s writer and adds a RAW (read-after-write) dependency edge: “I need the result of whoever last wrote this.” It also registers itself as a reader so that a future Write() knows who to protect:
v1 → v2, Read()
```diff
@@ frame_graph_v2.cpp, Read() @@
+// Read: look up who last wrote this resource → add a dependency edge from that writer to this pass.
+void FrameGraph::Read(PassIndex passIdx, ResourceHandle h)
+{
+    auto& ver = entries[h.index].versions.back(); // current version
+    if (ver.HasWriter())
+    {
+        passes[passIdx].dependsOn.push_back(ver.writerPass); // RAW edge
+    }
+    ver.readerPasses.push_back(passIdx); // track who reads this version
+    passes[passIdx].reads.push_back(h);  // record for barrier insertion
+}
```
Write() handles the other direction: it adds a WAW (write-after-write) edge from the current version’s writer (if any) and WAR (write-after-read) edges from every reader of the current version (ensuring they all finish before the overwrite), then bumps the version so future reads see the new data:
v1 → v2, Write()
```diff
@@ frame_graph_v2.cpp, Write() @@
+// Write: add WAW edge from prev writer + WAR edges from readers, then bump the version.
+void FrameGraph::Write(PassIndex passIdx, ResourceHandle h)
+{
+    auto& ver = entries[h.index].versions.back(); // current version (pre-bump)
+    if (ver.HasWriter())
+        passes[passIdx].dependsOn.push_back(ver.writerPass); // WAW edge: prev writer must finish
+    for (PassIndex reader : ver.readerPasses)
+        passes[passIdx].dependsOn.push_back(reader); // WAR edge: reader must finish first
+    entries[h.index].versions.push_back({});               // bump version
+    entries[h.index].versions.back().writerPass = passIdx; // this pass owns the new version
+    passes[passIdx].writes.push_back(h); // record for barrier insertion
+}
```
ReadWrite() (UAV) combines both patterns: RAW edge from the previous writer, WAR edges from current readers, version bump, and it pushes the handle into all three lists (reads, writes, readWrites) so the barrier system can identify it as an unordered-access resource:
v1 → v2, ReadWrite() (UAV)
```diff
@@ frame_graph_v2.cpp, ReadWrite() (UAV) @@
+// ReadWrite (UAV): depend on previous writer + WAR edges from readers, then bump version.
+void FrameGraph::ReadWrite(PassIndex passIdx, ResourceHandle h)
+{
+    auto& ver = entries[h.index].versions.back();
+    if (ver.HasWriter())
+    {
+        passes[passIdx].dependsOn.push_back(ver.writerPass); // RAW edge
+    }
+    for (PassIndex reader : ver.readerPasses)
+        passes[passIdx].dependsOn.push_back(reader); // WAR edge
+    entries[h.index].versions.push_back({}); // bump version (it's a write)
+    entries[h.index].versions.back().writerPass = passIdx;
+    passes[passIdx].reads.push_back(h);  // appears in both lists (for barriers + lifetimes)
+    passes[passIdx].writes.push_back(h);
+    passes[passIdx].readWrites.push_back(h); // marks this handle as UAV for StateForUsage
+}
```
Every Write() adds a WAW edge from the previous writer (if any) plus WAR edges from every reader of the current version (so they finish before the overwrite), then bumps the version. Every Read() finds the current version’s writer and records a RAW edge. Together they capture all three data hazards: read-after-write, write-after-read, and write-after-write. Those edges feed the next three steps.
Topological sort (Kahn’s algorithm)
With edges in place, we need an execution order that respects every dependency. Kahn’s algorithm (theory refresher) gives us one in O(V+E). BuildEdges() deduplicates the raw dependsOn entries and builds the adjacency list. TopoSort() does the zero-in-degree queue drain:
v2, Edge deduplication (BuildEdges)
```diff
@@ frame_graph_v2.h, RenderPass gets successors + inDegree (for Kahn's) @@
 struct RenderPass
 {
     ...
+    std::vector<PassIndex> successors; // passes that depend on this one
+    uint32_t inDegree = 0;             // incoming edge count (Kahn's)
 };
@@ frame_graph_v2.cpp, BuildEdges() @@
+// Deduplicate raw dependsOn edges and build forward adjacency list (successors) for Kahn's algorithm.
+void FrameGraph::BuildEdges()
+{
+    for (PassIndex i = 0; i < passes.size(); i++)
+    {
+        std::unordered_set<PassIndex> seen;
+        for (PassIndex dep : passes[i].dependsOn)
+        {
+            if (seen.insert(dep).second) // first time seeing this edge?
+            {
+                passes[dep].successors.push_back(i); // forward link: dep → i
+                passes[i].inDegree++;                // i has one more incoming edge
+            }
+        }
+    }
+}
```
With the adjacency list built, TopoSort() implements Kahn’s zero-in-degree queue drain: any pass whose dependencies are all satisfied gets dequeued next:
v2, Kahn's topological sort
```diff
@@ frame_graph_v2.cpp, TopoSort() @@
+// Kahn's algorithm: dequeue zero-in-degree passes → valid execution order respecting all dependencies.
+std::vector<PassIndex> FrameGraph::TopoSort()
+{
+    std::queue<PassIndex> q;
+    std::vector<uint32_t> inDeg(passes.size());
+    for (PassIndex i = 0; i < passes.size(); i++)
+    {
+        inDeg[i] = passes[i].inDegree;
+        if (inDeg[i] == 0)
+            q.push(i); // no dependencies → ready immediately
+    }
+    std::vector<PassIndex> order;
+    while (!q.empty())
+    {
+        PassIndex cur = q.front();
+        q.pop();
+        order.push_back(cur);
+        for (PassIndex succ : passes[cur].successors)
+        {
+            if (--inDeg[succ] == 0) // all of succ's dependencies done?
+                q.push(succ);       // succ is now ready
+        }
+    }
+    // If we didn't visit every pass, the graph has a cycle, invalid.
+    assert(order.size() == passes.size() && "Cycle detected!");
+    return order;
+}
```
Pass culling
A sorted graph still runs passes nobody reads from. Culling is dead-code elimination for GPU work (theory refresher), using a single backward walk that marks the final pass alive, then propagates through dependsOn edges:
v2, Pass culling
```diff
@@ frame_graph_v2.h, RenderPass gets alive flag @@
 struct RenderPass
 {
     ...
+    bool alive = false; // survives the cull?
 };
@@ frame_graph_v2.cpp, Cull() @@
+// Dead-code elimination: walk backward from the final output pass, marking dependencies alive.
+void FrameGraph::Cull(const std::vector<PassIndex>& sorted)
+{
+    if (sorted.empty())
+        return;
+    passes[sorted.back()].alive = true; // last pass = the final output (e.g. Present)
+    for (int i = static_cast<int>(sorted.size()) - 1; i >= 0; i--)
+    {
+        if (!passes[sorted[i]].alive)
+            continue; // skip dead passes
+        for (PassIndex dep : passes[sorted[i]].dependsOn)
+            passes[dep].alive = true; // my dependency is needed → keep it alive
+    }
+}
```
Barrier insertion
GPUs need explicit state transitions between resource usages: color attachment to shader read, undefined to depth, etc. The graph already knows every resource’s read/write history (theory refresher), so the compiler can figure out every transition before execution starts.
The idea: walk the sorted pass list, compare each resource’s tracked state to what the pass needs, and record a barrier when they differ. This is where we introduce the compile / execute split: Compile() precomputes every transition into a CompiledPlan, and Execute() replays them. No state tracking at execution time, no decisions, just playback.
First, the new types. ResourceState is an enum with six values (Undefined, ColorAttachment, DepthAttachment, ShaderRead, UnorderedAccess, Present). Barrier pairs a resource index with old/new state. ResourceEntry gains a currentState field, and ImportResource takes an initial state so the swapchain can start as Present:
v2, ResourceState, Barrier & ResourceEntry changes
```diff
@@ frame_graph_v2.h, ResourceState enum + Barrier struct @@
+enum class ResourceState
+{
+    Undefined,
+    ColorAttachment,
+    DepthAttachment,
+    ShaderRead,
+    UnorderedAccess,
+    Present
+};
+
+// A single resource-state transition.
+struct Barrier
+{
+    ResourceIndex resourceIndex; // which resource to transition
+    ResourceState oldState;      // state before this pass
+    ResourceState newState;      // state this pass requires
+};
@@ frame_graph_v2.h, ResourceEntry gets currentState @@
 struct ResourceEntry
 {
     ...
+    ResourceState currentState = ResourceState::Undefined;
 };
@@ frame_graph_v2.h, ImportResource() accepts initial state @@
-    ResourceHandle ImportResource(const ResourceDesc& desc);
+    ResourceHandle ImportResource(
+        const ResourceDesc& desc,
+        ResourceState initialState = ResourceState::Undefined);
```
With those types in place, we introduce the compile / execute split. CompiledPlan holds the topological order and a 2D barrier array (barriers[orderIdx] = transitions before that pass). Compile() returns a plan. Execute() replays it. CreateResource / ImportResource gain a fourth field for initial state:
v2, CompiledPlan, Compile/Execute split & updated constructors
```diff
@@ frame_graph_v2.h, CompiledPlan + Compile/Execute split @@
+    struct CompiledPlan
+    {
+        std::vector<PassIndex> sorted;              // topological execution order
+        std::vector<std::vector<Barrier>> barriers; // barriers[orderIdx] → pre-pass transitions
+    };
+
+    CompiledPlan Compile();
+    void Execute(const CompiledPlan& plan);
     void Execute(); // convenience: compile + execute in one call
@@ frame_graph_v2.cpp, CreateResource / ImportResource updated for ResourceState @@
 ResourceHandle FrameGraph::CreateResource(const ResourceDesc& desc)
 {
-    entries.push_back({desc, {{}}});
+    entries.push_back({desc, {{}}, ResourceState::Undefined, false});
     return {static_cast<ResourceIndex>(entries.size() - 1)};
 }
-ResourceHandle FrameGraph::ImportResource(const ResourceDesc& desc)
+ResourceHandle FrameGraph::ImportResource(const ResourceDesc& desc, ResourceState initialState)
 {
-    entries.push_back({desc, {{}}, /*imported=*/true});
+    entries.push_back({desc, {{}}, initialState, true});
     return {static_cast<ResourceIndex>(entries.size() - 1)};
 }
```
With the type system in place, ComputeBarriers() walks the sorted pass list. For each surviving pass it first infers the required state for every resource the pass touches. IsUAV checks the readWrites list. StateForUsage maps usage to one of the six states (ShaderRead for reads, ColorAttachment or DepthAttachment for writes based on format, UnorderedAccess for UAVs):
v2, ComputeBarriers(), state inference helpers
```diff
@@ frame_graph_v2.cpp, ComputeBarriers() state inference @@
+// Walk sorted passes, compare tracked state to each resource's needed state, record transitions.
+std::vector<std::vector<Barrier>> FrameGraph::ComputeBarriers(const std::vector<PassIndex>& sorted)
+{
+    std::vector<std::vector<Barrier>> result(sorted.size());
+    for (PassIndex orderIdx = 0; orderIdx < sorted.size(); orderIdx++)
+    {
+        PassIndex passIdx = sorted[orderIdx];
+        if (!passes[passIdx].alive)
+            continue;
+
+        auto IsUAV = [&](ResourceHandle h)
+        {
+            for (auto& rw : passes[passIdx].readWrites)
+                if (rw.index == h.index)
+                    return true;
+            return false;
+        };
+        auto StateForUsage = [&](ResourceHandle h, bool isWrite)
+        {
+            if (IsUAV(h))
+                return ResourceState::UnorderedAccess;
+            if (isWrite)
+                return (entries[h.index].desc.format == Format::D32F) ? ResourceState::DepthAttachment
+                                                                      : ResourceState::ColorAttachment;
+            return ResourceState::ShaderRead;
+        };
```
With the required state known, recordTransition compares it to the resource’s tracked currentState. When they differ it records a Barrier (resource index + old/new state) and updates the tracker. Two loops fire it (once for reads, once for writes), producing a 2D vector where barriers[orderIdx] holds every transition that must fire before that pass runs:
v2, ComputeBarriers(), record transitions
```diff
@@ frame_graph_v2.cpp, ComputeBarriers() transition recording @@
+        auto recordTransition = [&](ResourceHandle h, bool isWrite)
+        {
+            ResourceState needed = StateForUsage(h, isWrite);
+            if (entries[h.index].currentState != needed)
+            {
+                result[orderIdx].push_back({h.index, entries[h.index].currentState, needed});
+                entries[h.index].currentState = needed;
+            }
+        };
+        for (auto& h : passes[passIdx].reads)
+            recordTransition(h, false);
+        for (auto& h : passes[passIdx].writes)
+            recordTransition(h, true);
+    }
+    return result;
+}
```
EmitBarriers() replays those transitions on the GPU. In production this maps to vkCmdPipelineBarrier (Vulkan) or ID3D12GraphicsCommandList::ResourceBarrier (D3D12). For our MVP it’s a one-liner stub:
v2, EmitBarriers()
```diff
@@ frame_graph_v2.cpp, EmitBarriers() (replay) @@
+// Replay precomputed transitions, in production this calls the GPU API.
+void FrameGraph::EmitBarriers(const std::vector<Barrier>& barriers)
+{
+    for (auto& b : barriers)
+    {
+        // vkCmdPipelineBarrier / ResourceBarrier
+    }
+}
```
Compile() chains all four stages (edges, sort, cull, barriers) into a CompiledPlan. Execute() walks the plan in order: emit precomputed barriers, call the pass’s execute lambda, repeat. A convenience overload does both in one call:
v2, Compile() & Execute()
```diff
@@ frame_graph_v2.cpp, Compile() + Execute() @@
+// Full compile pipeline: sort → cull → precompute barriers. Returns a self-contained plan.
+FrameGraph::CompiledPlan FrameGraph::Compile()
+{
+    BuildEdges();
+    auto sorted = TopoSort();
+    Cull(sorted);
+    auto barriers = ComputeBarriers(sorted);
+    return {std::move(sorted), std::move(barriers)};
+}
+
+// Pure playback: emit precomputed barriers, call execute lambdas. No analysis.
+void FrameGraph::Execute(const CompiledPlan& plan)
+{
+    for (PassIndex orderIdx = 0; orderIdx < plan.sorted.size(); orderIdx++)
+    {
+        PassIndex passIdx = plan.sorted[orderIdx];
+        if (!passes[passIdx].alive)
+            continue;
+        EmitBarriers(plan.barriers[orderIdx]);
+        passes[passIdx].Execute(/* &cmdList */);
+    }
+    passes.clear();
+    entries.clear();
+}
+
+void FrameGraph::Execute()
+{
+    Execute(Compile());
+}
```
All four pieces (versioning, sorting, culling, barriers) compose into Compile(). Each step feeds the next: versioning creates edges, edges feed the sort, the sort enables culling, and the surviving sorted passes get precomputed barriers. Execute() is pure playback.
Full v2 source#
The full listing includes printf diagnostics (topo-sort order, culling results, barrier transitions) that are omitted from the diffs above to keep the focus on structure. They are invaluable for debugging, so read through them in the source.
#pragma once
// Frame Graph MVP v2 -- Dependencies & Barriers
// Adds: resource versioning, DAG with adjacency list, Kahn's topo-sort,
// pass culling, and automatic barrier insertion.
//
// Compile: clang++ -std=c++17 -o example_v2 example_v2.cpp
// or: g++ -std=c++17 -o example_v2 example_v2.cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>
// == Resource description (virtual until compile) ==============
enum class Format
{
RGBA8,
RGBA16F,
R8,
D32F
};
struct ResourceDesc
{
uint32_t width = 0;
uint32_t height = 0;
Format format = Format::RGBA8;
};
using ResourceIndex = uint32_t;
using PassIndex = uint32_t;
struct ResourceHandle
{
ResourceIndex index = UINT32_MAX;
bool IsValid() const { return index != UINT32_MAX; }
};
// == Resource state tracking ====================================
enum class ResourceState
{
Undefined,
ColorAttachment,
DepthAttachment,
ShaderRead,
UnorderedAccess,
Present
};
inline const char* StateName(ResourceState s)
{
switch (s)
{
case ResourceState::Undefined:
return "Undefined";
case ResourceState::ColorAttachment:
return "ColorAttachment";
case ResourceState::DepthAttachment:
return "DepthAttachment";
case ResourceState::ShaderRead:
return "ShaderRead";
case ResourceState::UnorderedAccess:
return "UnorderedAccess";
case ResourceState::Present:
return "Present";
default:
return "?";
}
}
struct ResourceVersion
{
PassIndex writerPass = UINT32_MAX; // which pass wrote this version
std::vector<PassIndex> readerPasses; // which passes read it
bool HasWriter() const { return writerPass != UINT32_MAX; }
};
// A single resource-state transition.
struct Barrier
{
ResourceIndex resourceIndex;
ResourceState oldState;
ResourceState newState;
};
// Extend ResourceDesc with tracking:
struct ResourceEntry
{
ResourceDesc desc;
std::vector<ResourceVersion> versions; // version 0, 1, 2...
ResourceState currentState = ResourceState::Undefined;
bool imported = false; // imported resources are not owned by the graph
};
// == Updated render pass =======================================
struct RenderPass
{
std::string name;
std::function<void()> Setup;
std::function<void(/*cmd list*/)> Execute;
std::vector<ResourceHandle> reads;
std::vector<ResourceHandle> writes;
std::vector<ResourceHandle> readWrites; // UAV (explicit)
std::vector<PassIndex> dependsOn; // passes this pass depends on
std::vector<PassIndex> successors; // passes that depend on this pass
uint32_t inDegree = 0; // for Kahn's
bool alive = false; // for culling
};
// == Updated FrameGraph ========================================
class FrameGraph
{
public:
ResourceHandle CreateResource(const ResourceDesc& desc);
// Import an external resource (e.g. swapchain backbuffer).
// The graph tracks barriers but does not own or alias its memory.
ResourceHandle ImportResource(const ResourceDesc& desc, ResourceState initialState = ResourceState::Undefined);
// Declare a read -- links this pass to the resource's current version.
void Read(PassIndex passIdx, ResourceHandle h);
// Declare a write -- creates a new version of the resource.
void Write(PassIndex passIdx, ResourceHandle h);
// Declare a UAV access -- read + write on the same version.
void ReadWrite(PassIndex passIdx, ResourceHandle h);
template <typename SetupFn, typename ExecFn> void AddPass(const std::string& name, SetupFn&& setup, ExecFn&& exec)
{
passes.push_back({name, std::forward<SetupFn>(setup), std::forward<ExecFn>(exec)});
passes.back().Setup();
}
// == v2: compile — precompute sort, cull, barriers =========
struct CompiledPlan
{
std::vector<PassIndex> sorted; // topological execution order
std::vector<std::vector<Barrier>> barriers; // barriers[orderIdx] → pre-pass transitions
};
CompiledPlan Compile();
void Execute(const CompiledPlan& plan);
// convenience: compile + execute in one call
void Execute();
private:
std::vector<RenderPass> passes;
std::vector<ResourceEntry> entries;
void BuildEdges();
std::vector<PassIndex> TopoSort();
void Cull(const std::vector<PassIndex>& sorted);
std::vector<std::vector<Barrier>> ComputeBarriers(const std::vector<PassIndex>& sorted);
void EmitBarriers(const std::vector<Barrier>& barriers);
};
#include "frame_graph_v2.h"
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <queue>
#include <unordered_set>
// == FrameGraph implementation =================================
ResourceHandle FrameGraph::CreateResource(const ResourceDesc& desc)
{
entries.push_back({desc, {{}}, ResourceState::Undefined, false});
return {static_cast<ResourceIndex>(entries.size() - 1)};
}
ResourceHandle FrameGraph::ImportResource(const ResourceDesc& desc, ResourceState initialState)
{
entries.push_back({desc, {{}}, initialState, true});
return {static_cast<ResourceIndex>(entries.size() - 1)};
}
void FrameGraph::Read(PassIndex passIdx, ResourceHandle h)
{
auto& ver = entries[h.index].versions.back();
if (ver.HasWriter())
{
passes[passIdx].dependsOn.push_back(ver.writerPass);
}
ver.readerPasses.push_back(passIdx);
passes[passIdx].reads.push_back(h);
}
void FrameGraph::Write(PassIndex passIdx, ResourceHandle h)
{
auto& ver = entries[h.index].versions.back(); // current version (pre-bump)
if (ver.HasWriter())
passes[passIdx].dependsOn.push_back(ver.writerPass); // WAW edge: prev writer must finish
for (PassIndex reader : ver.readerPasses)
passes[passIdx].dependsOn.push_back(reader); // WAR edge: reader must finish first
entries[h.index].versions.push_back({});
entries[h.index].versions.back().writerPass = passIdx;
passes[passIdx].writes.push_back(h);
}
void FrameGraph::ReadWrite(PassIndex passIdx, ResourceHandle h)
{
auto& ver = entries[h.index].versions.back();
if (ver.HasWriter())
{
passes[passIdx].dependsOn.push_back(ver.writerPass); // RAW edge
}
for (PassIndex reader : ver.readerPasses)
passes[passIdx].dependsOn.push_back(reader); // WAR edge
entries[h.index].versions.push_back({});
entries[h.index].versions.back().writerPass = passIdx;
passes[passIdx].reads.push_back(h);
passes[passIdx].writes.push_back(h);
passes[passIdx].readWrites.push_back(h);
}
// == v2: compile — precompute sort, cull, barriers ============
FrameGraph::CompiledPlan FrameGraph::Compile()
{
printf("\n[1] Building dependency edges...\n");
BuildEdges();
printf("[2] Topological sort...\n");
auto sorted = TopoSort();
printf("[3] Culling dead passes...\n");
Cull(sorted);
printf("[4] Computing barriers...\n");
auto barriers = ComputeBarriers(sorted);
return {std::move(sorted), std::move(barriers)};
}
// == Execute (replay compiled plan) ===========================
void FrameGraph::Execute(const CompiledPlan& plan)
{
printf("[5] Executing (replaying precomputed barriers):\n");
for (PassIndex orderIdx = 0; orderIdx < plan.sorted.size(); orderIdx++)
{
PassIndex passIdx = plan.sorted[orderIdx];
if (!passes[passIdx].alive)
{
printf(" -- skip: %s (CULLED)\n", passes[passIdx].name.c_str());
continue;
}
EmitBarriers(plan.barriers[orderIdx]);
passes[passIdx].Execute(/* &cmdList */);
}
passes.clear();
entries.clear();
}
// convenience: compile + execute in one call
void FrameGraph::Execute()
{
Execute(Compile());
}
// == Build dependency edges ====================================
void FrameGraph::BuildEdges()
{
for (PassIndex i = 0; i < passes.size(); i++)
{
// Deduplicate dependency edges and build successor list.
std::unordered_set<PassIndex> seen;
for (PassIndex dep : passes[i].dependsOn)
{
if (seen.insert(dep).second)
{
passes[dep].successors.push_back(i);
passes[i].inDegree++;
}
}
}
}
// == Kahn's topological sort -- O(V + E) ========================
std::vector<PassIndex> FrameGraph::TopoSort()
{
std::queue<PassIndex> q;
std::vector<uint32_t> inDeg(passes.size());
for (PassIndex i = 0; i < passes.size(); i++)
{
inDeg[i] = passes[i].inDegree;
if (inDeg[i] == 0)
q.push(i);
}
std::vector<PassIndex> order;
while (!q.empty())
{
PassIndex cur = q.front();
q.pop();
order.push_back(cur);
// Walk the adjacency list -- O(E) total across all nodes.
for (PassIndex succ : passes[cur].successors)
{
if (--inDeg[succ] == 0)
q.push(succ);
}
}
assert(order.size() == passes.size() && "Cycle detected!");
printf(" Topological order: ");
for (PassIndex i = 0; i < order.size(); i++)
{
printf("%s%s", passes[order[i]].name.c_str(), i + 1 < order.size() ? " -> " : "\n");
}
return order;
}
// == Cull dead passes (backward walk from output) ==============
void FrameGraph::Cull(const std::vector<PassIndex>& sorted)
{
// Mark the last pass (present) as alive, then walk backward.
if (sorted.empty())
return;
passes[sorted.back()].alive = true;
for (int i = static_cast<int>(sorted.size()) - 1; i >= 0; i--)
{
if (!passes[sorted[i]].alive)
continue;
for (PassIndex dep : passes[sorted[i]].dependsOn)
passes[dep].alive = true;
}
printf(" Culling result: ");
for (PassIndex i = 0; i < passes.size(); i++)
{
printf("%s=%s%s", passes[i].name.c_str(), passes[i].alive ? "ALIVE" : "DEAD", i + 1 < passes.size() ? ", " : "\n");
}
}
// == Precompute barriers (state transitions) ==================
std::vector<std::vector<Barrier>> FrameGraph::ComputeBarriers(const std::vector<PassIndex>& sorted)
{
std::vector<std::vector<Barrier>> result(sorted.size());
for (PassIndex orderIdx = 0; orderIdx < sorted.size(); orderIdx++)
{
PassIndex passIdx = sorted[orderIdx];
if (!passes[passIdx].alive)
continue;
auto IsUAV = [&](ResourceHandle h)
{
for (auto& rw : passes[passIdx].readWrites)
if (rw.index == h.index)
return true;
return false;
};
auto StateForUsage = [&](ResourceHandle h, bool isWrite)
{
if (IsUAV(h))
return ResourceState::UnorderedAccess;
if (isWrite)
return (entries[h.index].desc.format == Format::D32F) ? ResourceState::DepthAttachment : ResourceState::ColorAttachment;
return ResourceState::ShaderRead;
};
auto recordTransition = [&](ResourceHandle h, bool isWrite)
{
ResourceState needed = StateForUsage(h, isWrite);
if (entries[h.index].currentState != needed)
{
result[orderIdx].push_back({h.index, entries[h.index].currentState, needed});
entries[h.index].currentState = needed;
}
};
for (auto& h : passes[passIdx].reads)
recordTransition(h, false);
for (auto& h : passes[passIdx].writes)
recordTransition(h, true);
}
uint32_t total = 0;
for (auto& v : result)
total += static_cast<uint32_t>(v.size());
printf(" Barriers computed: %u transition(s) across %u passes\n", total, static_cast<uint32_t>(sorted.size()));
return result;
}
// == Emit barriers (replay precomputed transitions) ===========
void FrameGraph::EmitBarriers(const std::vector<Barrier>& barriers)
{
for (auto& b : barriers)
{
printf(" barrier: resource[%u] %s -> %s\n", b.resourceIndex, StateName(b.oldState), StateName(b.newState));
}
}
// Frame Graph MVP v2 -- Usage Example
// Compile: clang++ -std=c++17 -o example_v2 example_v2.cpp
#include "frame_graph_v2.h"
#include "frame_graph_v2.cpp" // single-TU build (Godbolt)
#include <cstdio>
int main()
{
printf("=== Frame Graph v2: Dependencies & Barriers ===\n");
FrameGraph fg;
// Import the swapchain backbuffer — externally owned.
auto backbuffer = fg.ImportResource({1920, 1080, Format::RGBA8}, ResourceState::Present);
auto depth = fg.CreateResource({1920, 1080, Format::D32F});
auto gbufA = fg.CreateResource({1920, 1080, Format::RGBA8});
auto hdr = fg.CreateResource({1920, 1080, Format::RGBA16F});
auto debug = fg.CreateResource({1920, 1080, Format::RGBA8});
fg.AddPass(
"DepthPrepass",
[&]()
{
fg.Write(0, depth);
},
[&](/*cmd*/)
{
printf(" >> exec: DepthPrepass\n");
});
fg.AddPass(
"GBuffer",
[&]()
{
fg.Read(1, depth);
fg.Write(1, gbufA);
},
[&](/*cmd*/)
{
printf(" >> exec: GBuffer\n");
});
fg.AddPass(
"Lighting",
[&]()
{
fg.Read(2, gbufA);
fg.Write(2, hdr);
},
[&](/*cmd*/)
{
printf(" >> exec: Lighting\n");
});
// Compute pass — explicit UAV access on hdr.
fg.AddPass(
"SSR",
[&]()
{
fg.ReadWrite(3, hdr);
},
[&](/*cmd*/)
{
printf(" >> exec: SSR (compute, UAV)\n");
});
// Present — writes to the imported backbuffer.
fg.AddPass(
"Present",
[&]()
{
fg.Read(4, hdr);
fg.Write(4, backbuffer);
},
[&](/*cmd*/)
{
printf(" >> exec: Present\n");
});
// Dead pass — nothing reads debug, so the graph will cull it.
fg.AddPass(
"DebugOverlay",
[&]()
{
fg.Write(5, debug);
},
[&](/*cmd*/)
{
printf(" >> exec: DebugOverlay\n");
});
fg.Execute();
return 0;
}
That’s three of the four intro promises delivered (automatic ordering, barrier insertion, and dead-pass culling), plus a clean compile/execute split that v3 will extend. The only piece missing: resources still live for the entire frame. Version 3 fixes that with lifetime analysis and memory aliasing.
MVP v3: Lifetimes & Aliasing#
v2 gives us ordering, culling, and barriers, but every transient resource still owns its own VRAM for the entire frame, even when it’s only alive for two passes out of twelve. A 1080p G-Buffer alone eats ~32 MB and the depth target another ~8 MB, yet both are dead after lighting. Meanwhile the post-process chain needs similarly-sized targets that could reuse that exact same memory, because the lifetimes never overlap (theory refresher). That’s what v3 automates.
The implementation adds two data structures (Lifetime, tracking first/last sorted-pass index per resource, and PhysicalBlock, a reusable heap slot) and extends Barrier with aliasing context. First, the lifetime scan. It walks the sorted pass list and records when each transient resource is first touched and last touched:
v3, New data structures: lifetime tracking & aliasing barriers
@@ frame_graph_v3.h, PhysicalBlock, Lifetime structs @@
+// A physical memory slot, multiple virtual resources can reuse it if their lifetimes don't overlap.
+struct PhysicalBlock
+{
+    uint32_t sizeBytes = 0;   // block size (aligned)
+    PassIndex availAfter = 0; // free after this sorted pass
+};
+
+// Per-resource lifetime in sorted-pass indices, drives aliasing decisions.
+struct Lifetime
+{
+    PassIndex firstUse = UINT32_MAX; // first sorted pass that touches this resource
+    PassIndex lastUse = 0;           // last sorted pass that touches this resource
+    bool isTransient = true;         // false for imported resources (externally owned)
+};
+
@@ frame_graph_v3.h, Barrier extended with aliasing fields @@
 // Base Barrier already defined in v2, v3 adds aliasing context.
 struct Barrier
 {
     ResourceIndex resourceIndex;
     ResourceState oldState;
     ResourceState newState;
+    bool isAliasing = false;                // aliasing barrier (block changes occupant)
+    ResourceIndex aliasBefore = UINT32_MAX; // resource being evicted
 };
To alias at the heap level, we need to know sizes. AllocSize() computes an aligned allocation size per resource: width × height × BytesPerPixel, rounded up to 64 KB (the same placement alignment real GPUs enforce). The diff adds AllocSize(), AlignUp(), and BytesPerPixel():
v3, Allocation helpers (AllocSize, alignment)
@@ frame_graph_v3.h, Allocation helpers @@
+// Minimum placement alignment for aliased heap resources (real APIs enforce similar, e.g. 64 KB).
+static constexpr uint32_t kPlacementAlignment = 65536; // 64 KB
+
+inline uint32_t AlignUp(uint32_t value, uint32_t alignment)
+{
+    return (value + alignment - 1) & ~(alignment - 1);
+}
+
+inline uint32_t BytesPerPixel(Format fmt)
+{
+    switch (fmt)
+    {
+    case Format::R8:
+        return 1;
+    case Format::RGBA8:
+        return 4;
+    case Format::D32F:
+        return 4;
+    case Format::RGBA16F:
+        return 8;
+    default:
+        return 4;
+    }
+}
+
+// Aligned allocation size, real drivers add row padding/tiling; we approximate with a round-up.
+inline uint32_t AllocSize(const ResourceDesc& desc)
+{
+    uint32_t raw = desc.width * desc.height * BytesPerPixel(desc.format);
+    return AlignUp(raw, kPlacementAlignment);
+}
ScanLifetimes() walks the sorted pass list and records each transient resource’s first and last use (as sorted-pass indices). Imported resources are excluded. The resulting intervals drive the aliasing allocator, and non-overlapping intervals can share one physical block:
v3, ScanLifetimes()
@@ frame_graph_v3.cpp, ScanLifetimes() @@
+// Record each resource's first/last use in sorted order, non-overlapping intervals can share memory.
+std::vector<Lifetime> FrameGraph::ScanLifetimes(const std::vector<PassIndex>& sorted)
+{
+    std::vector<Lifetime> life(entries.size());
+
+    // Imported resources (e.g. swapchain) are externally owned, exclude from aliasing.
+    for (ResourceIndex i = 0; i < entries.size(); i++)
+    {
+        if (entries[i].imported)
+            life[i].isTransient = false;
+    }
+
+    // Update first/last use for every resource each surviving pass touches.
+    for (PassIndex order = 0; order < sorted.size(); order++)
+    {
+        PassIndex passIdx = sorted[order];
+        if (!passes[passIdx].alive)
+            continue;
+
+        for (auto& h : passes[passIdx].reads)
+        {
+            life[h.index].firstUse = std::min(life[h.index].firstUse, order);
+            life[h.index].lastUse = std::max(life[h.index].lastUse, order);
+        }
+        for (auto& h : passes[passIdx].writes)
+        {
+            life[h.index].firstUse = std::min(life[h.index].firstUse, order);
+            life[h.index].lastUse = std::max(life[h.index].lastUse, order);
+        }
+    }
+    return life;
+}
This requires placed resources at the API level: GPU memory allocated from a heap, with resources bound to offsets within it. In D3D12, that means ID3D12Heap + CreatePlacedResource. In Vulkan, VkDeviceMemory + vkBindImageMemory at different offsets. Without placed resources (i.e. CreateCommittedResource or Vulkan dedicated allocations), each resource gets its own memory and aliasing is impossible, which is exactly why the graph’s allocator needs to work with heaps.
With lifetimes in hand, the greedy free-list allocator is straightforward. Sort resources by firstUse, walk them in order, and for each one either reuse an existing physical block whose previous occupant has finished, or allocate a new one. CompiledPlan gains a mapping vector (virtual resource to physical block), and ComputeBarriers() gains a Phase 1 that emits aliasing barriers whenever a block changes occupant:
v3, Header changes (BlockIndex, CompiledPlan, new methods)
@@ frame_graph_v3.h, BlockIndex typedef @@
+using BlockIndex = uint32_t; // index into the physical-block free list
@@ frame_graph_v3.h, CompiledPlan extended with mapping @@
 struct CompiledPlan
 {
     std::vector<PassIndex> sorted;
+    std::vector<BlockIndex> mapping; // mapping[ResourceIndex] → physical block
     std::vector<std::vector<Barrier>> barriers;
 };
@@ frame_graph_v3.h, new private methods @@
+    std::vector<Lifetime> ScanLifetimes(const std::vector<PassIndex>& sorted);
+    std::vector<BlockIndex> AliasResources(const std::vector<Lifetime>& lifetimes);
+    ResourceState StateForUsage(PassIndex passIdx, ResourceHandle h, bool isWrite) const;
-    std::vector<std::vector<Barrier>> ComputeBarriers(const std::vector<PassIndex>& sorted);
+    std::vector<std::vector<Barrier>> ComputeBarriers(const std::vector<PassIndex>& sorted,
+                                                      const std::vector<BlockIndex>& mapping);
The greedy free-list allocator implements the aliasing strategy. First it builds a firstUse-sorted index so resources are processed in the order they appear in the frame:
v3, AliasResources() setup & sorting
@@ frame_graph_v3.cpp, AliasResources() setup @@
+// Greedy first-fit: sort by firstUse, reuse any free block that fits, else allocate a new one.
+std::vector<BlockIndex> FrameGraph::AliasResources(const std::vector<Lifetime>& lifetimes)
+{
+    std::vector<PhysicalBlock> freeList;
+    std::vector<BlockIndex> mapping(entries.size(), UINT32_MAX);
+
+    // Process resources in the order they're first used.
+    std::vector<ResourceIndex> indices(entries.size());
+    std::iota(indices.begin(), indices.end(), 0);
+    std::sort(
+        indices.begin(),
+        indices.end(),
+        [&](ResourceIndex a, ResourceIndex b)
+        {
+            return lifetimes[a].firstUse < lifetimes[b].firstUse;
+        });
For each transient resource, scan existing physical blocks for one that is both free (previous occupant’s lastUse < this resource’s firstUse) and large enough. If found, reuse it. Otherwise, allocate a new block. The result is a mapping vector where mapping[ResourceIndex] gives the physical block index:
v3, AliasResources() allocation loop
@@ frame_graph_v3.cpp, AliasResources() allocation loop @@
+    for (ResourceIndex resIdx : indices)
+    {
+        if (!lifetimes[resIdx].isTransient)
+            continue; // skip imported resources
+        if (lifetimes[resIdx].firstUse == UINT32_MAX)
+            continue; // never used
+
+        uint32_t needed = AllocSize(entries[resIdx].desc);
+        bool reused = false;
+
+        // Scan existing blocks, can we reuse one that's now free?
+        for (BlockIndex b = 0; b < freeList.size(); b++)
+        {
+            if (freeList[b].availAfter < lifetimes[resIdx].firstUse
+                && freeList[b].sizeBytes >= needed)
+            {
+                mapping[resIdx] = b;                                // reuse this block
+                freeList[b].availAfter = lifetimes[resIdx].lastUse; // extend occupancy
+                reused = true;
+                break;
+            }
+        }
+
+        if (!reused)
+        { // no fit found → allocate a new physical block
+            mapping[resIdx] = static_cast<BlockIndex>(freeList.size());
+            freeList.push_back({needed, lifetimes[resIdx].lastUse});
+        }
+    }
+    return mapping;
+}
Compile() chains all stages: build edges → topo-sort → cull → scan lifetimes → alias → compute barriers. The diff shows the updated Compile() body and the new StateForUsage() method (extracted from v2’s inline lambda so both phases can reuse it). StateForUsage maps a resource handle + read/write flag to a ResourceState: UAV handles get UnorderedAccess, depth-format writes get DepthAttachment, other writes get ColorAttachment, reads get ShaderRead:
v3, Updated Compile() & state inference
@@ frame_graph_v3.cpp, Compile() extended with lifetime analysis + aliasing @@
 FrameGraph::CompiledPlan FrameGraph::Compile()
 {
     BuildEdges();
     auto sorted = TopoSort();
     Cull(sorted);
+    auto lifetimes = ScanLifetimes(sorted);   // when is each resource alive?
+    auto mapping = AliasResources(lifetimes); // share memory where lifetimes don't overlap
-    auto barriers = ComputeBarriers(sorted);
+    auto barriers = ComputeBarriers(sorted, mapping); // extended: also emits aliasing transitions
-    return { std::move(sorted), std::move(barriers) };
+    return { std::move(sorted), std::move(mapping), std::move(barriers) };
 }
@@ frame_graph_v3.cpp, StateForUsage() extracted as class method @@
+// Infer the ResourceState a pass needs for a given resource handle.
+ResourceState FrameGraph::StateForUsage(PassIndex passIdx, ResourceHandle h, bool isWrite) const
+{
+    for (auto& rw : passes[passIdx].readWrites)
+        if (rw.index == h.index)
+            return ResourceState::UnorderedAccess;
+    if (isWrite)
+        return (entries[h.index].desc.format == Format::D32F)
+            ? ResourceState::DepthAttachment : ResourceState::ColorAttachment;
+    return ResourceState::ShaderRead;
+}
ComputeBarriers() now runs two phases per pass. The function signature gains a mapping parameter (from AliasResources), and a blockOwner vector tracks which virtual resource currently occupies each physical block. First, the setup and handle deduplication (UAVs appear in both reads and writes, so we collect unique handles once):
v3, ComputeBarriers() setup & handle dedup
@@ frame_graph_v3.cpp, ComputeBarriers() rewritten with aliasing @@
+// v3 ComputeBarriers: two phases per pass instead of v2's one.
+// Phase 1, aliasing: did this physical block change occupant? If so, emit an aliasing barrier.
+// Phase 2, state: did this resource's state change? If so, emit a state-transition barrier.
 std::vector<std::vector<Barrier>> FrameGraph::ComputeBarriers(const std::vector<PassIndex>& sorted,
                                                               const std::vector<BlockIndex>& mapping)
 {
     std::vector<std::vector<Barrier>> result(sorted.size());
+    // blockOwner[block] = which virtual resource currently occupies it.
+    std::vector<ResourceIndex> blockOwner;
+    {
+        BlockIndex maxBlock = 0;
+        for (auto m : mapping)
+            if (m != UINT32_MAX)
+                maxBlock = std::max(maxBlock, m + 1);
+        blockOwner.assign(maxBlock, UINT32_MAX);
+    }
     for (PassIndex orderIdx = 0; orderIdx < sorted.size(); orderIdx++)
     {
         PassIndex passIdx = sorted[orderIdx];
         if (!passes[passIdx].alive)
             continue;
+        // ── Collect unique handles (ReadWrite puts h in both reads & writes) ──
+        std::vector<std::pair<ResourceHandle, bool>> unique; // {handle, isWrite}
+        std::unordered_set<ResourceIndex> seen;
+        for (auto& h : passes[passIdx].reads)
+            if (seen.insert(h.index).second)
+                unique.push_back({h, false});
+        for (auto& h : passes[passIdx].writes)
+        {
+            if (seen.insert(h.index).second)
+            {
+                unique.push_back({h, true});
+            }
+            else
+            {
+                // already in reads, upgrade to write (UAV)
+                for (auto& [uh, w] : unique)
+                    if (uh.index == h.index)
+                    {
+                        w = true;
+                        break;
+                    }
+            }
+        }
Phase 1 (aliasing): for each unique handle, check whether its physical block was previously occupied by a different resource, and if so, emit an aliasing barrier so the GPU flushes caches. Phase 2 (state transitions) works exactly as in v2: compare the tracked state to the needed state and emit a transition when they differ:
v3, ComputeBarriers() Phase 1 & 2
@@ frame_graph_v3.cpp, Phase 1 (aliasing) + Phase 2 (state transitions) @@
+        // ── Phase 1: aliasing barriers ──────────────────────────
+        for (auto& [h, _] : unique)
+        {
+            BlockIndex block = mapping[h.index];
+            if (block == UINT32_MAX)
+                continue; // imported, no aliasing
+            if (blockOwner[block] != UINT32_MAX && blockOwner[block] != h.index)
+            {
+                result[orderIdx].push_back(
+                    {h.index, ResourceState::Undefined, ResourceState::Undefined, true, blockOwner[block]});
+            }
+            blockOwner[block] = h.index; // update current occupant
+        }
+
+        // ── Phase 2: state-transition barriers (same as v2) ────
+        for (auto& [h, isWrite] : unique)
+        {
+            ResourceState needed = StateForUsage(passIdx, h, isWrite);
+            if (entries[h.index].currentState != needed)
+            {
+                result[orderIdx].push_back(
+                    {h.index, entries[h.index].currentState, needed});
+                entries[h.index].currentState = needed;
+            }
+        }
     }
     return result;
 }
EmitBarriers() grows a branch to dispatch the two barrier types: aliasing transitions go to the heap-level API, state transitions go to the per-resource API:
v3, EmitBarriers() with aliasing dispatch
@@ frame_graph_v3.cpp, EmitBarriers() extended with aliasing handling @@
 void FrameGraph::EmitBarriers(const std::vector<Barrier>& barriers)
 {
     for (auto& b : barriers)
     {
-        // (v2: only state transitions)
+        if (b.isAliasing)
+        {
+            // D3D12: D3D12_RESOURCE_BARRIER_TYPE_ALIASING / Vulkan: appropriate synchronization between the old and new occupant.
+        }
+        else
+        {
+            // D3D12: ResourceBarrier(StateBefore→StateAfter) / Vulkan: vkCmdPipelineBarrier(oldLayout→newLayout).
+        }
     }
 }
Execute() stays unchanged: still pure playback of the compiled plan. All the new logic lives in Compile(), which now runs six stages instead of four: build edges → topo-sort → cull → scan lifetimes → alias → compute barriers. The new stages add ~170 lines on top of v2.
The aliasing barrier and alignment rules are the same ones covered in Part I. The important implementation point here is that v3 now emits the aliasing barrier when a block changes occupant, and AllocSize() gives the free-list allocator a conservative aligned size to work with. The size model is still intentionally simplified. A production allocator would query the driver for the real requirements, but the aliasing algorithm stays the same.
That wraps v3. Starting from v2’s compile/execute split, we added lifetime analysis, a greedy free-list allocator, and aliasing-aware barrier insertion: the same architecture Frostbite described at GDC 2017, and the same approach UE5 uses today for every transient FRDGTexture created through FRDGBuilder::CreateTexture. The graph now owns the full lifecycle: declare virtual resources → analyze dependencies → pack physical memory → precompute every barrier → execute as pure playback.
Full v3 source#
#pragma once
// Frame Graph MVP v3 -- Lifetimes & Aliasing
// Adds: lifetime analysis, greedy free-list memory aliasing.
// Builds on v2 (dependencies, topo-sort, culling, barriers).
//
// Compile: clang++ -std=c++17 -o example_v3 example_v3.cpp
// or: g++ -std=c++17 -o example_v3 example_v3.cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>
// == Resource description (virtual until compile) ==============
enum class Format
{
RGBA8,
RGBA16F,
R8,
D32F
};
struct ResourceDesc
{
uint32_t width = 0;
uint32_t height = 0;
Format format = Format::RGBA8;
};
using ResourceIndex = uint32_t;
using PassIndex = uint32_t;
using BlockIndex = uint32_t; // index into the physical-block free list
struct ResourceHandle
{
ResourceIndex index = UINT32_MAX;
bool IsValid() const { return index != UINT32_MAX; }
};
// == Resource state tracking ===================================
enum class ResourceState
{
Undefined,
ColorAttachment,
DepthAttachment,
ShaderRead,
UnorderedAccess,
Present
};
inline const char* StateName(ResourceState s)
{
switch (s)
{
case ResourceState::Undefined:
return "Undefined";
case ResourceState::ColorAttachment:
return "ColorAttachment";
case ResourceState::DepthAttachment:
return "DepthAttachment";
case ResourceState::ShaderRead:
return "ShaderRead";
case ResourceState::UnorderedAccess:
return "UnorderedAccess";
case ResourceState::Present:
return "Present";
default:
return "?";
}
}
// Precomputed barrier — stored during Compile(), replayed during Execute().
struct Barrier
{
ResourceIndex resourceIndex;
ResourceState oldState;
ResourceState newState;
bool isAliasing = false; // aliasing barrier (block changes occupant)
ResourceIndex aliasBefore = UINT32_MAX; // resource being evicted
};
struct ResourceVersion
{
PassIndex writerPass = UINT32_MAX;
std::vector<PassIndex> readerPasses;
bool HasWriter() const { return writerPass != UINT32_MAX; }
};
struct ResourceEntry
{
ResourceDesc desc;
std::vector<ResourceVersion> versions;
ResourceState currentState = ResourceState::Undefined;
bool imported = false; // imported resources are not owned by the graph
};
// == Physical memory block ======================================
struct PhysicalBlock
{
uint32_t sizeBytes = 0;
PassIndex availAfter = 0; // sorted pass after which this block is free
};
// == Allocation helpers ========================================
// Minimum placement alignment for aliased heap resources.
// Real APIs enforce similar constraints (e.g. 64 KB on most GPUs).
static constexpr uint32_t kPlacementAlignment = 65536; // 64 KB
inline uint32_t AlignUp(uint32_t value, uint32_t alignment)
{
return (value + alignment - 1) & ~(alignment - 1);
}
inline uint32_t BytesPerPixel(Format fmt)
{
switch (fmt)
{
case Format::R8:
return 1;
case Format::RGBA8:
return 4;
case Format::D32F:
return 4;
case Format::RGBA16F:
return 8;
default:
return 4;
}
}
// Compute aligned allocation size for a resource.
// Real drivers add row padding and tiling overhead; we approximate
// with a simple alignment round-up to demonstrate the principle.
inline uint32_t AllocSize(const ResourceDesc& desc)
{
uint32_t raw = desc.width * desc.height * BytesPerPixel(desc.format);
return AlignUp(raw, kPlacementAlignment);
}
// == Lifetime info per resource =================================
struct Lifetime
{
PassIndex firstUse = UINT32_MAX;
PassIndex lastUse = 0;
bool isTransient = true;
};
// == Render pass ===============================================
struct RenderPass
{
std::string name;
std::function<void()> Setup;
std::function<void(/*cmd list*/)> Execute;
std::vector<ResourceHandle> reads;
std::vector<ResourceHandle> writes;
std::vector<ResourceHandle> readWrites; // UAV (explicit)
std::vector<PassIndex> dependsOn;
std::vector<PassIndex> successors;
uint32_t inDegree = 0;
bool alive = false;
};
// == Frame graph (v3: full MVP) ================================
class FrameGraph
{
public:
ResourceHandle CreateResource(const ResourceDesc& desc);
ResourceHandle ImportResource(const ResourceDesc& desc, ResourceState initialState = ResourceState::Undefined);
void Read(PassIndex passIdx, ResourceHandle h);
void Write(PassIndex passIdx, ResourceHandle h);
void ReadWrite(PassIndex passIdx, ResourceHandle h); // UAV access
template <typename SetupFn, typename ExecFn> void AddPass(const std::string& name, SetupFn&& setup, ExecFn&& exec)
{
passes.push_back({name, std::forward<SetupFn>(setup), std::forward<ExecFn>(exec)});
passes.back().Setup();
}
// == compile -- builds the execution plan (order, aliasing map, barriers) ==
struct CompiledPlan
{
std::vector<PassIndex> sorted;
std::vector<BlockIndex> mapping; // mapping[ResourceIndex] → physical block
std::vector<std::vector<Barrier>> barriers;
};
CompiledPlan Compile();
void Execute(const CompiledPlan& plan);
// convenience: compile + execute in one call
void Execute();
private:
std::vector<RenderPass> passes;
std::vector<ResourceEntry> entries;
void BuildEdges();
std::vector<PassIndex> TopoSort();
void Cull(const std::vector<PassIndex>& sorted);
std::vector<Lifetime> ScanLifetimes(const std::vector<PassIndex>& sorted);
std::vector<BlockIndex> AliasResources(const std::vector<Lifetime>& lifetimes);
ResourceState StateForUsage(PassIndex passIdx, ResourceHandle h, bool isWrite) const;
std::vector<std::vector<Barrier>> ComputeBarriers(const std::vector<PassIndex>& sorted, const std::vector<BlockIndex>& mapping);
void EmitBarriers(const std::vector<Barrier>& barriers);
};
// ============ frame_graph_v3.cpp ============
#include "frame_graph_v3.h"
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <numeric>
#include <queue>
#include <unordered_set>
// == FrameGraph implementation =================================
ResourceHandle FrameGraph::CreateResource(const ResourceDesc& desc)
{
entries.push_back({desc, {{}}, ResourceState::Undefined, false});
return {static_cast<ResourceIndex>(entries.size() - 1)};
}
ResourceHandle FrameGraph::ImportResource(const ResourceDesc& desc, ResourceState initialState)
{
entries.push_back({desc, {{}}, initialState, true});
return {static_cast<ResourceIndex>(entries.size() - 1)};
}
void FrameGraph::Read(PassIndex passIdx, ResourceHandle h)
{
auto& ver = entries[h.index].versions.back();
if (ver.HasWriter())
{
passes[passIdx].dependsOn.push_back(ver.writerPass);
}
ver.readerPasses.push_back(passIdx);
passes[passIdx].reads.push_back(h);
}
void FrameGraph::Write(PassIndex passIdx, ResourceHandle h)
{
auto& ver = entries[h.index].versions.back(); // current version (pre-bump)
if (ver.HasWriter())
passes[passIdx].dependsOn.push_back(ver.writerPass); // WAW edge: prev writer must finish
for (PassIndex reader : ver.readerPasses)
passes[passIdx].dependsOn.push_back(reader); // WAR edge: reader must finish first
entries[h.index].versions.push_back({});
entries[h.index].versions.back().writerPass = passIdx;
passes[passIdx].writes.push_back(h);
}
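// Example of the versioning, assuming passes A, B, C touch handle h in
// order Write(A, h), Read(B, h), Write(C, h):
//   v1 { writer=A, readers=[B] }  ->  v2 { writer=C }
//   edges added: A->B (RAW), A->C (WAW), B->C (WAR)
// so C can never overwrite h before B has read it.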
void FrameGraph::ReadWrite(PassIndex passIdx, ResourceHandle h)
{
auto& ver = entries[h.index].versions.back();
if (ver.HasWriter())
{
passes[passIdx].dependsOn.push_back(ver.writerPass); // RAW edge
}
for (PassIndex reader : ver.readerPasses)
passes[passIdx].dependsOn.push_back(reader); // WAR edge
entries[h.index].versions.push_back({});
entries[h.index].versions.back().writerPass = passIdx;
passes[passIdx].reads.push_back(h);
passes[passIdx].writes.push_back(h);
passes[passIdx].readWrites.push_back(h);
}
// == v3: compile -- builds the execution plan (order, aliasing map, barriers) ==
FrameGraph::CompiledPlan FrameGraph::Compile()
{
printf("\n[1] Building dependency edges...\n");
BuildEdges();
printf("[2] Topological sort...\n");
auto sorted = TopoSort();
printf("[3] Culling dead passes...\n");
Cull(sorted);
printf("[4] Scanning resource lifetimes...\n");
auto lifetimes = ScanLifetimes(sorted);
printf("[5] Aliasing resources (greedy free-list)...\n");
auto mapping = AliasResources(lifetimes);
printf("[6] Computing barriers...\n");
auto barriers = ComputeBarriers(sorted, mapping);
// The compiled plan is fully determined — execution order, memory
// mapping, and every barrier transition. Execute is pure playback.
return {std::move(sorted), std::move(mapping), std::move(barriers)};
}
// == v3: execute -- runs the compiled plan =====================
void FrameGraph::EmitBarriers(const std::vector<Barrier>& barriers)
{
for (auto& b : barriers)
{
if (b.isAliasing)
{
printf(" aliasing barrier: physical block shared by resource[%u] -> resource[%u]\n", b.aliasBefore, b.resourceIndex);
}
else
{
printf(" barrier: resource[%u] %s -> %s\n", b.resourceIndex, StateName(b.oldState), StateName(b.newState));
}
// e.g. vkCmdPipelineBarrier / ID3D12GraphicsCommandList::ResourceBarrier
}
}
void FrameGraph::Execute(const CompiledPlan& plan)
{
printf("[7] Executing (replaying precomputed barriers):\n");
for (PassIndex orderIdx = 0; orderIdx < plan.sorted.size(); orderIdx++)
{
PassIndex passIdx = plan.sorted[orderIdx];
if (!passes[passIdx].alive)
{
printf(" -- skip: %s (CULLED)\n", passes[passIdx].name.c_str());
continue;
}
EmitBarriers(plan.barriers[orderIdx]);
passes[passIdx].Execute(/* &cmdList */);
}
    // Single-use by design: clear all graph state so the next frame
    // re-declares its passes and resources from scratch.
    passes.clear();
    entries.clear();
}
// convenience: compile + execute in one call
void FrameGraph::Execute()
{
Execute(Compile());
}
// == Build dependency edges ====================================
void FrameGraph::BuildEdges()
{
for (PassIndex i = 0; i < passes.size(); i++)
{
std::unordered_set<PassIndex> seen;
for (PassIndex dep : passes[i].dependsOn)
{
if (seen.insert(dep).second)
{
passes[dep].successors.push_back(i);
passes[i].inDegree++;
}
}
}
}
// == Kahn's topological sort -- O(V + E) ========================
std::vector<PassIndex> FrameGraph::TopoSort()
{
std::queue<PassIndex> q;
std::vector<uint32_t> inDeg(passes.size());
for (PassIndex i = 0; i < passes.size(); i++)
{
inDeg[i] = passes[i].inDegree;
if (inDeg[i] == 0)
q.push(i);
}
std::vector<PassIndex> order;
while (!q.empty())
{
PassIndex cur = q.front();
q.pop();
order.push_back(cur);
for (PassIndex succ : passes[cur].successors)
{
if (--inDeg[succ] == 0)
q.push(succ);
}
}
assert(order.size() == passes.size() && "Cycle detected!");
printf(" Topological order: ");
for (PassIndex i = 0; i < order.size(); i++)
{
printf("%s%s", passes[order[i]].name.c_str(), i + 1 < order.size() ? " -> " : "\n");
}
return order;
}
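// Example: with edges A->B, A->C, B->D, C->D, pass A starts at inDegree 0
// and is visited first; B and C follow in queue (FIFO) order; D comes last.
// Any order respecting the edges is valid -- FIFO just makes ties deterministic.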
// == Cull dead passes ==========================================
void FrameGraph::Cull(const std::vector<PassIndex>& sorted)
{
if (sorted.empty())
return;
passes[sorted.back()].alive = true;
for (int i = static_cast<int>(sorted.size()) - 1; i >= 0; i--)
{
if (!passes[sorted[i]].alive)
continue;
for (PassIndex dep : passes[sorted[i]].dependsOn)
passes[dep].alive = true;
}
printf(" Culling result: ");
for (PassIndex i = 0; i < passes.size(); i++)
{
printf("%s=%s%s", passes[i].name.c_str(), passes[i].alive ? "ALIVE" : "DEAD", i + 1 < passes.size() ? ", " : "\n");
}
}
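// The walk seeds from the LAST pass in topological order and marks its
// transitive dependencies ALIVE. That assumes the final pass produces the
// frame's output (true here: Present). A production graph would instead
// seed from explicit roots, e.g. every pass that writes an imported resource.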
// == State inference ===========================================
ResourceState FrameGraph::StateForUsage(PassIndex passIdx, ResourceHandle h, bool isWrite) const
{
// If the caller registered this handle via ReadWrite(), it's a UAV.
for (auto& rw : passes[passIdx].readWrites)
if (rw.index == h.index)
return ResourceState::UnorderedAccess;
if (isWrite)
return (entries[h.index].desc.format == Format::D32F) ? ResourceState::DepthAttachment : ResourceState::ColorAttachment;
return ResourceState::ShaderRead;
}
// == Compute barriers ==========================================
std::vector<std::vector<Barrier>> FrameGraph::ComputeBarriers(const std::vector<PassIndex>& sorted, const std::vector<BlockIndex>& mapping)
{
std::vector<std::vector<Barrier>> result(sorted.size());
// blockOwner[block] = which virtual resource currently occupies it.
std::vector<ResourceIndex> blockOwner;
{
BlockIndex maxBlock = 0;
for (auto m : mapping)
if (m != UINT32_MAX)
maxBlock = std::max(maxBlock, m + 1);
blockOwner.assign(maxBlock, UINT32_MAX);
}
for (PassIndex orderIdx = 0; orderIdx < sorted.size(); orderIdx++)
{
PassIndex passIdx = sorted[orderIdx];
if (!passes[passIdx].alive)
continue;
// --- Collect unique handles (ReadWrite puts h in both reads & writes) ---
std::vector<std::pair<ResourceHandle, bool>> unique; // {handle, isWrite}
std::unordered_set<ResourceIndex> seen;
for (auto& h : passes[passIdx].reads)
if (seen.insert(h.index).second)
unique.push_back({h, false});
for (auto& h : passes[passIdx].writes)
{
if (seen.insert(h.index).second)
{
unique.push_back({h, true});
}
else
{
// already in reads — upgrade to write (UAV)
for (auto& [uh, w] : unique)
if (uh.index == h.index)
{
w = true;
break;
}
}
}
// --- Phase 1: aliasing barriers (block changes occupant) ---
for (auto& [h, _] : unique)
{
BlockIndex block = mapping[h.index];
if (block == UINT32_MAX)
continue;
if (blockOwner[block] != UINT32_MAX && blockOwner[block] != h.index)
{
result[orderIdx].push_back({h.index, ResourceState::Undefined, ResourceState::Undefined, true, blockOwner[block]});
}
blockOwner[block] = h.index;
}
// --- Phase 2: state-transition barriers ---
for (auto& [h, isWrite] : unique)
{
ResourceState needed = StateForUsage(passIdx, h, isWrite);
if (entries[h.index].currentState != needed)
{
result[orderIdx].push_back({h.index, entries[h.index].currentState, needed});
entries[h.index].currentState = needed;
}
}
}
uint32_t total = 0;
for (auto& v : result)
total += static_cast<uint32_t>(v.size());
printf(" Barriers computed: %u transition(s) across %u passes\n", total, static_cast<uint32_t>(sorted.size()));
return result;
}
// == Scan lifetimes ============================================
std::vector<Lifetime> FrameGraph::ScanLifetimes(const std::vector<PassIndex>& sorted)
{
std::vector<Lifetime> life(entries.size());
// Imported resources are not transient -- skip them during aliasing.
for (ResourceIndex i = 0; i < entries.size(); i++)
{
if (entries[i].imported)
life[i].isTransient = false;
}
for (PassIndex order = 0; order < sorted.size(); order++)
{
PassIndex passIdx = sorted[order];
if (!passes[passIdx].alive)
continue;
for (auto& h : passes[passIdx].reads)
{
life[h.index].firstUse = std::min(life[h.index].firstUse, order);
life[h.index].lastUse = std::max(life[h.index].lastUse, order);
}
for (auto& h : passes[passIdx].writes)
{
life[h.index].firstUse = std::min(life[h.index].firstUse, order);
life[h.index].lastUse = std::max(life[h.index].lastUse, order);
}
}
printf(" Lifetimes (in sorted pass order):\n");
for (ResourceIndex i = 0; i < life.size(); i++)
{
if (life[i].firstUse == UINT32_MAX)
{
printf(" resource[%u] unused (dead)\n", i);
}
else
{
printf(" resource[%u] alive [pass %u .. pass %u]\n", i, life[i].firstUse, life[i].lastUse);
}
}
return life;
}
// == Greedy free-list aliasing ==================================
std::vector<BlockIndex> FrameGraph::AliasResources(const std::vector<Lifetime>& lifetimes)
{
std::vector<PhysicalBlock> freeList;
std::vector<BlockIndex> mapping(entries.size(), UINT32_MAX);
uint32_t totalWithout = 0;
std::vector<ResourceIndex> indices(entries.size());
std::iota(indices.begin(), indices.end(), 0);
std::sort(
indices.begin(),
indices.end(),
[&](ResourceIndex a, ResourceIndex b)
{
return lifetimes[a].firstUse < lifetimes[b].firstUse;
});
printf(" Aliasing:\n");
for (ResourceIndex resIdx : indices)
{
if (!lifetimes[resIdx].isTransient)
continue;
if (lifetimes[resIdx].firstUse == UINT32_MAX)
continue;
uint32_t needed = AllocSize(entries[resIdx].desc);
totalWithout += needed;
bool reused = false;
for (BlockIndex b = 0; b < freeList.size(); b++)
{
if (freeList[b].availAfter < lifetimes[resIdx].firstUse && freeList[b].sizeBytes >= needed)
{
mapping[resIdx] = b;
freeList[b].availAfter = lifetimes[resIdx].lastUse;
reused = true;
printf(
" resource[%u] -> reuse physical block %u "
"(%.1f MB, lifetime [%u..%u])\n",
resIdx,
b,
needed / (1024.0f * 1024.0f),
lifetimes[resIdx].firstUse,
lifetimes[resIdx].lastUse);
break;
}
}
if (!reused)
{
mapping[resIdx] = static_cast<BlockIndex>(freeList.size());
printf(
" resource[%u] -> NEW physical block %u "
"(%.1f MB, lifetime [%u..%u])\n",
resIdx,
static_cast<BlockIndex>(freeList.size()),
needed / (1024.0f * 1024.0f),
lifetimes[resIdx].firstUse,
lifetimes[resIdx].lastUse);
freeList.push_back({needed, lifetimes[resIdx].lastUse});
}
}
uint32_t totalWith = 0;
for (auto& blk : freeList)
totalWith += blk.sizeBytes;
printf(
" Memory: %u physical blocks for %u virtual resources\n",
static_cast<uint32_t>(freeList.size()),
static_cast<uint32_t>(entries.size()));
printf(" Without aliasing: %.1f MB\n", totalWithout / (1024.0f * 1024.0f));
printf(
" With aliasing: %.1f MB (saved %.1f MB, %.0f%%)\n",
totalWith / (1024.0f * 1024.0f),
(totalWithout - totalWith) / (1024.0f * 1024.0f),
totalWithout > 0 ? 100.0f * (totalWithout - totalWith) / totalWithout : 0.0f);
return mapping;
}
// Frame Graph MVP v3 -- Usage Example
// Compile: clang++ -std=c++17 -o example_v3 example_v3.cpp
#include "frame_graph_v3.h"
#include "frame_graph_v3.cpp" // single-TU build (Godbolt)
#include <cstdio>
int main()
{
printf("=== Frame Graph v3: Lifetimes & Memory Aliasing ===\n");
FrameGraph fg;
// Import the swapchain backbuffer — externally owned.
// The graph tracks barriers but won't alias it.
auto backbuffer = fg.ImportResource({1920, 1080, Format::RGBA8}, ResourceState::Present);
auto depth = fg.CreateResource({1920, 1080, Format::D32F});
auto gbufA = fg.CreateResource({1920, 1080, Format::RGBA8});
auto gbufN = fg.CreateResource({1920, 1080, Format::RGBA8});
auto hdr = fg.CreateResource({1920, 1080, Format::RGBA16F});
auto bloom = fg.CreateResource({960, 540, Format::RGBA16F});
auto ldr = fg.CreateResource({1920, 1080, Format::RGBA8});
auto debug = fg.CreateResource({1920, 1080, Format::RGBA8});
fg.AddPass(
"DepthPrepass",
[&]()
{
fg.Write(0, depth);
},
[&](/*cmd*/)
{
printf(" >> exec: DepthPrepass\n");
});
fg.AddPass(
"GBuffer",
[&]()
{
fg.Read(1, depth);
fg.Write(1, gbufA);
fg.Write(1, gbufN);
},
[&](/*cmd*/)
{
printf(" >> exec: GBuffer\n");
});
fg.AddPass(
"Lighting",
[&]()
{
fg.Read(2, gbufA);
fg.Read(2, gbufN);
fg.Write(2, hdr);
},
[&](/*cmd*/)
{
printf(" >> exec: Lighting\n");
});
// Compute pass — explicit UAV access on hdr.
fg.AddPass(
"SSR",
[&]()
{
fg.ReadWrite(3, hdr);
},
[&](/*cmd*/)
{
printf(" >> exec: SSR (compute, UAV)\n");
});
fg.AddPass(
"Bloom",
[&]()
{
fg.Read(4, hdr);
fg.Write(4, bloom);
},
[&](/*cmd*/)
{
printf(" >> exec: Bloom\n");
});
// Tonemap writes a NEW resource (ldr), so hdr + bloom die here.
// Their physical blocks become available for reuse.
fg.AddPass(
"Tonemap",
[&]()
{
fg.Read(5, hdr);
fg.Read(5, bloom);
fg.Write(5, ldr);
},
[&](/*cmd*/)
{
printf(" >> exec: Tonemap\n");
});
fg.AddPass(
"Present",
[&]()
{
fg.Read(6, ldr);
fg.Write(6, backbuffer);
},
[&](/*cmd*/)
{
printf(" >> exec: Present\n");
});
// Dead pass — nothing reads debug, so the graph will cull it.
fg.AddPass(
"DebugOverlay",
[&]()
{
fg.Write(7, debug);
},
[&](/*cmd*/)
{
printf(" >> exec: DebugOverlay\n");
});
auto plan = fg.Compile(); // topo-sort, cull, alias, compute barriers
fg.Execute(plan); // replay barriers + run
return 0;
}
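// Hand-traced expectations for this graph (a sketch, assuming the FIFO
// tie-breaking in TopoSort; the program's own printout is authoritative):
//   [2] order: DepthPrepass -> DebugOverlay -> GBuffer -> Lighting -> SSR
//              -> Bloom -> Tonemap -> Present
//   [3] DebugOverlay=DEAD (nothing reads 'debug'); everything else ALIVE
//   [5] bloom reuses depth's block (depth dies after GBuffer), and ldr
//       reuses gbufA's block (the G-Buffer dies after Lighting) -- roughly
//       a 20% VRAM saving on this graph.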
What the MVP delivers#
Here’s the full per-frame lifecycle, the same three-phase architecture from Part I, now backed by real code:
Declare: each AddPass call runs its setup lambda immediately.
- declare reads & writes
- request virtual resources
- version tracking builds edges

Compile: produces the CompiledPlan (execution order, memory mapping, and every barrier). No GPU work yet.
- sort: topological order (Kahn's)
- cull: kill dead passes
- scan lifetimes: first/last use
- alias: free-list reuse
- compute barriers: detect state transitions

Execute: pure playback of the plan.
- submit precomputed barriers
- call each execute lambda
- resources already aliased & bound
~350 lines of C++17. No external dependencies, no allocator library, no render backend, just the algorithmic core that production engines build on. Every concept from Part I (virtual resources, dependency edges, topological ordering, dead-pass culling, barrier inference, lifetime analysis, and VRAM aliasing) now exists as running, compilable code.
What’s next#
The MVP runs everything on a single queue, issues barriers as immediate full-pipeline stalls, and uses the simplest possible allocation strategy. Production engines push past these limits; two upgrades matter most:
- Async compute: the graph already encodes which passes are independent. Mapping those onto a second hardware queue lets the GPU overlap compute work (SSAO, light culling, particle simulation) with graphics work on the same frame, recovering otherwise-idle ALU cycles.
- Split barriers: instead of stalling the full pipeline at the point of use, split barriers separate the “start transitioning” signal from the “must be done” signal. The driver gets a window to schedule the transition in parallel with unrelated work, often eliminating the stall entirely.
Both features plug directly into the CompiledPlan architecture: async compute adds a queue assignment per pass, and split barriers replace single Barrier entries with begin/end pairs. The graph’s structure doesn’t change, only the execution strategy does.
Part III: Beyond MVP walks through both upgrades with code diffs against the v3 base we just built.
