Internal · SpaceMusic Engineering

Scene Loading and Smooth Transitions

Why loading a new scene shouldn't feel like a hard cut — and what it would take to make every parameter type fade gracefully from one state to the next.

Why this is unusual

A SpaceMusic performance lives or dies on its transitions. A single scene can be technically impressive in isolation, but the moment between scenes — when one configuration gives way to another — is where the work becomes art. Today, loading a scene is a hard cut. We think it can become a crossfade.

Most software treats configuration loading as administrative. You stop, you load, you resume. The values jump; nobody minds because nobody is watching live. SpaceMusic is the opposite case: the audience is watching, the engine is running, and every parameter swap is visible. That makes scene loading a real-time concern, and real-time concerns want real-time treatment.

The recent file-management work (plan 033) is the foundation that makes this conversation possible. Now that file paths live in the channel model alongside every other parameter, a scene is no longer a hybrid of "model state", "files on disk", and "patch wiring". It is a single serialized snapshot of the project. With that piece in place, the question becomes architectural: how do we move from one snapshot to another continuously, rather than instantly?

How the rest of the world handles this

Three patterns recur in the wider software world. Each makes a different trade between simplicity and continuity.

1. Hard cut. Load the new preset; every value jumps. Fast, simple, indistinguishable from a glitch when the audience is watching. Seen in: most synth presets, most game engines' scene loads, most config-driven tools.
2. Per-parameter automation. Every value moves along an automation lane between its current and target. Smooth for continuous parameters; can't help with discrete ones. Seen in: modern DAW automation, motion-graphics keyframing, web animation libraries.
3. A / B with crossfade. Two complete runtime copies; an operator-controlled coefficient determines the blend. Solves the discrete-parameter problem by running both states in parallel and blending the output, not the state. Seen in: DJ decks, theatrical lighting consoles (cue crossfade), video switchers.

SpaceMusic does pattern 1 today. The argument of this document is that we should move toward pattern 3, with the easy parts of pattern 2 folded in where they suffice.

Where we are today

A scene is a serialized ProjectModel. Loading is a single channel write: Project.Object = model. From that one assignment, every sub-channel — every slider value, every plugin selection, every render mode, every file path — rebinds in the same frame. The 3D pipeline, the audio engine, the Pro UI, and the patch all observe the new state simultaneously.

Figure 1 · The status-quo load path — one frame, every channel

The write is fast — the model is small, the channel hub is fast — but speed isn't the issue. The issue is that there is no middle ground. There is no moment of transition; there is only a before and an after. Everything that was the old scene is gone in the next frame, and everything that is the new scene is present. From the audience's point of view, this is a glitch.
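To make the status-quo path concrete, here is a minimal sketch of a hard cut through an observable channel. The Channel class, its subscribers, and the dictionary-shaped model are hypothetical stand-ins for illustration; they are not the actual SpaceMusic channel hub API.

```python
from typing import Any, Callable, Dict, List


class Channel:
    """Hypothetical observable value: a write notifies every subscriber at once."""

    def __init__(self, value: Any = None) -> None:
        self._value = value
        self._subscribers: List[Callable[[Any], None]] = []

    def subscribe(self, callback: Callable[[Any], None]) -> None:
        self._subscribers.append(callback)

    @property
    def value(self) -> Any:
        return self._value

    @value.setter
    def value(self, new_value: Any) -> None:
        self._value = new_value
        # Every observer sees the new state within the same write.
        for callback in self._subscribers:
            callback(new_value)


project = Channel({"gain": 0.2, "render_mode": "points"})
seen: Dict[str, Any] = {}

# Stand-ins for the 3D pipeline and the audio engine (assumed observers).
project.subscribe(lambda m: seen.__setitem__("pipeline", m["render_mode"]))
project.subscribe(lambda m: seen.__setitem__("audio", m["gain"]))

# The hard cut: one assignment, no intermediate state for anyone to observe.
project.value = {"gain": 0.9, "render_mode": "mesh"}
print(seen)  # {'pipeline': 'mesh', 'audio': 0.9}
```

Nothing in this shape leaves room for a value between 0.2 and 0.9; the observers only ever see the before and the after.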

This isn't a bug. It's the consequence of treating scene state as a single value to be set. The channel architecture does exactly what it's designed to do: it propagates the write. The question is whether scene state should be a single value at all.

The interpolation spectrum

Before we propose an architecture, we need to look at what's actually changing during a scene load. Not every parameter is the same. Some are trivial to interpolate. Some are tractable with effort. Some can't be interpolated at all, and need a fundamentally different approach.

Figure 2 · The interpolation spectrum — three tiers of parameter change

Tier 1 — direct lerp. Anything that's a ParamFloat can move continuously between two values. The math is trivial: value = lerp(A, B, t) where t walks from 0 to 1 over the transition duration. This covers the majority of exposed surface area in SpaceMusic — sliders, gains, positions, opacities, color channels. Most of what an operator touches during a performance lives here.
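As a rough illustration of the Tier 1 math, the sketch below lerps a set of float parameters while t walks from 0 to 1. The function names and the dictionary-of-floats snapshot shape are assumptions made for the example, not existing SpaceMusic code.

```python
from typing import Dict


def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation: returns a at t = 0 and b at t = 1."""
    return a + (b - a) * t


def transition_progress(elapsed: float, duration: float) -> float:
    """Map elapsed time to t in [0, 1]; a non-positive duration degenerates to a hard cut."""
    if duration <= 0.0:
        return 1.0
    return min(max(elapsed / duration, 0.0), 1.0)


def blend_floats(a: Dict[str, float], b: Dict[str, float], t: float) -> Dict[str, float]:
    """Lerp every float parameter present in both snapshots (keys are hypothetical)."""
    return {key: lerp(a[key], b[key], t) for key in a.keys() & b.keys()}


# One second into a four-second transition, t is 0.25.
t = transition_progress(elapsed=1.0, duration=4.0)
print(blend_floats({"gain": 0.0, "opacity": 1.0}, {"gain": 1.0, "opacity": 0.0}, t))
```

Clamping t keeps the blend well-defined if a frame arrives after the transition has nominally ended.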

Tier 2 — output blend. Plugin runtimes already produce UiTexture outputs that are separate from their parameter state (the PluginNDRuntimeModel classes the codegen emits). That separation is the key: even if a plugin's internal state can't smoothly transition — because, say, the shader logic itself fundamentally changed — its texture output can still be blended pixel-by-pixel against the texture from another instance. The producer "jumps" in state; the composited output is continuous.
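The output blend can be sketched on the CPU as follows. In practice this would run in a shader; here a texture is assumed to be a flat list of channel values in [0, 1], and blend_textures is a hypothetical name.

```python
from typing import List

# Hypothetical stand-in for a texture: a flat list of channel values in [0, 1].
Texture = List[float]


def blend_textures(tex_a: Texture, tex_b: Texture, x: float) -> Texture:
    """Blend two same-sized textures value by value; x = 0 is all A, x = 1 is all B."""
    if len(tex_a) != len(tex_b):
        raise ValueError("textures must have the same size to blend")
    return [(1.0 - x) * a + x * b for a, b in zip(tex_a, tex_b)]


# Two producers with unrelated internal state still yield a continuous composite.
print(blend_textures([0.0, 1.0], [1.0, 0.0], 0.25))  # [0.25, 0.75]
```

The blend never inspects how either texture was produced, which is why it works even when the two producers share no parameters at all.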

Tier 3 — no native interpolation. A render mode is a different shader. A plugin selection is a different runtime. A file path points to different audio bytes or different geometry. These are categorical — there is no in-between. The naive options are to delay them, gate them, or hide the jump under cover of something else (a fade, a beat hit, a black frame).

The accent in the diagram is on Tier 3 deliberately. Tier 1 is essentially free — it's just lerping floats. Tier 2 is a known technique — render two pipelines, blend their outputs. Tier 3 is where the architecture has to bend. That is the part of the problem the rest of this document is about.

The A / B model

The pattern that solves Tier 3 is the one DJs and lighting consoles have used for decades: keep two complete copies of the state running in parallel, and let an operator-controlled coefficient decide how much of each one reaches the output.

For SpaceMusic, that means a second slot in the runtime: not in the model, not in the patch, but in the live state that drives rendering. Scene A occupies one slot; Scene B occupies the other. A crossfade coefficient x, ranging from 0 (all A) to 1 (all B), decides what the audience sees and hears. The operator can load any scene into the inactive slot at leisure, then crossfade across whenever the moment is right.

Figure 3 · The A/B model — two runtimes, one crossfade coefficient

The crossfader is the focal element, and it is doing different work for different parameter tiers:

Tier 1 (floats) — the active channel value is computed as lerp(A.value, B.value, x). There is still only one live channel; both runtimes read from it. The model snapshots A and B are the source values, the live channel is the blend.
Tier 2 (textures) — Runtime A and Runtime B each produce their own output texture. The crossfader blends them in a shader, controlled by x. This is where most of the visible work happens during a transition.
Tier 3 (categorical) — Runtime A is configured by A's categorical values; Runtime B is configured by B's. The swap from one to the other is masked by the texture blend above. By the time x reaches 1, Runtime A is producing zero visible contribution, so its categorical state can be reset without anyone noticing.
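The three tiers listed above could fit together in a single per-frame step, roughly as sketched here. The Runtime class, its two-pixel placeholder output, and crossfade_frame are illustrative assumptions; a real engine would keep both runtimes alive across frames and blend on the GPU.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Runtime:
    """Hypothetical runtime slot: float sources plus its own categorical state."""
    floats: Dict[str, float]
    render_mode: str

    def render(self) -> List[float]:
        # Placeholder two-pixel output that depends only on the render mode.
        return [1.0, 1.0] if self.render_mode == "mesh" else [0.0, 0.0]


def crossfade_frame(a: Runtime, b: Runtime, x: float) -> Tuple[Dict[str, float], List[float], bool]:
    # Tier 1: one live value per float, blended from the two snapshots.
    shared = a.floats.keys() & b.floats.keys()
    live = {k: a.floats[k] + (b.floats[k] - a.floats[k]) * x for k in shared}
    # Tier 3: each runtime renders with its own categorical configuration.
    tex_a, tex_b = a.render(), b.render()
    # Tier 2: blending the outputs masks the categorical difference.
    output = [(1.0 - x) * pa + x * pb for pa, pb in zip(tex_a, tex_b)]
    # At x = 1, A contributes nothing visible and can be reset or reloaded.
    return live, output, x >= 1.0


scene_a = Runtime({"gain": 0.0}, "points")
scene_b = Runtime({"gain": 1.0}, "mesh")
live, output, a_is_free = crossfade_frame(scene_a, scene_b, 0.5)
print(live, output, a_is_free)  # {'gain': 0.5} [0.5, 0.5] False
```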

"Two of everything" is the cost. A second runtime means a second copy of every plugin, a second render pass, a second audio mixer. We can be smart about which subsystems actually double: a plugin that's identical between A and B doesn't need two runtimes — it needs one runtime that reads a single lerped channel. The doubling is only required where the categorical state differs.

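The selective doubling described above comes down to a comparison of categorical state per plugin. A minimal sketch, assuming each scene exposes a mapping from a plugin id to that plugin's categorical settings (both the mapping shape and the function name are hypothetical):

```python
from typing import Dict, List

# Hypothetical shape: plugin id -> that plugin's categorical settings.
Categorical = Dict[str, Dict[str, str]]


def plugins_needing_second_runtime(cat_a: Categorical, cat_b: Categorical) -> List[str]:
    """Return the plugin ids whose categorical state differs between A and B.

    Plugins that match on both sides can share one runtime that reads the
    lerped float channels; only the returned ids need a doubled runtime.
    """
    plugin_ids = cat_a.keys() | cat_b.keys()
    return sorted(pid for pid in plugin_ids if cat_a.get(pid) != cat_b.get(pid))


scene_a = {
    "particles": {"render_mode": "points"},
    "sampler": {"file": "kick_heavy.wav"},
}
scene_b = {
    "particles": {"render_mode": "mesh"},
    "sampler": {"file": "kick_heavy.wav"},
}
print(plugins_needing_second_runtime(scene_a, scene_b))  # ['particles']
```

A plugin that exists in only one of the two scenes is treated as differing, so it gets a runtime on its own side and fades in or out with the blend.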
What about the harder cases

The architecture above handles most of what scenes throw at it. A few cases need additional care.

Render-mode swaps that aren't texture-blendable. Some render modes don't produce a flat 2D texture that can be alpha-blended; they're a different way of drawing the 3D scene entirely. In those cases the crossfader can still operate, but in the geometric domain: scale Scene A's contribution down (toward zero size, or into the distance) while Scene B grows in. The blend happens in scene geometry rather than in pixel values, but it's still a crossfade.

File-backed assets. Loading a new audio file or a new 3D model takes wall-clock time. If we ask Runtime B to load kick_heavy.wav the moment the crossfade starts, the crossfade will stall waiting for I/O. The fix is the same one a DJ uses: pre-cue. Loading into the B slot starts as soon as the operator picks the next scene, not when the crossfade begins. By the time the operator pushes the fader, the B-side assets are already resident.
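A pre-cue can be sketched with a background thread pool: loading starts when the scene is picked, and the fader only needs to ask whether the B side is resident. SlotB, load_asset, and the simulated I/O delay are assumptions for illustration, not an existing SpaceMusic interface.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, List


def load_asset(path: str) -> bytes:
    """Stand-in for disk I/O; a real loader would read and decode the file."""
    time.sleep(0.01)
    return path.encode()


class SlotB:
    """Hypothetical inactive slot that pre-cues assets as soon as a scene is picked."""

    def __init__(self) -> None:
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._pending: Dict[str, Future] = {}

    def cue(self, asset_paths: List[str]) -> None:
        # Loading starts at selection time, not when the fader moves.
        self._pending = {p: self._pool.submit(load_asset, p) for p in asset_paths}

    def ready(self) -> bool:
        # True once every cued asset has finished loading.
        return all(f.done() for f in self._pending.values())

    def wait_until_ready(self, timeout: float = 5.0) -> None:
        # Blocks until each asset is resident; re-raises any loader error.
        for future in self._pending.values():
            future.result(timeout=timeout)

    def close(self) -> None:
        self._pool.shutdown()


slot = SlotB()
slot.cue(["kick_heavy.wav"])   # operator picks the next scene
slot.wait_until_ready()        # in a live engine, the fader would poll ready() instead
print(slot.ready())
slot.close()
```

In a live setting the crossfade would more likely be gated on ready() than blocked on wait_until_ready(), so a slow disk delays the transition rather than stalling the render loop.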

Parameters that aren't sampled per frame. A few channels update on demand rather than continuously — values that come from external MIDI, OSC, or scheduled events. The crossfade duration needs to cover the longest expected update latency for those, otherwise the blend will visibly stutter when one of them finally arrives. The mitigation is to over-allocate transition time, not to over-engineer the channel.

Audio specifically. Audio crossfades are the easiest case in one sense (every DAW does them) and the hardest in another (phase artifacts, perceived loudness changes, abrupt cuts on bar boundaries). The good news is that the channel architecture treats audio gain the same as any other ParamFloat, so the basic mechanism (lerp the gain down on the outgoing source, lerp it up on the incoming source) is just Tier 1. The taste decisions (equal-power vs. linear curves, beat-quantization, etc.) are operator-facing concerns that can sit on top.
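The difference between the two curve families is easy to see numerically. The sketch below compares a linear gain pair with the common sine/cosine equal-power pair; the function names are ours, and the choice of curve remains an operator-facing decision rather than something this document prescribes.

```python
import math
from typing import Tuple


def linear_gains(x: float) -> Tuple[float, float]:
    """Return (outgoing, incoming) gains that sum to 1 in amplitude."""
    return 1.0 - x, x


def equal_power_gains(x: float) -> Tuple[float, float]:
    """Return (outgoing, incoming) gains whose squares sum to 1 (constant power)."""
    angle = x * math.pi / 2.0
    return math.cos(angle), math.sin(angle)


def total_power(gains: Tuple[float, float]) -> float:
    # Combined power of two uncorrelated sources of equal level.
    return gains[0] ** 2 + gains[1] ** 2


for x in (0.0, 0.5, 1.0):
    print(x, round(total_power(linear_gains(x)), 3), round(total_power(equal_power_gains(x)), 3))
```

At the midpoint the linear pair carries half the combined power of either endpoint (about a 3 dB dip for uncorrelated material), while the equal-power pair stays constant. For strongly correlated sources the linear curve can be the better fit, which is one reason this is left as a taste decision.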

Why this matters

"A scene change is the moment the audience either stays in the work or remembers they're in a room with a computer."

Three reasons this matters now, in the order an outsider would care about them:

1. The performance case demands it. SpaceMusic exists to make audiovisual performances that hold an audience. Every hard cut between scenes is a small breach of that contract — a moment where the seams of the software show. Other performance tools that don't have this problem (lighting consoles, DJ software, video switchers) all converged on the A/B crossfade pattern decades ago. We should learn from that convergence rather than rediscover it.

2. The headless installation case benefits too. The ABB-style automated installation runs scene changes on a schedule. With a hard cut, those transitions land like edits in a video; with a crossfade, they land like breaths. Same architecture, no additional operator workload — the scheduler just animates x instead of triggering loads.

3. The roadmap implies the architecture. Remote control, multi-operator editing, web UIs — all of these eventually involve someone preparing the next scene while someone else watches the current one. That is, definitionally, an A/B model. Building the slot architecture now means the remote and multi-user features sit naturally on top later. Building hard-cut loading now means redoing the architecture later, with worse leverage.

The plan 033 work made scene state model-clean. The work proposed here makes scene transitions architecturally first-class. Together they take scene loading from "a feature that mostly works" to "a property of the engine that the rest of the product can rely on."

Settled

Scene state is model-clean

Plan 033 moved file paths into channels. A scene is now a single serializable ProjectModel snapshot. No separate "files" registry, no patch-side bookkeeping. The foundation for two-snapshot loading is already in place.

Next step

Per-parameter blend audit

Walk the channel hierarchy once and tag every leaf as Tier 1, 2, or 3. The result is a finite, reviewable list of categorical parameters that need A/B doubling, and a check on the expectation that the long tail sits in Tier 1, where lerp is free.
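The audit could start as a recursive walk that tags each leaf by type. This sketch assumes the hierarchy can be viewed as nested dictionaries and that type alone decides the tier; the UiTexture class here is an empty stand-in, and a real audit would likely need per-parameter overrides.

```python
from typing import Any, Dict, Iterator, Tuple


class UiTexture:
    """Empty stand-in for a plugin output texture."""


def classify(leaf: Any) -> int:
    """Assumed rule: floats lerp (Tier 1), textures blend (Tier 2), the rest is categorical (Tier 3)."""
    if isinstance(leaf, float):
        return 1
    if isinstance(leaf, UiTexture):
        return 2
    return 3


def audit(node: Dict[str, Any], path: str = "") -> Iterator[Tuple[str, int]]:
    """Yield (channel path, tier) for every leaf in a nested-dict hierarchy."""
    for key, child in node.items():
        child_path = f"{path}/{key}" if path else key
        if isinstance(child, dict):
            yield from audit(child, child_path)
        else:
            yield child_path, classify(child)


hierarchy = {
    "mixer": {"gain": 0.5},
    "plugin": {"output": UiTexture(), "render_mode": "mesh", "file": "kick_heavy.wav"},
}
report = dict(audit(hierarchy))
categorical = sorted(p for p, tier in report.items() if tier == 3)
print(categorical)  # ['plugin/file', 'plugin/render_mode']
```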

Future

Dual-slot runtimes

A second runtime instance, a crossfade coefficient channel, a texture-blend primitive. Once those land, scheduled transitions, remote pre-cueing, and multi-operator workflows become applications of the same mechanism rather than new architectures.