Answer to the question.

History of the question

Early coroutine implementations were stackful: each coroutine reserved its own stack, often kilobytes to megabytes of mostly unused address space, which limited practical concurrency to thousands of tasks. C++20 introduced stackless coroutines whose frames are typically allocated on the heap, yet naive composition could still overflow the native stack. With asymmetric transfer (returning void or bool from await_suspend), handing control to another coroutine means calling resume() on it from inside await_suspend, so a chain of N transfers builds O(N) nested native call frames. Symmetric transfer was added to the C++20 design (proposal P0913) so that coroutine A can resume coroutine B directly by returning B’s handle from await_suspend. Implementations are expected to perform that transfer as a tail call, so A’s native resume frame does not stay on the stack.

The problem

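When coroutine A performs co_await on coroutine B, and B awaits C, asymmetric transfer nests each resume() call inside the await_suspend of the coroutine above it, and none of those calls returns until the coroutine it started suspends or completes. The same growth occurs in a loop when awaited tasks complete synchronously, because each completion resumes its continuation from inside final_suspend. With depth N (e.g., traversing 50,000+ tree nodes), this exhausts the native stack even though every coroutine frame resides on the heap, causing SIGSEGV or STATUS_STACK_OVERFLOW.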

The solution

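await_suspend must return a coroutine handle: std::coroutine_handle<Promise> or the type-erased std::coroutine_handle<>. After await_suspend returns, the returned handle is resumed as if by a tail call: the native activation record of the current resume() is released, and execution jumps to the target’s resume point without growing the call stack. The suspending coroutine’s own frame stays alive on the heap. Where the compiler emits the tail call, stack depth stays constant regardless of logical nesting depth. Some toolchains have historically not emitted it in unoptimized builds, so deep chains should be tested in the build configuration that ships.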

#include <coroutine>

struct Task {
    struct promise_type {
        Task get_return_object() { 
            return Task{std::coroutine_handle<promise_type>::from_promise(*this)}; 
        }
        std::suspend_always initial_suspend() { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        void return_void() {}
        void unhandled_exception() {}
    };
    std::coroutine_handle<> h;
};

struct SymmetricAwaiter {
    std::coroutine_handle<> target;
    bool await_ready() const noexcept { return false; }
    
    // Asymmetric (grows the native stack): void await_suspend(std::coroutine_handle<>) { target.resume(); }
    
    // Symmetric: the returned handle is resumed as if by a tail call.
    std::coroutine_handle<> await_suspend(std::coroutine_handle<>) noexcept {
        return target; 
    }
    void await_resume() noexcept {}
};

Situation from life

Problem description

While developing a high-frequency trading engine, we migrated from callback-based asynchronous I/O to C++20 coroutines for modeling complex derivative pricing trees. During stress testing with portfolios containing deeply nested synthetic options (50,000+ levels), the system crashed with stack overflows despite using heap-allocated coroutine frames. The culprit was the initial implementation of await_suspend returning void, which caused the native stack to grow proportionally with the depth of the pricing model.

Different solutions considered

Solution 1: Increase native stack size via ulimit -s or linker flags.

Pros: it required no code changes and gave immediate relief during testing. Cons: it wasted gigabytes of virtual memory per thread, only moved the limit instead of removing it (unbounded recursion still overflows), and hurt portability, since Linux and Windows configure thread stack sizes through different mechanisms.

Solution 2: Implement a trampoline executor loop that never recurses.

Pros: it kept the coroutine syntax intact while moving control transfer into a central loop that never recurses. Cons: a significant latency penalty (hundreds of nanoseconds per switch, attributed to virtual dispatch through the scheduler), more complex scheduler code, and fewer opportunities for the compiler to optimize across suspension points.

Solution 3: Adopt symmetric transfer by returning std::coroutine_handle from await_suspend.

Pros: close to zero overhead (the transfer compiles down to an indirect jump, comparable to a hand-written state machine), no stack growth under unbounded recursion, and readable coroutine syntax. Cons: it required C++20 compiler support (initially limited on some embedded platforms) and complicated debugging, because stack traces appear truncated after tail-call elimination.

Which solution was chosen and why

We selected Solution 3 because the financial models inherently required unbounded recursion depth for theoretical pricing calculations. The microsecond-latency budget could not tolerate the overhead of trampolining, and memory constraints prohibited massive stack pre-allocation. Symmetric transfer provided the only zero-cost solution that was both correct and efficient.

The result

The engine successfully processed portfolios with 100,000+ nesting levels without crashing. Latency benchmarks showed identical performance to hand-optimized C state machines, and memory usage remained flat regardless of recursion depth. The system has run in production for 18 months with zero stack-related crashes.

What candidates often miss

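How do the void, bool, and handle return types of await_suspend differ, when is the coroutine actually considered suspended, and why does this matter for thread safety?

Many candidates assume the return type decides when the coroutine becomes suspended. It does not: in every form, the coroutine is already fully suspended when await_suspend is entered, which is why await_suspend may hand the handle to another thread that resumes it before await_suspend has even returned. The return value only selects what runs next. Returning void, or true, returns control to whoever called resume(); the two are equivalent. Returning false resumes the current coroutine immediately, without leaving it. Returning a handle resumes that handle instead. The thread-safety consequence is the same in all three cases: once the handle has been published, another thread may resume the coroutine, run it to completion, and destroy its frame, and the awaiter object usually lives inside that frame. Code in await_suspend must therefore not touch this, the promise, or any coroutine local after publishing the handle, and must not return false once another thread may already have resumed the coroutine. Symmetric transfer does not destroy anything: only the native activation record of await_suspend goes away, while the coroutine frame and its locals stay alive on the heap until the coroutine is destroyed. Candidates often introduce data races by reading awaiter members after handing the handle to a scheduler or another thread.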

How does symmetric transfer interact with exception handling when the target coroutine throws, and what does this require of unhandled_exception in the promise type?

Candidates frequently assume that an exception thrown in B unwinds the native stack back into A. It does not, with or without symmetric transfer. An exception escaping B’s body is caught by the implicit try/catch that wraps the coroutine body and is delivered to unhandled_exception on B’s promise. The usual pattern stores std::current_exception() there; B then reaches final_suspend, whose awaiter symmetrically transfers back to A, and A’s await_resume rethrows the stored exception. Because the rethrow happens inside A, a try/catch around the co_await expression in A works as expected, and A’s locals, which live in its heap-allocated frame, are destroyed normally when their scope exits or the frame is destroyed. The failure modes are in the promise: an empty unhandled_exception silently swallows the error, and one that rethrows lets the exception escape from resume() into whoever resumed the chain, leaving B suspended at its final suspend point and bypassing A entirely. Beginners also forget that final_suspend must be noexcept, so the transfer back to the parent has no way to report an error of its own.

What is the significance of std::noop_coroutine() in symmetric transfer chains, and why must it be returned rather than a default-constructed handle when there is nothing to resume?

A default-constructed std::coroutine_handle is a null handle. Returning it from await_suspend does not mean "resume nothing": the returned handle is always resumed, and resuming a null handle is undefined behavior, typically a crash. std::noop_coroutine() returns a handle to a coroutine with no observable effects: resume() and destroy() do nothing, and done() is always false. Returning it from await_suspend leaves the current coroutine suspended and returns control to the caller of resume(), which makes it the handle-returning equivalent of returning void. This matters at the ends of a chain. A completing child normally returns its parent’s continuation handle from its final awaiter. When there is no continuation, for example in a top-level or detached coroutine, or when an awaiter decides at run time that nothing should run next, it returns std::noop_coroutine() instead. Candidates confuse null handles with noop handles, and the result is undefined behavior rather than a clean return to the resumer.

Concerning the C++20 coroutine promise type, what specific return type from `await_suspend` enables stackless symmetric coroutine transfer?
