The history of this question traces back to the stabilization of std::task::Waker in Rust 1.36, which introduced a standardized mechanism for executors to notify futures of readiness. Prior to this, async frameworks relied on boxed closures or custom notification traits that imposed allocation overhead and prevented seamless integration with C libraries. The RawWaker API was designed to support zero-cost abstractions by allowing developers to construct Waker instances from raw pointers and function pointer tables (RawWakerVTable), mirroring C++'s virtual tables but with Rust's safety requirements.
The problem arises because RawWaker construction bypasses Rust's ownership and borrowing system entirely. The programmer must manually uphold four invariants: the data pointer must remain valid for as long as any Waker clone exists (not just the original); the four vtable functions (clone, wake, wake_by_ref, drop) must be safe to call from any thread, because Waker is Send and Sync even if the executor is single-threaded; the clone function must return a new RawWaker referencing the same underlying task state; and each function must have exactly the signature RawWakerVTable::new expects. Those signatures are plain Rust-ABI function pointers (unsafe fn(*const ()) -> RawWaker for clone, unsafe fn(*const ()) for the other three), not extern "C". When a wake has to be triggered from C or C++, the C-ABI entry point is a separate trampoline function that calls into the Waker, not a vtable entry.
The solution requires strict adherence to unsafe invariants. The data pointer should typically reference 'static data or come from Arc::into_raw so that Arc manages shared ownership across clones. The vtable functions must correctly implement reference counting semantics: clone should increment the count, drop should decrement it, wake should decrement after notification (it consumes the Waker), and wake_by_ref should leave the count unchanged. Violating these invariants, for example by letting clone skip the increment or by casting the data pointer to the wrong type, results in undefined behavior when the executor invokes these pointers, typically a use-after-free, a double free, or a read through a dangling pointer.
    use std::sync::Arc;
    use std::task::{RawWaker, RawWakerVTable, Waker};

    struct TaskState {
        id: u64,
    }

    unsafe fn clone_waker(data: *const ()) -> RawWaker {
        // Give the new RawWaker its own strong reference. Cloning the Arc into
        // a temporary would drop the clone at once and leave the count unchanged.
        Arc::increment_strong_count(data as *const TaskState);
        RawWaker::new(data, &VTABLE)
    }

    unsafe fn wake_waker(data: *const ()) {
        // wake consumes the Waker, so reclaim the reference it owned.
        let arc = Arc::from_raw(data as *const TaskState);
        // Wake logic here (for example, reschedule the task with arc.id).
        drop(arc); // Drop the Arc, releasing the reference
    }

    unsafe fn wake_by_ref(data: *const ()) {
        // Borrow the state without touching the reference count.
        let _state = &*(data as *const TaskState);
        // Wake logic here
    }

    unsafe fn drop_waker(data: *const ()) {
        // Reclaim this Waker's reference; TaskState is freed once the last
        // reference is gone.
        drop(Arc::from_raw(data as *const TaskState));
    }

    static VTABLE: RawWakerVTable = RawWakerVTable::new(
        clone_waker,
        wake_waker,
        wake_by_ref,
        drop_waker,
    );

    fn create_waker(state: Arc<TaskState>) -> Waker {
        // The Waker takes over the strong reference held by `state`.
        let ptr = Arc::into_raw(state) as *const ();
        unsafe { Waker::from_raw(RawWaker::new(ptr, &VTABLE)) }
    }
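One way to sanity-check a vtable of this kind is to watch Arc::strong_count while a Waker is created, cloned, woken, and dropped. The sketch below is self-contained and only illustrative: Task, COUNTED_VTABLE, waker_for, and observe_counts are hypothetical names introduced for the example, and the wake functions do nothing beyond reading the id.

```rust
use std::sync::Arc;
use std::task::{RawWaker, RawWakerVTable, Waker};

// Hypothetical task state used only for this sketch.
struct Task {
    id: u64,
}

unsafe fn clone_raw(data: *const ()) -> RawWaker {
    // The new RawWaker needs its own strong reference.
    unsafe { Arc::increment_strong_count(data as *const Task) };
    RawWaker::new(data, &COUNTED_VTABLE)
}

unsafe fn wake_raw(data: *const ()) {
    // wake consumes the waker: reclaim its reference and let it drop.
    let task = unsafe { Arc::from_raw(data as *const Task) };
    let _ = task.id; // a real executor would reschedule the task here
}

unsafe fn wake_by_ref_raw(data: *const ()) {
    // Borrow only; the reference count is untouched.
    let task = unsafe { &*(data as *const Task) };
    let _ = task.id;
}

unsafe fn drop_raw(data: *const ()) {
    drop(unsafe { Arc::from_raw(data as *const Task) });
}

static COUNTED_VTABLE: RawWakerVTable =
    RawWakerVTable::new(clone_raw, wake_raw, wake_by_ref_raw, drop_raw);

fn waker_for(task: Arc<Task>) -> Waker {
    let ptr = Arc::into_raw(task) as *const ();
    unsafe { Waker::from_raw(RawWaker::new(ptr, &COUNTED_VTABLE)) }
}

/// Returns the strong count observed after each step.
fn observe_counts() -> [usize; 5] {
    let task = Arc::new(Task { id: 7 });
    let waker = waker_for(Arc::clone(&task));
    let after_create = Arc::strong_count(&task);
    let clone = waker.clone();
    let after_clone = Arc::strong_count(&task);
    clone.wake_by_ref();
    let after_wake_by_ref = Arc::strong_count(&task);
    clone.wake(); // consumes `clone`
    let after_wake = Arc::strong_count(&task);
    drop(waker);
    let after_drop = Arc::strong_count(&task);
    [after_create, after_clone, after_wake_by_ref, after_wake, after_drop]
}
```

With balanced reference counting, observe_counts() yields [2, 3, 3, 2, 1]: the local Arc plus one reference per live Waker. A clone function that forgets the increment shows up here as a count that is too low after the clone step, well before it turns into a use-after-free.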
Consider developing a high-frequency trading system where a Rust async runtime must interface with a legacy C++ market data feed library. The C++ library provides a registration function accepting a void* context and a function pointer, invoking the callback when price updates arrive. The engineering challenge requires creating a Waker that bridges Rust futures with this C++ callback mechanism without introducing per-message allocation overhead, as latency requirements demand sub-microsecond wake times.
One solution involved storing a Box<dyn Fn() + Send> closure behind the Waker's data pointer. This approach offered memory safety through Rust's ownership system and straightforward integration. However, it introduced a heap allocation for every market data subscription and virtual dispatch overhead on the wake path, both at odds with the system's latency budget. Furthermore, managing the boxed closure's lifetime across the FFI boundary proved hazardous, as the C++ library's asynchronous cleanup could leave dangling pointers if the Rust side dropped the Waker before the C++ library stopped invoking the callback.
An alternative approach utilized a global static hash map mapping integer IDs to task handles, passing the ID as the void* context. This eliminated allocations and provided O(1) lookup during wake operations. Yet, this created a memory leak hazard if tasks completed without unregistering from the feed, and the static map required Mutex synchronization that became a contention bottleneck under high market data throughput, effectively serializing wake notifications across all CPU cores.
The chosen solution implemented a custom RawWaker whose data pointer came from Arc::into_raw on an Arc<TaskState> containing the C++ callback context and a completion flag. The RawWakerVTable functions were plain unsafe fn items (the only function pointer type RawWakerVTable::new accepts) that cast the data pointer back to *const TaskState and adjusted the Arc's reference count. A separate unsafe extern "C" trampoline, registered with the C++ library together with a void* context, performed the notification when a price update arrived. This design eliminated per-message allocations by reusing the single Arc allocation per task, maintained thread safety through Arc's atomic reference counting, and preserved memory safety because every clone incremented the count, every drop or wake decremented it, and TaskState was freed only when the last reference was released. The result achieved sub-microsecond wake latencies while maintaining memory safety guarantees across the Rust/C++ boundary, successfully passing Miri's undefined behavior detection and stress tests involving millions of concurrent price updates.
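A minimal sketch of such a bridge follows, under stated assumptions. FeedTask stands in for TaskState (its real layout is not given), simulate_feed stands in for the C++ library invoking its registered callback, and on_price_update, feed_waker, and run_bridge are hypothetical names. The vtable entries are written as plain unsafe fn items, which is the pointer type RawWakerVTable::new takes, and the C-ABI entry point is a separate trampoline.

```rust
use std::ffi::c_void;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{RawWaker, RawWakerVTable, Waker};

// Hypothetical stand-in for TaskState; the real layout is not given in the text.
struct FeedTask {
    ready: AtomicBool,
}

impl FeedTask {
    fn notify(&self) {
        // A real runtime would also put the task back on its run queue here.
        self.ready.store(true, Ordering::Release);
    }
}

// Vtable entries: plain `unsafe fn`, the pointer type RawWakerVTable::new takes.
unsafe fn vt_clone(p: *const ()) -> RawWaker {
    unsafe { Arc::increment_strong_count(p as *const FeedTask) };
    RawWaker::new(p, &FEED_VTABLE)
}
unsafe fn vt_wake(p: *const ()) {
    // Notifies, then releases the reference this waker owned.
    let task = unsafe { Arc::from_raw(p as *const FeedTask) };
    task.notify();
}
unsafe fn vt_wake_by_ref(p: *const ()) {
    let task = unsafe { &*(p as *const FeedTask) };
    task.notify();
}
unsafe fn vt_drop(p: *const ()) {
    drop(unsafe { Arc::from_raw(p as *const FeedTask) });
}

static FEED_VTABLE: RawWakerVTable =
    RawWakerVTable::new(vt_clone, vt_wake, vt_wake_by_ref, vt_drop);

fn feed_waker(task: Arc<FeedTask>) -> Waker {
    let ptr = Arc::into_raw(task) as *const ();
    unsafe { Waker::from_raw(RawWaker::new(ptr, &FEED_VTABLE)) }
}

// C-ABI trampoline for the library. `ctx` must come from Arc::into_raw and
// stay alive until the subscription is cancelled.
unsafe extern "C" fn on_price_update(ctx: *mut c_void) {
    let task = unsafe { &*(ctx as *const FeedTask) };
    task.notify();
}

// Simulates the C++ library invoking the registered callback.
fn simulate_feed(cb: unsafe extern "C" fn(*mut c_void), ctx: *mut c_void) {
    unsafe { cb(ctx) }
}

fn run_bridge() -> (bool, bool, usize) {
    let task = Arc::new(FeedTask { ready: AtomicBool::new(false) });
    let waker = feed_waker(Arc::clone(&task));
    // Registration hands one strong reference to the library as void*.
    let ctx = Arc::into_raw(Arc::clone(&task)) as *mut c_void;

    simulate_feed(on_price_update, ctx);
    let woke_via_callback = task.ready.swap(false, Ordering::AcqRel);

    waker.wake_by_ref();
    let woke_via_waker = task.ready.load(Ordering::Acquire);

    // Unregistration reclaims the library's reference; then drop the waker.
    drop(unsafe { Arc::from_raw(ctx as *const FeedTask) });
    drop(waker);
    (woke_via_callback, woke_via_waker, Arc::strong_count(&task))
}
```

Giving the library its own strong reference for the duration of the subscription is one way to address the dangling-pointer hazard described earlier: the state cannot be freed while the callback may still fire, provided the reference is reclaimed only after the library confirms unregistration.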
Why must the RawWakerVTable functions be thread-safe (Send + Sync) even if the executor is single-threaded?
Strictly speaking, function pointers are always Send and Sync; the requirement is on what the functions do with the data pointer. The Waker type implements Clone, Send, and Sync, allowing it to migrate across thread boundaries regardless of the executor's threading model. When a future holds a Waker and passes it to a spawn_blocking task or through a std::sync::mpsc channel, the Waker may be invoked from a different thread than the one that created it. If the vtable functions assume single-threaded access, for instance by reference counting with Rc or by mutating a static mut without synchronization, then concurrent clone, wake, or drop calls are data races and therefore undefined behavior. Furthermore, multi-threaded runtimes like Tokio or async-std may migrate tasks between worker threads for load balancing, meaning the Waker could be cloned and dropped on threads different from its creation site. The thread-safety requirement ensures that the notification mechanism remains valid regardless of how the Waker is shared throughout the program.
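The cross-thread case can be illustrated with a short sketch. It uses the safe std::task::Wake trait instead of a hand-written RawWaker, purely to keep the example small; Counter and wake_from_other_threads are hypothetical names. Waker::from only accepts an Arc<W> where W is Send + Sync, so the compiler enforces here what a RawWaker author has to guarantee by hand.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Wake, Waker};
use std::thread;

// Hypothetical wake target: counts how many times it has been woken.
struct Counter {
    wakes: AtomicUsize,
}

impl Wake for Counter {
    fn wake(self: Arc<Self>) {
        // Atomic, so concurrent wakes from several threads are well defined.
        self.wakes.fetch_add(1, Ordering::SeqCst);
    }
}

/// Clones one Waker into `n` threads, wakes from each, and returns the total.
fn wake_from_other_threads(n: usize) -> usize {
    let counter = Arc::new(Counter { wakes: AtomicUsize::new(0) });
    let waker = Waker::from(Arc::clone(&counter));

    let handles: Vec<_> = (0..n)
        .map(|_| {
            let w = waker.clone(); // clone on this thread
            thread::spawn(move || w.wake()) // wake and drop on another thread
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    counter.wakes.load(Ordering::SeqCst)
}
```

Nothing in this code involves an executor at all, which is the point: once a Waker exists it can be cloned, woken, and dropped on any thread, so state behind the data pointer that used Rc or a plain integer counter would be racing in exactly this situation.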
What catastrophic failure occurs if the clone function returns a RawWaker with a different vtable than the original?
The Waker contract requires that all clones of a Waker represent the same underlying task and behave identically when invoked. What matters is that the vtable's functions agree with the data pointer they are paired with. If clone returns a RawWaker whose vtable belongs to a different waker implementation, or pairs the data pointer with a vtable written for another type, whoever stored the clone (an executor queue, a timer, an I/O source) runs the wrong logic when an event occurs and it calls wake(). This results in either waking an unrelated task (logical corruption, usually visible as a future that never makes progress) or undefined behavior: the mismatched functions cast the data pointer to an incorrect type, access fields at wrong offsets, and apply reference counting to memory that is not the allocation they expect, which typically surfaces as a use-after-free, a double free, or a segmentation fault. Rust function pointers can never be null, so a null vtable entry can only come from code that is already unsound, such as a transmute.
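The "same data pointer, same vtable" rule for clone can be observed through Waker::will_wake, which the standard library documents as a best-effort check that two wakers would wake the same task. The sketch below makes a loud simplification: the data pointer carries a task id rather than a real allocation, so no function dereferences it and there is nothing to count or free. ID_VTABLE, waker_for_id, and the two helper functions are hypothetical names.

```rust
use std::task::{RawWaker, RawWakerVTable, Waker};

// Hypothetical: the data pointer carries a task id, not a real allocation,
// so clone and drop have nothing to count or free.
unsafe fn id_clone(data: *const ()) -> RawWaker {
    // Same data pointer, same vtable: the clone targets the same task.
    RawWaker::new(data, &ID_VTABLE)
}

unsafe fn id_noop(_data: *const ()) {}

static ID_VTABLE: RawWakerVTable =
    RawWakerVTable::new(id_clone, id_noop, id_noop, id_noop);

fn waker_for_id(id: usize) -> Waker {
    // Sound only because no vtable function dereferences the pointer.
    unsafe { Waker::from_raw(RawWaker::new(id as *const (), &ID_VTABLE)) }
}

/// True if a clone is recognised as waking the same task as the original.
fn clone_targets_same_task(id: usize) -> bool {
    let original = waker_for_id(id);
    original.clone().will_wake(&original)
}

/// True if wakers built for two ids are considered interchangeable.
fn different_tasks_match(a: usize, b: usize) -> bool {
    waker_for_id(a).will_wake(&waker_for_id(b))
}
```

Executors and futures sometimes use will_wake to skip re-registering a waker they already hold. A clone function that returned a different data pointer or vtable would defeat that check even in cases where it did not cause outright memory errors.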
Do the vtable functions have to use the extern "C" ABI rather than the default Rust ABI?
No. RawWakerVTable::new takes plain Rust-ABI function pointers: unsafe fn(*const ()) -> RawWaker for clone and unsafe fn(*const ()) for wake, wake_by_ref, and drop. An extern "C" fn is a different function pointer type, so passing one to RawWakerVTable::new is a compile-time type error, not a runtime hazard. The fact that the Rust ABI is unstable across compiler versions does not matter here, because the vtable functions are only ever called through Waker by Rust code that sees the same function pointer type; the vtable never has to be understood by C code. The extern "C" ABI becomes relevant only at a real language boundary. When a C or C++ library must trigger a wake, it is given a separate extern "C" trampoline plus an opaque context pointer, and the trampoline recovers the Rust state from that pointer and calls wake() or wake_by_ref() on the Rust side. Declaring that trampoline with the wrong ABI, or transmuting a function pointer to force it into the vtable, is what leads to stack corruption or argument misalignment when the function is eventually invoked.