Answer to the question

Prior to C++20, the Empty Base Optimization (EBO) allowed an empty base class subobject to share its address with a data member of the derived class, effectively consuming zero storage. Data members, however, were strictly required to have unique addresses and non-zero sizes, so implementers of containers such as std::map had to either accept larger objects for stateless allocators or rely on fragile private inheritance. The [[no_unique_address]] attribute explicitly permits a non-static data member of empty class type to occupy zero bytes, which allows composition instead of inheritance for allocator storage while maintaining optimal memory density in STL containers.

History of the question

The C++98 allocator model predominantly used stateless allocator classes, and EBO via inheritance was the standard technique for avoiding storage overhead in standard containers. As C++11 introduced scoped allocators and allocator propagation traits, stateful allocators became common, and inheriting from an allocator that might be stateful added complexity and risked layout inefficiencies when switching between variants. C++20 standardized the [[no_unique_address]] attribute to provide first-class language support for zero-overhead composition, in line with the zero-overhead principle and without the fragile inheritance hierarchies that complicated class interfaces.

The problem

The C++ object model mandates that complete objects and member subobjects that are not potentially overlapping have non-zero size, and that two objects of the same type with overlapping lifetimes have distinct addresses. Before C++20 this prevented any data member from sharing storage with another member of the same class, even if its type was empty. In node-based structures of the kind used by std::list or std::map, an allocator stored as a plain member of each node adds at least one byte (rounded up to alignment) even when it is stateless, which significantly increases memory consumption across millions of small nodes. Traditional workarounds used private inheritance, which complicated class hierarchies and made it hard to swap in stateful allocators without redesigning the template machinery.
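
The cost described above can be observed directly with sizeof. The following is a minimal sketch; the names Empty, WithMember, WithBase, and print_sizes are illustrative and do not come from any library, and the exact numbers depend on the ABI.

```cpp
#include <iostream>

// Hypothetical stateless type standing in for an empty allocator.
struct Empty {};

// Plain composition: the empty member needs its own byte, and the
// alignment of the pointer then pads that byte out further.
struct WithMember {
    Empty e;
    void* p;
};

// Inheritance: the empty base may share its address with p (EBO).
struct WithBase : Empty {
    void* p;
};

// A complete object always occupies at least one byte.
static_assert(sizeof(Empty) >= 1);
// The plain member occupies bytes that are disjoint from p.
static_assert(sizeof(WithMember) > sizeof(void*));

// Prints the three sizes so the difference can be inspected on a given target.
void print_sizes() {
    std::cout << "Empty:      " << sizeof(Empty) << '\n'
              << "WithMember: " << sizeof(WithMember) << '\n'
              << "WithBase:   " << sizeof(WithBase) << '\n';
}
```

Calling print_sizes() from main shows the gap between the member-based and the base-based layout on the platform at hand.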

The solution

The [[no_unique_address]] attribute tells the compiler that a non-static data member need not have a unique address, making it a potentially-overlapping subobject. If the member's type is an empty class, the member may be placed at the same address as another subobject of a different type; no further property such as trivial copyability is required. This lets container implementers declare allocators as direct members with zero storage cost for stateless types, with the compiler adjusting padding and layout automatically. The attribute is a permission rather than a guarantee, so the actual layout is decided by the implementation's ABI. It leaves strict aliasing rules and object lifetime semantics unchanged and relaxes only the address uniqueness constraint for the annotated member.

#include <cstddef>   // std::size_t
#include <cstdint>
#include <iostream>
#include <memory>

// Stateless allocator example
template <typename T>
struct EmptyAllocator {
    using value_type = T;
    EmptyAllocator() = default;
    template <typename U> EmptyAllocator(const EmptyAllocator<U>&) {}
    T* allocate(std::size_t n) { return std::allocator<T>().allocate(n); }
    void deallocate(T* p, std::size_t n) { std::allocator<T>().deallocate(p, n); }
    
    // Empty type
    bool operator==(const EmptyAllocator&) const = default;
};

// Node with [[no_unique_address]]
template <typename T, typename Alloc = EmptyAllocator<T>>
struct NodeOptimized {
    [[no_unique_address]] Alloc allocator; // Zero bytes if Alloc is empty
    T value;
    NodeOptimized* next;
    
    explicit NodeOptimized(const T& val) : value(val), next(nullptr) {}
};

// Node without optimization (for comparison)
template <typename T, typename Alloc = EmptyAllocator<T>>
struct NodeNaive {
    Alloc allocator; // Always 1+ bytes
    T value;
    NodeNaive* next;
    
    explicit NodeNaive(const T& val) : value(val), next(nullptr) {}
};

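int main() {
    // std::int64_t keeps the payload pointer-aligned on typical 64-bit targets,
    // so the naive node has no padding hole that could absorb the allocator byte.
    std::cout << "Optimized node size: " << sizeof(NodeOptimized<std::int64_t>) << " bytes\n";
    std::cout << "Naive node size: " << sizeof(NodeNaive<std::int64_t>) << " bytes\n";
    // On a typical 64-bit GCC or Clang build: 16 bytes (8 + 8) for Optimized
    // versus 24 bytes (1 byte padded to 8, then 8 + 8) for Naive.
    // With T = int both nodes are usually 16 bytes, because the allocator byte
    // fits into padding that precedes the pointer anyway.
    // MSVC ignores [[no_unique_address]] and expects [[msvc::no_unique_address]].
    return 0;
}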

Situation from life

In a low-latency trading infrastructure project, the team needed to implement a custom intrusive red-black tree for order matching, where each node represented a limit order. The system required pluggable memory strategies: a stack allocator for pooled fixed-size chunks during market hours, and std::allocator for back-testing scenarios.

The initial implementation used private inheritance from the allocator to leverage Empty Base Optimization, assuming the standard allocator would cost zero bytes.

// Initial approach: inheritance-based EBO
enum class Color { Red, Black };

template <typename T, typename Alloc>
class OrderNode : private Alloc { // Awkward: Alloc is a base
    T data;
    OrderNode* left;
    OrderNode* right;
    Color color;
public:
    // Problem: node members hide same-named members of Alloc (e.g. 'left', 'color')
    // Problem: does not compile if Alloc is declared final
    // Problem: a stateful Alloc gains nothing from being a base
};

This approach proved brittle. When the risk management team demanded a stateful auditing allocator that tracked memory usage counters, the allocator was moved from a base class into a member variable so that one node layout could serve both cases. For the stateless production allocator this caused an immediate 8-byte inflation per node due to alignment, increasing the total memory footprint by 40% and degrading cache performance.

Alternative Solution A: Type-erased storage with std::variant.

The team considered storing either a pointer to the allocator (for stateful) or nothing (for stateless) using std::variant or manual type erasure.

Pros: Unified interface for stateful and stateless allocators without template explosion.

Cons: Indirection overhead for stateful allocators, and the variant itself required at least one byte (plus alignment) for discriminator storage, failing to solve the zero-overhead requirement for the critical path where stateless allocators were predominant.
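
The discriminator cost can be checked with a small sketch. AuditingAlloc, AllocHandle, and stateful_or_null are hypothetical names introduced here for illustration; the size relation in the static_assert is what mainstream standard library implementations produce, not something the standard guarantees.

```cpp
#include <variant>

// Hypothetical stateful allocator, only ever used through a pointer here.
struct AuditingAlloc;

// Either "no allocator state" (std::monostate) or a pointer to a stateful one.
using AllocHandle = std::variant<std::monostate, AuditingAlloc*>;

// The variant stores the pointer plus an index that records the active
// alternative, so it is larger than the bare pointer.
static_assert(sizeof(AllocHandle) > sizeof(AuditingAlloc*),
              "expected on mainstream standard library implementations");

// Returns the stored allocator pointer, or nullptr in the stateless case.
constexpr AuditingAlloc* stateful_or_null(const AllocHandle& handle) {
    if (auto p = std::get_if<AuditingAlloc*>(&handle)) {
        return *p;
    }
    return nullptr;
}
```

Even the stateless case therefore pays for the handle in every node, which is the overhead the team wanted to avoid.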

Alternative Solution B: Template specialization with distinct classes.

They evaluated specializing the entire OrderNode class based on std::is_empty_v<Alloc>, inheriting when empty and composing when stateful.

Pros: Guaranteed zero overhead for the empty case.

Cons: Code duplication between the two specializations, doubled compilation times, and maintenance nightmares when adding new node fields, as changes had to be mirrored in both template branches.
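
A minimal sketch of this alternative makes the duplication visible. OrderNodeSpec, StatelessAlloc, and CountingAlloc are hypothetical names, and the UseEbo switch is an assumption about how such a specialization might be selected rather than the team's actual code.

```cpp
#include <cstddef>
#include <type_traits>

enum class Color { Red, Black };

// Primary template: the allocator may carry state, so it is stored as a member.
template <typename T, typename Alloc,
          bool UseEbo = std::is_empty_v<Alloc> && !std::is_final_v<Alloc>>
struct OrderNodeSpec {
    Alloc alloc;
    T data{};
    OrderNodeSpec* left = nullptr;
    OrderNodeSpec* right = nullptr;
    Color color = Color::Red;
    Alloc& get_allocator() { return alloc; }
};

// Partial specialization: an empty allocator becomes a private base (EBO).
// Every field and accessor has to be repeated here by hand.
template <typename T, typename Alloc>
struct OrderNodeSpec<T, Alloc, true> : private Alloc {
    T data{};
    OrderNodeSpec* left = nullptr;
    OrderNodeSpec* right = nullptr;
    Color color = Color::Red;
    Alloc& get_allocator() { return *this; }
};

// Hypothetical allocators used only to exercise both branches.
struct StatelessAlloc {};
struct CountingAlloc { std::size_t* bytes_in_use = nullptr; };
```

Adding a field such as a parent pointer would require touching both definitions, which is the maintenance burden listed under Cons.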

Chosen Solution and Result:

The team migrated to C++20 and applied [[no_unique_address]] to the allocator member.

template <typename T, typename Alloc>
struct OrderNode {
    [[no_unique_address]] Alloc alloc; // Zero cost if empty
    T data;
    OrderNode* left;
    OrderNode* right;
    // ... rest of implementation
};

This design eliminated the need for inheritance while maintaining zero bytes of overhead for the production stack allocator. When the auditing allocator (stateful) was substituted, the member automatically expanded to accommodate its counters without code changes. Benchmarks showed a 15% reduction in cache misses compared to the inheritance-based version due to better compiler optimizations on the flatter class hierarchy, and the codebase became significantly more maintainable.
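
The substitution described above can be sketched as follows. PoolAlloc, AuditAlloc, OrderNodeSketch, and the ORDER_NO_UNIQUE_ADDRESS macro are hypothetical names introduced for illustration. The macro reflects the fact that MSVC ignores the standard attribute and provides [[msvc::no_unique_address]] instead; the size relation holds on mainstream ABIs but is not guaranteed by the standard.

```cpp
#include <cstddef>

// MSVC keeps its ABI stable by ignoring the standard attribute and
// offering a vendor-specific spelling with the same intent.
#if defined(_MSC_VER)
#define ORDER_NO_UNIQUE_ADDRESS [[msvc::no_unique_address]]
#else
#define ORDER_NO_UNIQUE_ADDRESS [[no_unique_address]]
#endif

// Hypothetical stateless allocator (no data members).
struct PoolAlloc {};

// Hypothetical stateful allocator that points at a usage counter.
struct AuditAlloc {
    std::size_t* bytes_in_use = nullptr;
};

// One node definition serves both allocator kinds.
template <typename T, typename Alloc>
struct OrderNodeSketch {
    ORDER_NO_UNIQUE_ADDRESS Alloc alloc;  // may occupy no storage if Alloc is empty
    T data{};
    OrderNodeSketch* left = nullptr;
    OrderNodeSketch* right = nullptr;
};

// The stateless node is expected to be smaller than the stateful one.
static_assert(sizeof(OrderNodeSketch<void*, PoolAlloc>) <
              sizeof(OrderNodeSketch<void*, AuditAlloc>),
              "expected on mainstream ABIs");
```

Switching between the two allocators is then a change of template argument only; the node definition itself stays untouched.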

What candidates often miss

Can two [[no_unique_address]] data members of the same empty type occupy the same memory address?

No, they cannot. While [[no_unique_address]] lets an empty member share an address with subobjects of other types, C++ still requires that two distinct objects of the same type with overlapping lifetimes have distinct addresses. If two members m1 and m2 of the same empty class type are both annotated, the compiler must place them at different offsets so that &node.m1 != &node.m2. Each of them may still overlap members or base class subobjects of a different type, so the class does not necessarily grow by one byte per empty member, but a class that contains nothing else must be at least two bytes in size.
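
The rule can be checked at run time with a short sketch. Tag, TwoTags, and tags_have_distinct_addresses are hypothetical names used only for this illustration.

```cpp
#include <cassert>

// Hypothetical empty type.
struct Tag {};

struct TwoTags {
    [[no_unique_address]] Tag first;
    [[no_unique_address]] Tag second;  // same type as first, so it cannot share first's address
    int payload = 0;                   // different type, so either tag may overlap it
};

// Returns true when the two same-type empty members live at different addresses.
bool tags_have_distinct_addresses() {
    TwoTags t;
    const void* a = &t.first;
    const void* b = &t.second;
    return a != b;
}

// Convenience wrapper that aborts if the rule were ever violated.
void check_two_tags() {
    assert(tags_have_distinct_addresses());
}
```

How the implementation achieves the distinct addresses (for instance by placing the tags over different bytes of payload) is an ABI decision and is deliberately not asserted here.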

How does [[no_unique_address]] interact with offsetof and standard-layout types?

The interaction is subtle and potentially dangerous. A class that contains [[no_unique_address]] members can still be standard-layout, and offsetof remains usable on it, but the offset reported for an empty member is chosen by the implementation's ABI and may coincide with the offset of another member. The standard's guarantee that later members are allocated at higher addresses applies only to members of non-zero size, so legacy code that assumes every member occupies distinct bytes in declaration order can break. Developers should avoid pointer arithmetic based on offsetof for [[no_unique_address]] members and instead take the member's address directly, for example with std::addressof.
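
A small sketch shows how two members can report the same offset. Policy, Record, and offsets_coincide are hypothetical names; whether the offsets actually coincide depends on the ABI, and the behavior described in the comments is what GCC and Clang produce on the Itanium C++ ABI.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical empty policy type.
struct Policy {};

struct Record {
    [[no_unique_address]] Policy policy;
    int id;
};

// The attribute does not stop the class from being standard-layout,
// so offsetof is well-defined for both members.
static_assert(std::is_standard_layout_v<Record>);

// True when the empty member and the int report the same offset.
// On the Itanium C++ ABI both are placed at offset 0.
constexpr bool offsets_coincide() {
    return offsetof(Record, policy) == offsetof(Record, id);
}
```

Code that walks members by offset would therefore visit the same byte twice on such an ABI, which is why taking the member's address directly is the safer habit.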

Why is [[no_unique_address]] unnecessary for base classes, and what risks does it avoid compared to inheritance?

Base classes qualify for Empty Base Optimization without any attribute: an empty base subobject is permitted to share the address of the first non-static data member of the derived class, provided that member is not of the same type as the base. [[no_unique_address]] exists specifically to grant this capability to data members, enabling composition. Using data members avoids the name hiding and multiple inheritance ambiguity pitfalls of private inheritance. For example, if a container inherits from an allocator that defines a nested pointer typedef and the container also defines its own pointer type, the container's declaration hides the allocator's inside the class, and code that meant the allocator's type silently gets the container's unless it qualifies the name; with two empty bases that both declare pointer, the unqualified name is ambiguous instead. Inheritance also fails outright when the allocator type is declared final. Data members with [[no_unique_address]] eliminate this scope pollution while preserving layout efficiency.
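
The contrast between the two techniques can be sketched as follows. TypedAlloc, ListViaBase, ListViaMember, SealedAlloc, and SealedHolder are hypothetical names introduced for illustration; the final case is an additional limitation of inheritance that the member-based form does not share.

```cpp
#include <type_traits>

// Hypothetical empty allocator that exposes a nested pointer type.
struct TypedAlloc {
    using pointer = char*;
};

// Inheritance: the container's own 'pointer' hides TypedAlloc::pointer.
struct ListViaBase : private TypedAlloc {
    using pointer = int*;
    void* head = nullptr;
};

// Composition: the allocator's names stay in the allocator's scope.
struct ListViaMember {
    using pointer = int*;
    [[no_unique_address]] TypedAlloc alloc;
    void* head = nullptr;
};

static_assert(std::is_same_v<ListViaBase::pointer, int*>);
static_assert(std::is_same_v<decltype(ListViaMember::alloc)::pointer, char*>);

// A final class cannot be used as a base at all:
//   struct Broken : private SealedAlloc {};  // ill-formed
// but it works as an annotated member.
struct SealedAlloc final {};

struct SealedHolder {
    [[no_unique_address]] SealedAlloc alloc;
    void* head = nullptr;
};
```

On GCC and Clang both list variants have the same size, so the member-based form gives up nothing in layout while keeping the two scopes separate.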

Under what object model constraint does the C++20 attribute [[no_unique_address]] bypass the traditional prohibition against zero-sized data members, thereby optimizing stateless allocator storage in node-based containers?