Java Programming · Senior Java Developer

Under what specific atomicity guarantee does **String.hashCode()** safely elide **volatile** qualifiers on its internal hash cache despite concurrent population by multiple threads?


Answer to the question

History of the question

Prior to the JSR 133 specification (Java 5), the Java Memory Model lacked formal happens-before rules, making benign data races hazardous. String has always been a performance-critical immutable class, heavily utilized in HashMap operations. Early JDK versions introduced lazy hash caching to avoid recomputing the hash for large strings repeatedly. The decision to omit volatile on the hash field was a deliberate optimization predating modern concurrency primitives, relying on the idempotent nature of the computation and specific atomicity guarantees added to the JLS in Java 5.

The problem

When multiple threads invoke hashCode() concurrently on a newly created String, they may all observe the default value of 0 in the hash field. Without synchronization, this creates a data race where several threads could simultaneously compute the hash value and attempt to write it back. The challenge is ensuring that no thread ever observes a partially written (torn) hash value or an inconsistent state, while avoiding the prohibitive cost of memory barriers associated with volatile reads and writes on every hashCode() invocation.

The solution

The solution relies on two fundamental JMM properties. First, the Java Language Specification (§17.7) guarantees that writes to 32-bit primitive values (int) are atomic, preventing word tearing. Second, the String constructor establishes a happens-before relationship through its final value field, ensuring the backing array is fully visible to any thread receiving the reference. Since the hash computation is a pure function of this immutable, safely-published data, the race to populate the cache is benign. If a thread reads a stale 0, it simply recomputes the identical value; if it reads the cached value, it uses it. The atomic write ensures the value is either fully observed or not, never corrupted.

```java
public int hashCode() {
    int h = hash; // Non-volatile read: may see 0 or the cached value
    if (h == 0 && value.length > 0) {
        char val[] = value;
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h; // Atomic write: a 32-bit assignment is indivisible
    }
    return h;
}
```

Situation from life

We were designing a high-throughput ingestion service processing millions of CSV records per second. Each record generated multiple String keys for a ConcurrentHashMap cache. Profiling revealed that hashCode() calculations consumed 15% of CPU time due to large string keys.

Solution A: volatile hash field. We considered adding volatile to the hash field in a custom String wrapper. The pros included immediate visibility across all cores and strict sequential consistency. However, the cons were severe: JMH benchmarks showed roughly a 4x drop in throughput due to cache-coherency traffic and memory-fence costs on every map operation.
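A minimal sketch of what Solution A looked like (the VolatileHashKey wrapper name is hypothetical, not a real JDK class): the caching is unconditionally correct, but every hashCode() call performs a volatile read, and the first write generates the coherency traffic the benchmarks flagged.

```java
// Hypothetical Solution A wrapper: correct, but pays a volatile read per call.
final class VolatileHashKey {
    private final String value;
    private volatile int hash; // 0 means "not yet computed"

    VolatileHashKey(String value) {
        this.value = value;
    }

    @Override
    public int hashCode() {
        int h = hash;             // volatile read: acquire semantics on every call
        if (h == 0) {
            h = value.hashCode();
            hash = h;             // volatile write: release fence + coherency traffic
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof VolatileHashKey
                && value.equals(((VolatileHashKey) o).value);
    }
}
```

The correctness is easy to reason about, which is exactly why the measured cost was so disappointing: the fences are paid on every hot-path map lookup, not only during the first computation.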

Solution B: synchronized hashCode(). We tested synchronizing the computation. The pros were simplicity and absolute correctness. The cons were catastrophic contention: at 32 threads, latency spiked from 2 nanoseconds to 800 nanoseconds per operation as threads queued for the monitor.

Solution C: Benign race (current implementation). We retained the unsynchronized idempotent caching. The pros were zero synchronization overhead and perfect scalability with core count. The cons were theoretical: occasional redundant computation if threads raced during first access. We chose Solution C because the cost of recomputing a hash (cache miss) was negligible compared to the cost of cache coherency protocols (volatile) or contention (synchronized).
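Solution C transplants String's benign-race idiom into an application key class. A sketch, using a hypothetical CsvRecordKey (any immutable key with a final field works the same way):

```java
// Hypothetical Solution C key: String's benign-race idiom, no synchronization.
final class CsvRecordKey {
    private final String raw; // final field: safe publication of the key data
    private int hash;         // plain int: racy, but writes are atomic and idempotent

    CsvRecordKey(String raw) {
        this.raw = raw;
    }

    @Override
    public int hashCode() {
        int h = hash;           // plain read: may observe 0 or a cached value
        if (h == 0) {
            h = raw.hashCode(); // pure function of immutable, safely published state
            hash = h;           // atomic 32-bit write; racing threads write the same value
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CsvRecordKey && raw.equals(((CsvRecordKey) o).raw);
    }
}
```

Note the same caveat String accepts: a key whose hash is genuinely 0 is recomputed on every call, because 0 doubles as the "not yet computed" sentinel.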

Result: The system sustained 2.5 million operations per second per core without hashCode() appearing in the top 100 hot methods, validating that the benign data race was the correct architectural trade-off for this immutable data structure.

What candidates often miss

Why doesn't the lack of volatile violate the happens-before relationship between the thread that creates the String and the thread that computes its hash?

The happens-before relationship is actually established by the safe publication of the String object itself, not the hash field. When a String is constructed, its final value field guarantees that the backing array contents are visible to any thread receiving the reference. The hash field is merely a cache; observing its default value of 0 is a valid program state that simply triggers computation. The JMM ensures the immutable value array is consistent, and since the hash is derived purely from this visible data, the computation yields the same result regardless of which thread performs it.

Could this same optimization be applied to a 64-bit long hash value without using volatile?

No. The JLS (§17.7) guarantees atomic reads and writes for references and for all primitives up to 32 bits, but explicitly permits non-atomic treatment of long and double unless the field is volatile or access is synchronized; on 32-bit JVMs, a 64-bit write may be performed as two separate 32-bit writes. A thread could theoretically observe the high 32 bits of one computed hash combined with the low 32 bits of another, resulting in a completely incorrect, non-zero hash value that would corrupt HashMap bucket placement. Therefore, caching 64-bit hashes requires volatile or AtomicLong.
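A sketch of the safe 64-bit variant (Long64Key and its seed constant are hypothetical illustrations, not JDK code): marking the cache field volatile restores atomicity for long, so a reader can never observe two mismatched halves.

```java
// Hypothetical 64-bit hash cache: volatile is what makes the long write atomic.
final class Long64Key {
    private final String value;
    private volatile long hash64; // without volatile, §17.7 permits word tearing

    Long64Key(String value) {
        this.value = value;
    }

    long longHashCode() {
        long h = hash64;                // volatile read: never observes a torn value
        if (h == 0L) {
            h = 1125899906842597L;      // arbitrary prime seed (illustrative choice)
            for (int i = 0; i < value.length(); i++) {
                h = 31 * h + value.charAt(i);
            }
            hash64 = h;                 // volatile write: both 32-bit halves publish atomically
        }
        return h;
    }
}
```

An AtomicLong would work equally well here; plain volatile is the lighter choice when no compound read-modify-write is needed.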

How does this differ from the broken "Double-Checked Locking" idiom for singleton initialization?

The critical distinction lies in safe publication and idempotency. In broken Double-Checked Locking, the issue is observing a non-null reference to an object whose constructor has not completed (reordering of reference assignment vs. constructor execution). In String.hashCode(), the String object itself is already safely published and fully constructed; the hash field is merely a lazily initialized cache of pure data. Seeing 0 (uninitialized) is not a partial construction but a valid initial state. Furthermore, the operation is idempotent—multiple threads writing the same computed value produces the same result as one thread, whereas DCL requires exactly one instance creation.
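For contrast, a sketch of the corrected DCL idiom (ExpensiveSingleton is a hypothetical example class): unlike the hash cache, the singleton's state cannot be recomputed by the reading thread, so the reference field must be volatile to forbid publishing it before the constructor's writes are visible.

```java
// Hypothetical DCL singleton: volatile on the reference is what makes it safe.
final class ExpensiveSingleton {
    // volatile prevents the reference from becoming visible before the
    // constructor's writes do (the reordering that breaks naive DCL)
    private static volatile ExpensiveSingleton instance;

    private final int state;

    private ExpensiveSingleton() {
        state = 42; // must be visible before any thread sees the reference
    }

    static ExpensiveSingleton getInstance() {
        ExpensiveSingleton local = instance;      // first check: no lock
        if (local == null) {
            synchronized (ExpensiveSingleton.class) {
                local = instance;                 // second check: under the monitor
                if (local == null) {
                    instance = local = new ExpensiveSingleton();
                }
            }
        }
        return local;
    }

    int state() {
        return state;
    }
}
```

String.hashCode() needs none of this machinery precisely because a "stale" read is harmless there: the reader can always rederive the value itself.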