Go Programming: Senior Go Developer

How does Go's runtime reclaim excess goroutine stack memory? What utilization threshold triggers shrinking, and where does the released memory ultimately go?


Answer to the question

History of the question

Before Go 1.3, the runtime employed segmented stacks that split into linked chunks at function call boundaries. This design caused severe "hot split" performance cliffs when the stack boundary was crossed frequently during tight loops. Go 1.3 replaced this with contiguous stacks that are copied to larger, single contiguous regions during growth. However, early implementations of contiguous stacks never released memory back to the heap, causing permanent RSS growth for goroutines that transiently required deep call stacks during initialization or batch processing. Go 1.5 introduced automatic stack shrinking to reclaim unused stack memory during garbage collection cycles, completing the memory management lifecycle for goroutine stacks.

The problem

Without a shrinking mechanism, a goroutine that temporarily enters deep recursion (e.g., processing a deeply nested JSON document or walking a complex dependency tree) would retain its peak stack allocation indefinitely even after returning to an idle event loop. This leads to memory bloat in long-running applications, particularly those using worker pools where goroutines alternate between high-stack tasks and idle states. The challenge lies in safely identifying when a stack is truly underutilized and relocating active frames to a smaller memory region without corrupting in-progress computations, stack-allocated pointers, or violating the ABI requirements for calling conventions.

The solution

The Go runtime shrinks stacks during the GC mark phase, when goroutine stacks are scanned as part of the root set. For each goroutine it compares the portion of the stack actually in use against the current allocation; if usage falls below one quarter (25%) of the allocated stack size, the runtime allocates a new stack half the size of the current one (but never smaller than the 2KB minimum). It then suspends the target goroutine at a safe point, copies the live stack frames into the smaller region, uses compiler-generated pointer maps to adjust every pointer that references a stack address, and returns the old stack memory to the runtime's mheap allocator.
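A small experiment can make the shrink visible through runtime.MemStats.StackInuse. This is an illustrative sketch, not runtime documentation: the worker count, recursion depth, and number of GC cycles are arbitrary, and exact numbers vary by Go version and platform.

```go
package main

import (
	"fmt"
	"runtime"
)

// grow recurses to inflate the goroutine's stack; each frame pins
// roughly 1KB via a local buffer kept alive by the pointer passed down.
//
//go:noinline
func grow(n int, prev *[1024]byte) byte {
	var buf [1024]byte
	buf[0] = byte(n)
	if n == 0 {
		return buf[0] + prev[0]
	}
	return grow(n-1, &buf) + prev[0]
}

func stackInuse() uint64 {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	return ms.StackInuse
}

// measureShrink grows many goroutine stacks, parks the goroutines at a
// shallow frame, and samples stack memory before and after a few GC
// cycles, during which the shrink check can halve each idle stack.
func measureShrink() (before, after uint64) {
	const workers = 50
	park := make(chan struct{})
	deep := make(chan struct{})
	for i := 0; i < workers; i++ {
		go func() {
			var seed [1024]byte
			grow(1000, &seed) // ~1MB peak stack usage
			deep <- struct{}{}
			<-park // idle: usage is now far below 25% of the allocation
		}()
	}
	for i := 0; i < workers; i++ {
		<-deep
	}
	before = stackInuse()
	for i := 0; i < 5; i++ { // each cycle can halve a qualifying stack
		runtime.GC()
	}
	after = stackInuse()
	close(park)
	return before, after
}

func main() {
	before, after := measureShrink()
	fmt.Printf("StackInuse before=%dKB after=%dKB\n", before/1024, after/1024)
}
```

On a typical run, StackInuse drops sharply after the GC cycles even though no goroutine has exited, which is the shrinking mechanism at work.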

Situation from life

We operated a high-throughput log processing service where each goroutine handled parsing of potentially deeply nested JSON payloads (up to 10,000 levels deep during malformed input attacks). After processing, these goroutines returned to a sync.Pool to await new connections. We observed that the service's RSS memory grew linearly with the number of pooled goroutines, never releasing memory even during idle periods, eventually triggering OOM kills on containers with 4GB limits despite the actual working set being only 200MB.

We considered forcibly killing pooled goroutines after a set number of processed requests and spawning fresh replacements. This would guarantee stack memory release, since new goroutines start with minimal 2KB stacks. However, this approach introduced significant CPU overhead from constant goroutine creation and destruction, disrupted TCP connection pooling optimizations, and raised tail latencies due to cold caches.

Implementing a hard limit on stack growth via debug.SetMaxStack would prevent excessive allocation during the deep recursion events. While this protected against OOM, exceeding the cap is not a recoverable panic: legitimate but deep parsing tasks killed the entire process with the fatal error runtime: goroutine stack exceeds 1000000000-byte limit. This resulted in dropped customer data and service errors that violated our reliability SLAs, making it unacceptable for production.

We evaluated periodically calling runtime.GC() followed by debug.FreeOSMemory() every 30 seconds to force stack scanning and shrinkage. This successfully reduced RSS but introduced stop-the-world pauses of 5-10ms every invocation, which violated our p99 latency requirements of <2ms for the API tier and increased CPU utilization by 15% due to forced full collections.

We ultimately relied on Go's native stack shrinking mechanism by ensuring we ran Go 1.20+ and tuned GOGC to trigger more frequent garbage collections (setting it to 50 instead of 100). This increased the frequency of stack shrink opportunities without manual intervention. We also restructured the parser to use an iterative approach with an explicit heap-allocated stack for path tracking, reducing maximum recursion depth from 10,000 to 100. The combination allowed natural shrinking to occur frequently enough to keep memory bounded.

The service RSS stabilized at approximately 800MB under load, down from the previous 3.8GB ceiling. Goroutine stack profiles showed 95% of pooled workers maintained the minimum 2KB stack size between requests, with spikes only occurring during active parsing. The OOM kills ceased entirely, and p99 latency remained under 1.5ms since we avoided manual GC pauses and goroutine churn.

What candidates often miss

Does stack shrinking occur immediately when a function returns and the stack pointer decreases?

No, the runtime does not monitor stack pointer decrements in real time to trigger immediate deallocation. Shrinking is performed only during garbage collection, when the GC scans each goroutine's stack. If the portion of the stack in use at that point is below 25% of the current physical allocation, only then does the shrinking logic execute. This lazy evaluation amortizes the cost of copying stacks across GC cycles, which the runtime is performing anyway; the copy itself requires stopping only the individual goroutine, not the whole world.

What is the exact shrinking ratio and minimum size, and does the runtime ever release memory back to the OS?

When a stack qualifies for shrinking, the runtime allocates a new stack half the size of the current one. This geometric reduction prevents thrashing, where a goroutine oscillating slightly above and below a threshold would constantly grow and shrink. The new size is bounded below by the platform's minimum stack size, typically 2KB on 64-bit systems. The memory from the old stack is returned to the runtime's mheap, not directly to the operating system. Physical memory goes back to the OS only when the background scavenger decides the runtime is retaining more memory than its target, or when debug.FreeOSMemory() is invoked.
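The mheap-versus-OS distinction can be observed through runtime.MemStats.HeapReleased, which counts bytes of physical memory handed back to the OS. This sketch is illustrative; the 64MB allocation is arbitrary and exact deltas depend on Go version and platform.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// releasedDelta allocates and drops ~64MB, then forces an eager
// scavenge, sampling HeapReleased before and after to show memory
// actually leaving the process rather than just returning to mheap.
func releasedDelta() (before, after uint64) {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	before = ms.HeapReleased

	b := make([]byte, 64<<20)
	for i := range b { // touch every page so it is physically backed
		b[i] = 1
	}
	b = nil
	_ = b

	// FreeOSMemory forces a GC and then scavenges free pages eagerly,
	// instead of waiting for the background scavenger's pacing.
	debug.FreeOSMemory()

	runtime.ReadMemStats(&ms)
	after = ms.HeapReleased
	return before, after
}

func main() {
	before, after := releasedDelta()
	fmt.Printf("HeapReleased grew by %dKB\n", (after-before)/1024)
}
```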

Is the goroutine stopped during stack shrinking, and how are pointers updated?

Yes, shrinking requires stopping the target goroutine at a safe point, similar to stack growth. The runtime must copy live frames to a new memory location and update all pointers that reference stack-allocated variables. The compiler generates pointer maps that identify which words in each frame are pointers. During the shrink, the runtime uses these maps to find and adjust interior pointers to point to the new stack addresses. This operation is not concurrent; the goroutine cannot execute during the copy, but other goroutines continue running.