
What features of string handling in Go should be known to avoid unexpected errors while programming?


Answer.

Background: In Go, string is a fundamental type used constantly for data exchange, logging, parsing, and so on. A Go string is an immutable sequence of bytes. By convention it holds UTF-8 encoded text, although it can contain arbitrary bytes.

Problem: Many developers confuse strings with byte slices ([]byte) and make mistakes when modifying strings, when slicing by byte index instead of by rune, and when handling multibyte characters (e.g., Cyrillic, emoji).

Solution:

Strings are immutable: you cannot change their elements directly, so an assignment such as s[0] = 'x' does not compile. String literals are UTF-8 encoded, so one character (rune) can occupy more than one byte. A []byte is mutable, which makes in-place edits cheap, but you have to handle the encoding yourself. Converting between string and []byte copies the data in general, although the compiler can elide the copy in some cases.

Example code:

    s := "привет"
    fmt.Println(len(s))         // 12 bytes (Cyrillic: 2 bytes each)
    fmt.Println(len([]rune(s))) // 6 runes, one per letter
    // The six bytes below are the UTF-8 encoding of two Chinese characters.
    fmt.Println(string([]byte{228, 189, 160, 229, 165, 189})) // 你好

Key features:

  • Strings are immutable, you cannot change them by index.
  • Strings are encoded in UTF-8, not always 1 character = 1 byte.
  • Converting between string <-> []byte creates a copy.
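The copy semantics in the last point can be illustrated with a minimal sketch. The helper name upperFirstASCII is hypothetical and assumes the first byte is an ASCII letter; it is only meant to show that editing the []byte copy leaves the original string untouched.

```go
package main

import "fmt"

// upperFirstASCII is a hypothetical helper for illustration only.
// It returns a new string whose first byte is upper-cased when that
// byte is an ASCII lowercase letter; otherwise s is returned unchanged.
func upperFirstASCII(s string) string {
	if s == "" {
		return s
	}
	b := []byte(s) // the conversion copies the bytes of s
	if b[0] >= 'a' && b[0] <= 'z' {
		b[0] -= 'a' - 'A' // mutate the copy, not the string
	}
	return string(b) // converting back produces a new string
}

func main() {
	s := "hello"
	t := upperFirstASCII(s)
	fmt.Println(s, t) // hello Hello
}
```

The original s still prints as "hello" because the modification happened on the byte slice, not on the string's own data.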

Tricky Questions.

1. Can you change an individual character of a string through an index (e.g., s[1] = 'a')?

Answer: No. Strings are immutable, and the compiler will raise an error. You need to create a new slice []rune or []byte, modify it, then convert it back to a string.
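A minimal sketch of the approach described in the answer, assuming a hypothetical helper replaceRuneAt that works with character (rune) positions rather than byte offsets:

```go
package main

import "fmt"

// replaceRuneAt is a hypothetical helper: it returns a copy of s with the
// rune at character position i replaced by r. The boolean result reports
// whether i was within range.
func replaceRuneAt(s string, i int, r rune) (string, bool) {
	runes := []rune(s) // decode the UTF-8 bytes into a slice of code points
	if i < 0 || i >= len(runes) {
		return s, false
	}
	runes[i] = r
	return string(runes), true // re-encode as a new UTF-8 string
}

func main() {
	out, ok := replaceRuneAt("привет", 1, 'Р')
	fmt.Println(out, ok) // пРивет true
}
```

Going through []rune keeps multibyte characters intact; doing the same replacement through []byte would only be safe for single-byte (ASCII) text.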

2. Why does len(str) not always match the number of characters in the string?

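Answer: len(str) returns the number of bytes, not the number of runes (characters). For strings containing Cyrillic or emoji, it can be noticeably larger than the number of visible characters. To count characters, convert to []rune (or call utf8.RuneCountInString, which avoids the allocation):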

    s := "world 😀"
    fmt.Println(len(s))         // 10 bytes: 6 single-byte characters + 4 bytes for the emoji
    fmt.Println(len([]rune(s))) // 7 runes
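The difference between byte offsets and character positions can also be seen when iterating. The sketch below uses utf8.RuneCountInString from the standard library and a for-range loop, which steps through a string rune by rune; the helper name runeOffsets is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeOffsets is a hypothetical helper that returns the byte offset
// at which each rune of s begins.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s { // range over a string yields the start index of each rune
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	s := "да 😀"
	fmt.Println(len(s))                    // 9 (2 + 2 + 1 + 4 bytes)
	fmt.Println(utf8.RuneCountInString(s)) // 4
	fmt.Println(runeOffsets(s))            // [0 2 4 5]
}
```

Note that the offsets are not consecutive: indexing s[1] here would land in the middle of the first letter.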

3. Is a string passed by reference or by value to a function?

Answer: By value, but the value is a small header that holds a pointer to the bytes and a length. After the call, both variables "point" to the same text; the underlying bytes are not copied. An actual memory copy generally occurs only when converting to []byte or from []byte to string.
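A small sketch of the observable consequence: reassigning the parameter inside a function changes only the callee's copy of the header. The function name rename is hypothetical.

```go
package main

import "fmt"

// rename is a hypothetical function that reassigns its parameter.
// Only the local copy of the string value changes; the caller's
// variable keeps referring to the original text.
func rename(s string) string {
	s = "changed"
	return s
}

func main() {
	orig := "original"
	got := rename(orig)
	fmt.Println(orig, got) // original changed
}
```

Because strings are immutable, sharing the underlying bytes between caller and callee is safe, and no defensive copy is needed.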

Common Mistakes and Anti-Patterns

  • Attempting to modify a string directly via indexing.
  • Using len to measure string length in characters, which gives wrong results for non-ASCII text.
  • Using []byte for serialization but forgetting about encoding.
  • Forgetting that string and []byte are different objects in memory.

Real-Life Example

Negative Case

A developer has a string with Russian characters and slices it by byte index (for example, the first 3 bytes), expecting whole letters. Since each Cyrillic letter takes 2 bytes, the result is one letter plus half of the next one: a "broken" character.

Pros: Quick to write and short. Cons: Incorrect handling of Unicode data, "broken" strings, and possible failures when such strings are parsed or validated elsewhere.
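The negative case can be reproduced with a short sketch. The helpers firstBytes and isValidPrefix are hypothetical; utf8.ValidString from the standard library is used to detect the damaged result.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstBytes is a hypothetical, deliberately naive helper: it returns
// the first n bytes of s without regard to rune boundaries.
func firstBytes(s string, n int) string {
	if n < 0 {
		n = 0
	}
	if n > len(s) {
		n = len(s)
	}
	return s[:n]
}

// isValidPrefix reports whether the first n bytes of s form valid UTF-8.
func isValidPrefix(s string, n int) bool {
	return utf8.ValidString(firstBytes(s, n))
}

func main() {
	s := "привет"
	fmt.Println(isValidPrefix(s, 3)) // false: the cut falls inside the second letter
	fmt.Println(isValidPrefix(s, 4)) // true: the cut happens to land on a rune boundary
}
```

A byte-based cut works only when it happens to land on a rune boundary, which is why this kind of bug tends to pass tests written with ASCII data.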

Positive Case

Strings are converted to []rune for character manipulation, and after the necessary changes the result is reassembled with string(). []byte is used only for low-level serialization, with the encoding taken into account.

Pros: Correct Unicode handling, reliability of functions. Cons: Slightly slower, requires more memory, but safe for any strings.
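One way to get the safety of the positive case while limiting the extra memory is to find the rune boundary with a for-range loop instead of building a full []rune. This is a sketch under that assumption, not the only approach; the name truncateRunes is hypothetical.

```go
package main

import "fmt"

// truncateRunes is a hypothetical helper that returns at most the first
// n runes of s. It walks rune boundaries with range and slices the
// original string, so no intermediate []rune is allocated.
func truncateRunes(s string, n int) string {
	if n <= 0 {
		return ""
	}
	count := 0
	for i := range s { // i is the byte offset where each rune starts
		if count == n {
			return s[:i] // cut exactly on a rune boundary
		}
		count++
	}
	return s // s has n runes or fewer
}

func main() {
	fmt.Println(truncateRunes("привет", 2)) // пр
	fmt.Println(truncateRunes("hi 😀!", 4)) // hi 😀
}
```

Converting to []rune remains the simpler choice when several characters have to be modified in place.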