Explain how the built-in bytes type works in Python and what its features are. How and where is it used, how does it differ from str, and what nuances are important when processing binary data?

Answer.

Background

With the advent of Python 3, bytes became the dedicated type for storing and processing binary data, separate from text strings (str). In Python 2, str was a byte string that served for both text and binary data, which led to frequent errors when processing data in different encodings.

Problem

Everyday programming often involves transferring and storing data that is not text: working with files, network requests, and communication protocols. This requires an explicit, convenient, and safe type that clearly distinguishes a sequence of bytes from string data.

Solution

The bytes type in Python stores an immutable sequence of bytes (integers from 0 to 255). It can be created from a bytes literal (prefixed with b) or with the bytes() constructor. Conversion between strings (str) and bytes (bytes) is done with the .encode() and .decode() methods, which keeps the interaction safe and predictable. When working with files, networks, and binary protocols, bytes is the primary choice.

Example code:

    # Creating a bytes object
    b = b'hello'                            # via a literal
    b2 = bytes([104, 101, 108, 108, 111])   # from a list of integers

    # Conversion str <=> bytes
    text = 'текст'
    bin_text = text.encode('utf-8')   # str -> bytes
    back = bin_text.decode('utf-8')   # bytes -> str

    # Example with a file
    with open('file.bin', 'rb') as f:
        data = f.read()   # data: bytes

Key features:

  • bytes is an immutable container for a sequence of bytes.
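  • Differs from str: str stores Unicode text, bytes stores binary data.
  • Conversion between the two is always explicit, through encode() and decode(). Both default to UTF-8 when no encoding is given, but naming the encoding explicitly is the safer habit.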
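
One nuance that follows from "a sequence of integers from 0 to 255" is that indexing and slicing behave differently. The short sketch below illustrates this; the variable names are illustrative only.

```python
data = b'hello'

first = data[0]          # indexing yields an int, not a one-byte bytes object
head = data[:1]          # slicing yields bytes
as_list = list(data)     # iterating yields ints in the range 0..255
hex_view = data.hex()    # hexadecimal text representation of the bytes
zeros = bytes(3)         # an int argument creates that many zero bytes

print(first, head, as_list, hex_view, zeros)
```

Note that bytes(3) does not produce b'3'; passing an integer to the constructor creates a zero-filled object of that length.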

Tricky questions.

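Can bytes and str be concatenated?

No. Concatenation with + does not work: b'abc' + 'def' raises a TypeError. An f-string does not raise, but it inserts the repr of the bytes object (f"{b'abc'}def" yields "b'abc'def"), which is rarely what is intended. The types need to be converted explicitly with encode() or decode() first.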
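
A minimal sketch of the failure and of the two explicit conversions, assuming the text is UTF-8 (the variable names are illustrative):

```python
prefix = b'abc'
suffix = 'def'

error_name = None
try:
    prefix + suffix                      # mixing bytes and str
except TypeError as exc:
    error_name = type(exc).__name__      # 'TypeError'

as_bytes = prefix + suffix.encode('utf-8')   # stay in bytes
as_text = prefix.decode('utf-8') + suffix    # or move to str

# An f-string silently embeds the repr of the bytes object
formatted = f'{prefix}def'

print(error_name, as_bytes, as_text, formatted)
```

Which direction to convert in depends on what the result is for: bytes for I/O, str for text processing.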

What is the difference between bytes and bytearray?

bytes is an immutable type, meaning its content cannot be changed after creation. bytearray is a mutable variant, supporting methods for changing bytes in place.

    b = bytes([1, 2, 3])        # immutable
    ba = bytearray([1, 2, 3])   # mutable
    ba[0] = 99                  # OK
    b[0] = 99                   # TypeError: bytes does not support item assignment
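
As a further illustration, the sketch below uses a bytearray as a growable buffer and then freezes it into bytes. It also shows a practical consequence of mutability: bytes is hashable and can be a dict key, while bytearray is not. The names are illustrative.

```python
buf = bytearray(b'abc')
buf[0] = ord('x')        # replace a single byte in place
buf.extend(b'de')        # append more bytes
buf += b'!'              # in-place concatenation also works
frozen = bytes(buf)      # immutable snapshot of the buffer

lookup = {frozen: 'ok'}  # bytes is hashable, so it can be a dict key

try:
    hash(buf)
    bytearray_hashable = True
except TypeError:
    bytearray_hashable = False   # bytearray is mutable and therefore unhashable

print(buf, frozen, bytearray_hashable)
```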

How can you find out how many bytes a string will take during conversion via encode()?

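The number of bytes depends on the encoding and on the characters in the string. For example, 'abc' takes 3 bytes in UTF-8, while 'Привет' takes 12, because each Cyrillic letter is encoded as 2 bytes. The reliable way to get the exact size is to call encode() and apply len() to the result; len() on the str itself counts characters, not bytes: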

    s = 'Привет'            # 6 letters
    b = s.encode('utf-8')   # 12 bytes
    print(len(b))           # 12
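
To show how strongly the size depends on the encoding, here is a small comparison for the same string. The choice of encodings is only an example.

```python
s = 'Привет'

# Encoded length of the same 6-character string under different encodings
sizes = {enc: len(s.encode(enc)) for enc in ('utf-8', 'utf-16', 'cp1251')}

print(len(s))   # number of characters
print(sizes)    # number of bytes per encoding
```

The 'utf-16' codec prepends a 2-byte byte order mark, so the result is 2 bytes longer than the characters alone would need.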

Common mistakes and anti-patterns

  • Confusing bytes and str, passing a string where bytes are expected (e.g., HTTP requests, binary files), or vice versa.
  • Forgetting to explicitly decode bytes before writing to a text file or printing.
  • Comparing bytes and str directly: the result is always False, even when the content looks the same.
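
The comparison pitfall can be sketched as follows; the values are illustrative, and ASCII is assumed as the encoding:

```python
raw = b'abc'
text = 'abc'

direct = (raw == text)                       # False: different types never compare equal
via_decode = (raw.decode('ascii') == text)   # compare as text
via_encode = (text.encode('ascii') == raw)   # or compare as bytes

print(direct, via_decode, via_encode)
```

Running the interpreter with the -b flag makes Python issue a warning for such bytes/str comparisons, which can help locate them in a larger codebase.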

Real-life example

Negative case

A developer reads a file in 'rb' (binary) mode and tries to process it directly as a string:

    with open('file.txt', 'rb') as f:
        for line in f:
            print(line.strip())   # line: bytes

Pros:

  • May work for ASCII documents.

Cons:

  • Non-ASCII text is printed as escaped bytes (for example b'\xd0\x9f...') instead of readable characters; proper text processing requires decode().
  • A TypeError occurs when trying to concatenate the lines with str.

Positive case

A developer decodes the byte stream with decode(), making the encoding explicit:

    with open('file.txt', 'rb') as f:
        for line in f:
            print(line.decode('utf-8').strip())

Pros:

  • The code works for any text file whose actual encoding matches the one passed to decode().
  • Predictable behavior when processing and outputting.

Cons:

  • Additional responsibility for explicitly choosing the encoding.
  • Extra error handling is needed for the case where the data does not match the chosen encoding (UnicodeDecodeError).
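
The two costs listed above can be sketched in code. The example below is only an illustration under stated assumptions: it writes a temporary UTF-8 file itself (the file name and contents are made up), reads it in text mode with an explicit encoding as an alternative to calling decode() per line, and then shows what happens when the wrong codec is used.

```python
import os
import tempfile

payload = 'Привет\n'.encode('utf-8')   # sample file contents (assumed UTF-8)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'file.txt')
    with open(path, 'wb') as f:
        f.write(payload)

    # Text mode with an explicit encoding: lines arrive already decoded as str
    with open(path, encoding='utf-8') as f:
        lines = [line.strip() for line in f]

# Decoding with the wrong codec raises UnicodeDecodeError by default
try:
    payload.decode('ascii')
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# errors='replace' substitutes U+FFFD for undecodable bytes instead of raising
lossy = payload.decode('ascii', errors='replace')

print(lines, decode_failed)
```

Whether to fail loudly or to decode lossily with errors='replace' is a design choice: strict decoding surfaces bad input early, while replacement keeps the program running at the cost of losing the original characters.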