Background
Python has supported generator expressions since version 2.4 (PEP 289), complementing list comprehensions. They produce a lazy sequence of values, like a generator function does, but in a compact and readable form.
Problem
List comprehensions ([x for x in iterable]) create a list, immediately loading all elements into memory. This can be inefficient or even dangerous if the number of elements is very large. Generator functions (using yield) are more flexible but require separate function definitions and more lines of code.
Solution
Generator expressions ((x for x in iterable)) provide a concise syntax for producing lazy sequences (elements are computed on demand rather than being loaded all at once). They look similar to list comprehensions but use parentheses:
```python
# List comprehension: loads everything into memory at once
squares_list = [x**2 for x in range(10**6)]

# Generator expression: elements are computed on request, almost no memory is used
squares_gen = (x**2 for x in range(10**6))

# Get the first five values from the generator
for _ in range(5):
    print(next(squares_gen))
```
Key features:
Can you iterate over the same generator expression more than once?
No. After one full iteration the generator is exhausted. To iterate again, create a new generator, or use a list comprehension if you need repeated passes.
```python
it = (x for x in range(3))
print(list(it))  # [0, 1, 2]
print(list(it))  # [] — the generator is exhausted, no more values can be obtained
```
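One way to get a "restartable" sequence is to wrap the generator expression in a small factory function, so a fresh generator is built on each call; a minimal sketch (the `squares` helper is illustrative, not from the original text):

```python
def squares():
    # Each call builds a brand-new generator object
    return (x * x for x in range(3))

g1 = squares()
print(list(g1))  # [0, 1, 4]
print(list(g1))  # [] — this particular generator is now exhausted

g2 = squares()   # a fresh generator starts from the beginning
print(list(g2))  # [0, 1, 4]
```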
Do generators maintain state between uses?
Yes. A generator expression keeps its position between calls to next() (or between loop iterations), but it cannot be reset to the start; to start over, you must create a new generator object.
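The saved position is easiest to see with explicit next() calls; a minimal sketch:

```python
g = (x * 10 for x in range(3))

print(next(g))  # 0  — computes only the first element
print(next(g))  # 10 — resumes exactly where it stopped
print(next(g))  # 20 — the last value; one more next(g) would raise StopIteration
```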
Can you use a generator expression multiple times in one line?
No. If several consumers read from the same generator (for example, passing it to several functions in turn rather than converting it to a list first), they share a single iterator: each consumer advances the same position, so later consumers see only what is left.
```python
g = (x for x in range(3))
print(sum(g), list(g))  # 3 [] — sum(g) consumes everything, so list(g) gets nothing
```
In a project for analyzing large files, the following code was used:
```python
data = (parse_line(line) for line in file)
process(list(data))        # consumes the generator completely
other_process(list(data))  # receives an empty list — data is already exhausted
```
Pros:
- The parsing itself is lazy, so memory usage stays low until the data is materialized.
Cons:
- The first list(data) call exhausts the generator, so other_process silently receives an empty list.
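The example above can be repaired either by materializing the parsed rows exactly once, or by splitting one lazy stream into two with itertools.tee; a sketch using stand-in `parse_line` and input data (the real functions are project-specific):

```python
import itertools

def parse_line(line):
    # Stand-in parser for illustration
    return line.strip().upper()

lines = ["a\n", "b\n", "c\n"]  # stand-in for the open file

# Option 1: materialize once, then reuse the list freely
data = [parse_line(line) for line in lines]
print(data)      # ['A', 'B', 'C']
print(data)      # ['A', 'B', 'C'] — a list can be traversed any number of times

# Option 2: itertools.tee yields two independent iterators over one lazy stream
g1, g2 = itertools.tee(parse_line(line) for line in lines)
print(list(g1))  # ['A', 'B', 'C']
print(list(g2))  # ['A', 'B', 'C']
```

Note that tee buffers items internally for the slower copy, so if one iterator runs far ahead of the other, memory usage grows accordingly.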
They used a list comprehension when the data had to be reused, and created a generator for single consumption:
```python
# Generator used for a single pass only (e.g., to calculate the sum)
total = sum(parse_line(line) for line in file)
```
Pros:
- Constant memory: each line is parsed and consumed one at a time, never stored as a whole.
Cons:
- The data can be traversed only once; a second pass requires re-reading the file.