The zip() function has been available since Python 2, where it returned a list; in Python 3 it returns a lazy iterator. It "zips" multiple iterables into tuples element by element, which makes processing parallel collections convenient and efficient.
It is common to process several lists (or other iterables) in lockstep, for example to iterate over key-value pairs or over pairs of point coordinates. Synchronizing indices by hand makes the code harder to read and invites errors, especially when the collections differ in length.
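To make the index problem concrete, here is a minimal sketch. The lists are illustrative data, and the second one is deliberately one element short; the variable names are chosen for this example only.

```python
names = ["Alice", "Bob", "Charlie"]
ages = [24, 27]  # deliberately one element short

# Index-based pairing: the loop bound comes from one list only,
# so the shorter partner list raises IndexError partway through.
paired_by_index = []
try:
    for i in range(len(names)):
        paired_by_index.append((names[i], ages[i]))
except IndexError:
    index_error_seen = True
else:
    index_error_seen = False

# zip() pairs elements by position with no index bookkeeping.
paired_by_zip = list(zip(names, ages))

print(index_error_seen)  # True
print(paired_by_zip)     # [('Alice', 24), ('Bob', 27)]
```

The index loop fails at the third element, while zip() simply stops after the last complete pair.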
The zip() function takes any number of iterables and returns an iterator of tuples; the i-th tuple contains the i-th element of each iterable. If the iterables differ in length, the result is truncated to the length of the shortest one.
names = ['Alice', 'Bob', 'Charlie']
ages = [24, 27, 30]
for name, age in zip(names, ages):
    print(f'{name} is {age} years old')
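Because zip() returns a lazy iterator in Python 3, it can be consumed only once. The following sketch reuses the same illustrative lists to show that behavior; it is a demonstration, not a recommended pattern.

```python
names = ["Alice", "Bob", "Charlie"]
ages = [24, 27, 30]

pairs = zip(names, ages)  # no tuples have been built yet
first = next(pairs)       # pulls exactly one tuple
rest = list(pairs)        # consumes whatever is left
again = list(pairs)       # the iterator is now exhausted

print(first)  # ('Alice', 24)
print(rest)   # [('Bob', 27), ('Charlie', 30)]
print(again)  # []
```

If the pairs are needed more than once, store list(zip(names, ages)) or call zip() again.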
The * operator reverses the operation: calling zip(*pairs) on a list of tuples regroups the elements by position.
pairs = [(1, 'a'), (2, 'b'), (3, 'c')]
nums, chars = zip(*pairs)
print(nums)   # (1, 2, 3)
print(chars)  # ('a', 'b', 'c')
What happens if zip() is passed collections of different lengths?
zip() stops as soon as the shortest collection is exhausted; the remaining elements of the longer collections are silently ignored.
print(list(zip([1, 2, 3], ['a', 'b'])))  # [(1, 'a'), (2, 'b')]
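When silent truncation would hide a bug, Python 3.10 and later accept a strict=True keyword argument that raises ValueError on a length mismatch. The sketch below assumes that interpreter version and guards the call so that older versions skip it; the variable names are illustrative.

```python
import sys

numbers = [1, 2, 3]
letters = ["a", "b"]

# Default behaviour: the unmatched 3 is dropped without warning.
truncated = list(zip(numbers, letters))

# strict=True exists only on Python 3.10+, hence the version guard.
mismatch_detected = None  # stays None on older interpreters
if sys.version_info >= (3, 10):
    try:
        list(zip(numbers, letters, strict=True))
        mismatch_detected = False
    except ValueError:
        mismatch_detected = True

print(truncated)  # [(1, 'a'), (2, 'b')]
```

The error is raised only when the shorter iterable runs out during iteration, not when zip() is called.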
How can you get a tuple for every element, padding the shorter sequences with a default value?
Standard zip() can't do that, but itertools.zip_longest can provide such behavior:
from itertools import zip_longest

for a, b in zip_longest([1, 2], ['x', 'y', 'z'], fillvalue=None):
    print(a, b)
# 1 x
# 2 y
# None z
Can the result of zip() be "unpacked" back into the original lists?
Yes, provided all the original collections had the same length and the zipped result has not been modified. Applying zip() with the * operator regroups the elements by position, but note that it returns tuples, not lists.
pairs = [(1, 2), (3, 4)]
a, b = zip(*pairs)
print(a)  # (1, 3)
print(b)  # (2, 4)
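If actual lists are needed, each tuple can be converted explicitly. The sketch below shows one way to do that with map(list, ...), plus the empty-input edge case; the variable names are illustrative.

```python
names = ["Alice", "Bob", "Charlie"]
ages = [24, 27, 30]

zipped = list(zip(names, ages))

# zip(*zipped) yields tuples; map(list, ...) turns each one into a list.
names_back, ages_back = map(list, zip(*zipped))

# Edge case: with no pairs, zip(*[]) yields nothing,
# so unpacking into two names raises ValueError.
try:
    a, b = zip(*[])
except ValueError:
    empty_unpack_failed = True
else:
    empty_unpack_failed = False

print(names_back)           # ['Alice', 'Bob', 'Charlie']
print(ages_back)            # [24, 27, 30]
print(empty_unpack_failed)  # True
```

Code that may receive an empty list of pairs should check for that case before unpacking.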
Processing related collections of different lengths without considering zip's peculiarities:
lst1 = [1, 2, 3, 4]
lst2 = ['a', 'b']
for x, y in zip(lst1, lst2):
    print(x, y)
# 1 a
# 2 b
# 3 and 4 from lst1 were never processed
Pros:
Cons:
Using zip_longest with fillvalue to ensure no element is lost:
from itertools import zip_longest

lst1 = [1, 2, 3, 4]
lst2 = ['a', 'b']
for x, y in zip_longest(lst1, lst2, fillvalue='?'):
    print(x, y)
# 1 a
# 2 b
# 3 ?
# 4 ?
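A fill value such as '?' or None can collide with legitimate data. One possible way around that is a unique sentinel object, sketched below. The name _MISSING and the "<no partner>" label are hypothetical, chosen for this example only.

```python
from itertools import zip_longest

_MISSING = object()  # hypothetical sentinel; identical to no real value

lst1 = [1, 2, 3, 4]
lst2 = ["a", None]  # None is legitimate data in this example

rows = []
for x, y in zip_longest(lst1, lst2, fillvalue=_MISSING):
    if y is _MISSING:
        # lst2 ran out: this slot was padded, not read from the data
        rows.append((x, "<no partner>"))
    else:
        rows.append((x, y))

print(rows)
# [(1, 'a'), (2, None), (3, '<no partner>'), (4, '<no partner>')]
```

Comparing with is rather than == keeps the real None in lst2 distinct from padded slots.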
Pros:
Cons: