List Comprehensions vs. Generator Expressions in Python

Syntax

The forms are nearly identical. A list comprehension is bracketed with [] and always returns a list. A generator expression is bracketed with () — or, when used as the sole argument to a function call, the parentheses can be omitted — and returns a generator object.

List comprehension

result = [x ** 2 for x in range(1_000_000) if x % 2 == 0]
# type: list — all 500,000 ints in memory now

Generator expression

result = (x ** 2 for x in range(1_000_000) if x % 2 == 0)
# type: generator — nothing computed yet

Parenthesis elision applies only when the generator is the sole argument:

Argument elision

# OK: sole argument, so the call's own parentheses serve double duty
total = sum(x ** 2 for x in range(10))

# SyntaxError: with a second argument, explicit parentheses are required
# sorted(x ** 2 for x in range(10), reverse=True)   # ✗ does not compile
sorted((x ** 2 for x in range(10)), reverse=True)   # ✓

Memory model

This is the central difference. A list comprehension evaluates eagerly: every element is computed and stored in a contiguous array before control returns to the caller. A generator expression is lazy: it stores only the iterator state (the code object, the current position, and local variables), and produces values one at a time when iterated.

Memory footprint comparison

import sys

n = 1_000_000

lc = [x * 2 for x in range(n)]
ge = (x * 2 for x in range(n))

print(sys.getsizeof(lc))  # → 8,448,728  (bytes; grows linearly with n; exact value varies by CPython build)
print(sys.getsizeof(ge))  # → 208        (constant regardless of n; exact value varies by CPython version)

Note

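sys.getsizeof reports the size of the generator object itself (208 bytes in the run above), not the values it will produce. Those values are never all in memory simultaneously. For the list, it likewise counts only the list object and its array of references, not the int objects those references point to, so the list's real footprint is larger than the figure shown.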
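
To make the shallow-versus-total distinction concrete, here is a minimal sketch that compares what sys.getsizeof reports for a list with a rough total that also counts the elements. The helper name approx_total_size is hypothetical, and the total is only an approximation: any object referenced more than once would be counted once per reference.

```python
import sys

def approx_total_size(lst):
    """Rough footprint: the list object plus every element it references.

    Hypothetical helper for illustration only; shared objects are
    counted once per reference, so treat the result as an estimate.
    """
    return sys.getsizeof(lst) + sum(sys.getsizeof(item) for item in lst)

n = 100_000
lc = [x * 2 for x in range(n)]
ge = (x * 2 for x in range(n))

shallow = sys.getsizeof(lc)      # list header + array of references only
deep = approx_total_size(lc)     # also counts the int objects
gen_size = sys.getsizeof(ge)     # independent of n

print(shallow < deep)            # the elements add to the real footprint
print(gen_size < shallow)        # the generator object stays small
```

The generator figure stays the same if n is changed, because no elements exist until they are requested.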

What CPython stores in a generator frame

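When you create a generator expression, CPython uses the code object it compiled for the expression's body and allocates a frame for it (a PyFrameObject, or the more compact internal _PyInterpreterFrame in 3.11+). This frame holds a reference to the code object, the current instruction position, and the local variables, including the iterator over the outermost iterable.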

Inspecting the generator object

import inspect

g = (x ** 2 for x in range(5))

print(g.gi_code.co_name)             # → '<genexpr>'
print(inspect.getgeneratorstate(g))  # → 'GEN_CREATED'  (not started yet)
print(g.gi_frame.f_lasti)            # → -1 on CPython 3.10 and earlier; may differ on newer versions

next(g)                              # → 0
print(g.gi_frame.f_lasti)            # → some positive offset; frame is suspended here
print(inspect.getgeneratorstate(g))  # → 'GEN_SUSPENDED'
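
As a complement, the following sketch follows one generator through its whole lifecycle using only inspect.getgeneratorstate, which is a more portable check than reading f_lasti directly. The variable names are illustrative.

```python
import inspect

g = (x ** 2 for x in range(2))
states = [inspect.getgeneratorstate(g)]       # before the first next()

next(g)
states.append(inspect.getgeneratorstate(g))   # paused after yielding 0

remaining = list(g)                           # drain the rest
states.append(inspect.getgeneratorstate(g))

print(states)       # → ['GEN_CREATED', 'GEN_SUSPENDED', 'GEN_CLOSED']
print(remaining)    # → [1]
print(g.gi_frame)   # → None once the generator is closed
```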

Evaluation strategy

Iterable binding is eager for the outermost clause

A subtle but critical point: in a generator expression, the outermost for clause's iterable is evaluated immediately when the expression is created, not when iteration begins. Inner iterables are evaluated lazily.

Outermost iterable bound at creation time

data = [1, 2, 3]
g = (x * 10 for x in data)  # iter(data) is taken here, at creation
data.append(4)              # mutating the same list object

list(g)  # → [10, 20, 30, 40]; the mutation is visible because g holds a live iterator, not a snapshot

# Compare: a list comprehension is a snapshot
data2 = [1, 2, 3]
lc = [x * 10 for x in data2]
data2.append(4)
# lc is already [10, 20, 30]; it was fully built at construction

Warning

A generator expression holds a reference to the iterable, not a copy. Mutating the source between creation and exhaustion affects what the generator yields. A list comprehension does not have this exposure because evaluation is complete before you can mutate anything.
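
The difference between the eagerly evaluated outermost iterable and lazily evaluated inner iterables can be observed directly. The sketch below is illustrative: missing_name is a deliberately undefined, hypothetical name used only to show when the lookup happens.

```python
data = [1, 2, 3]
g = (x * 10 for x in data)   # the object currently named `data` is captured here
data = [9]                   # rebinding the name does not affect g
rebound_result = list(g)
print(rebound_result)        # → [10, 20, 30]

def outer_error():
    # `missing_name` is intentionally undefined (hypothetical name)
    return (x for x in missing_name)

def inner_error():
    return (y for x in [1] for y in missing_name)

try:
    outer_error()
    outer_raised_at_creation = False
except NameError:
    outer_raised_at_creation = True    # outermost iterable is evaluated at creation

h = inner_error()                      # no error yet: the inner iterable is untouched
try:
    next(h)
    inner_raised_on_next = False
except NameError:
    inner_raised_on_next = True        # inner iterable is evaluated during iteration

print(outer_raised_at_creation, inner_raised_on_next)  # → True True
```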

Single-pass exhaustion

A generator is a stateful iterator. Once exhausted it yields nothing — there is no rewind. A list can be iterated any number of times.

Exhaustion behaviour

g = (x ** 2 for x in range(3))
list(g)  # → [0, 1, 4]
list(g)  # → []  — exhausted

lc = [x ** 2 for x in range(3)]
list(lc)  # → [0, 1, 4]
list(lc)  # → [0, 1, 4]  — re-iterable
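
One way to see why a generator cannot be rewound is that it is its own iterator, whereas a list hands out a new iterator on every pass. If several passes are needed without building a list, one option is a small factory that returns a fresh generator each time; the name squares below is hypothetical.

```python
g = (x ** 2 for x in range(3))
lc = [x ** 2 for x in range(3)]

gen_is_own_iterator = iter(g) is g
list_is_own_iterator = iter(lc) is lc
print(gen_is_own_iterator)    # → True: iter() returns the same stateful object
print(list_is_own_iterator)   # → False: each iter() call starts a new pass

def squares():
    """Hypothetical factory: returns a fresh generator on every call."""
    return (x ** 2 for x in range(3))

first_pass = list(squares())
second_pass = list(squares())
print(first_pass == second_pass)   # → True
```

The factory recomputes every value on each pass, so it trades CPU time for memory.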

Performance

The right metric depends on what you're measuring. Construction time, peak memory, and per-element overhead each tell a different story.

When generators win: memory-constrained pipelines

Streaming pipeline: generator avoids materialising intermediates

import tracemalloc

tracemalloc.start()

# List approach: two full lists in memory simultaneously
evens    = [x for x in range(10_000_000) if x % 2 == 0]
squares  = [x ** 2 for x in evens]
total_lc = sum(squares)

peak_lc = tracemalloc.get_traced_memory()[1]
del evens, squares           # release the lists before the second measurement
tracemalloc.clear_traces()   # start the second measurement from a clean slate

# Generator approach: no intermediates
total_ge = sum(x ** 2 for x in range(10_000_000) if x % 2 == 0)

peak_ge = tracemalloc.get_traced_memory()[1]
tracemalloc.stop()

print(peak_lc / 1e6)  # ≈ 320 MB  (illustrative; varies by CPython version and platform)
print(peak_ge / 1e6)  # ≈ 0.01 MB (illustrative)
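
The same comparison can be packaged as a reusable helper so that each measurement starts and stops its own trace. This is a sketch under the assumption that tracemalloc is not already tracing; peak_bytes is a hypothetical name, and n is kept small so it runs quickly.

```python
import tracemalloc

def peak_bytes(fn):
    """Run fn() and return the peak traced allocation in bytes (hypothetical helper)."""
    tracemalloc.start()
    try:
        fn()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak

n = 200_000
list_peak = peak_bytes(lambda: sum([x * x for x in range(n)]))  # builds the whole list
gen_peak = peak_bytes(lambda: sum(x * x for x in range(n)))     # one value at a time

print(list_peak > gen_peak)   # the list version has the higher peak
```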

When list comprehensions win: repeated access and indexing

Generator expressions have per-next overhead: the interpreter must resume the frame, execute bytecode, and suspend again. If you iterate a collection multiple times, need random access, or pass the result to code that expects a sequence, a list comprehension is faster overall — paying the cost once up front.

timeit comparison: repeated access

import timeit

setup = "data = range(1000)"

# Build once, access twice
lc_stmt = """
lc = [x**2 for x in data]
s = sum(lc) + sum(lc)
"""

# Build twice (cannot reuse exhausted generator)
ge_stmt = """
s = sum(x**2 for x in data) + sum(x**2 for x in data)
"""

print(timeit.timeit(lc_stmt, setup=setup, number=10_000))  # ≈ 0.45 s (illustrative; machine-dependent)
print(timeit.timeit(ge_stmt, setup=setup, number=10_000))  # ≈ 0.80 s (illustrative; machine-dependent)

Tip

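For single-pass aggregation (sum, max, any, all), generators are usually competitive with list comprehensions on speed while using a fraction of the memory. str.join is an exception: CPython first collects a non-list argument into a sequence, so a list comprehension is typically at least as fast there. For anything that requires the collection to exist (slicing, indexing, len(), in-place list.sort()), you need a list; sorted() accepts a generator but builds a list internally anyway.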

Variable scope

Both forms have their own scope in Python 3, so the iteration variable does not leak into the enclosing scope. (In Python 2, list comprehensions did leak it; generator expressions never have.)

Scope isolation (Python 3)

x = "outer"

lc = [x for x in range(3)]
print(x)  # → "outer" — x unchanged

ge = (x for x in range(3))
print(x)  # → "outer" — x unchanged

However, free variables in a generator expression's body are looked up when the generator runs, not when it is created. Only the outermost iterable is evaluated at creation. This late binding can produce surprising results in loops:

Late-binding closure pitfall

# Common mistake: the generator closes over the variable `i`, not its value
gens = [(x + i for x in range(3)) for i in range(3)]
print([list(g) for g in gens])
# → [[2, 3, 4], [2, 3, 4], [2, 3, 4]]  (every generator sees the final i=2)

# The same happens with an explicit loop
gens2 = []
for i in range(3):
    gens2.append((x + i for x in range(3)))
print([list(g) for g in gens2])
# → [[2, 3, 4], [2, 3, 4], [2, 3, 4]]  (all see i=2 at exhaustion time)

# Fix: pin i in the outermost clause, whose iterable is evaluated at creation time.
# Placing `for j in [i]` as an inner clause would NOT work, because inner iterables are lazy.
gens3 = []
for i in range(3):
    gens3.append((x + j for j in [i] for x in range(3)))  # pin i as j
print([list(g) for g in gens3])
# → [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
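
Another way to sidestep late binding, often easier to read, is to build each generator inside a function so that the value arrives as a parameter and each generator gets its own binding. The helper name shifted is hypothetical.

```python
def shifted(i):
    """Hypothetical helper: `i` is a parameter, so each call has its own binding."""
    return (x + i for x in range(3))

gens = [shifted(i) for i in range(3)]
result = [list(g) for g in gens]
print(result)   # → [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
```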

Interaction with builtins and short-circuiting

Several builtins short-circuit. With a generator expression they stop iteration as soon as the answer is known, saving work. A list comprehension gives them no opportunity — evaluation is already complete.

Short-circuit behaviour

def expensive(x):
    print(f"computing {x}")
    return x % 2 == 0

# List: computes all 1,000,000 before any() checks the first
result = any([expensive(x) for x in range(1_000_000)])

# Generator: stops after the first True (x=0 here)
result = any(expensive(x) for x in range(1_000_000))
# prints "computing 0" then stops

The same applies to all, next with a default, and itertools.islice. For min, max, and sum, there is no short-circuit — all elements are consumed regardless.
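
A short sketch of those other early-exit consumers follows. The calls list and the predicate small are illustrative names used to record how many elements were actually evaluated.

```python
from itertools import count, islice

calls = []

def small(x):
    calls.append(x)        # record every evaluation
    return x < 3

all_small = all(small(x) for x in range(1_000_000))
print(all_small)           # → False
print(calls)               # → [0, 1, 2, 3]: stopped at the first False

first_big = next((x for x in range(10) if x > 4), None)
print(first_big)           # → 5

head = list(islice((x * x for x in count()), 3))
print(head)                # → [0, 1, 4], taken from an unbounded source
```

The last case only terminates because the source is lazy; a list comprehension over count() would never finish.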

Type compatibility

A generator object implements __iter__ and __next__ but not __len__, __getitem__, or __contains__. This matters when passing results to APIs that check for these:

Operations that require a sequence

g = (x for x in range(5))

len(g)          # TypeError: object of type 'generator' has no len()
g[2]            # TypeError: 'generator' object is not subscriptable
3 in g          # Works — but CONSUMES up to the found element
reversed(g)    # TypeError: 'generator' object is not reversible

# If you need these, materialise to a list explicitly:
result = list(x for x in range(5))  # equivalent to list comp

Danger

Using in on a generator performs a linear scan and advances the iterator. After a successful membership test, every element up to and including the match has been consumed and cannot be recovered; after a failed test, the generator is fully exhausted. Avoid in on a generator you intend to iterate afterwards.
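
The consumption is easy to observe by listing what is left after each test. A minimal illustration:

```python
g = (x for x in range(5))
found = 2 in g             # scans 0, 1, 2 and stops
rest = list(g)
print(found, rest)         # → True [3, 4]

g = (x for x in range(5))
missing = 10 in g          # scans everything without finding a match
leftover = list(g)
print(missing, leftover)   # → False []
```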

Nested comprehensions and generator pipelines

Both forms support multiple for and if clauses. The execution order follows left-to-right, outermost-first — identical to nested for loops.

Nested clause order

# Equivalent to:
# for row in matrix:
#     for val in row:
#         if val > 0: yield val
matrix = [[1, -2, 3], [-4, 5, 6]]
pos = [val for row in matrix for val in row if val > 0]
# → [1, 3, 5, 6]
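
For comparison, here is the same logic written as explicit nested loops and checked against the comprehension. The variable names pos_loop and pos_comp are illustrative.

```python
matrix = [[1, -2, 3], [-4, 5, 6]]

pos_loop = []
for row in matrix:           # first clause = outermost loop
    for val in row:          # second clause = inner loop
        if val > 0:          # trailing condition = innermost test
            pos_loop.append(val)

pos_comp = [val for row in matrix for val in row if val > 0]

print(pos_loop == pos_comp)  # → True
print(pos_comp)              # → [1, 3, 5, 6]
```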

For multi-stage pipelines, chaining generator expressions avoids all intermediate allocations:

Lazy pipeline: no intermediate lists

import csv

def process_log(path):
    with open(path, newline="") as f:
        seen = set()                            # messages already emitted
        rows   = csv.DictReader(f)              # lazy row iterator
        errors = (r for r in rows
                  if r["level"] == "ERROR")     # filter
        msgs   = (r["message"].strip()
                  for r in errors)              # transform
        unique = (m for m in msgs
                  if m not in seen and not seen.add(m))  # dedup, order-preserving

        # Materialise before the file closes; nothing is read until here
        return list(unique)
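
To try the pipeline without a file on disk, the same stages can be run over an in-memory list of lines, since csv.DictReader accepts any iterable of strings. The sample data and the function name unique_errors below are hypothetical.

```python
import csv

# Hypothetical sample data standing in for a log file
SAMPLE_LINES = [
    "level,message\n",
    "ERROR,  disk full  \n",
    "INFO,started\n",
    "ERROR,disk full\n",
    "ERROR,timeout\n",
]

def unique_errors(lines):
    seen = set()
    rows = csv.DictReader(lines)                           # lazy row iterator
    errors = (r for r in rows if r["level"] == "ERROR")    # filter
    msgs = (r["message"].strip() for r in errors)          # transform
    unique = (m for m in msgs
              if m not in seen and not seen.add(m))        # dedup, keeps first occurrence
    return list(unique)                                    # the only materialisation

result = unique_errors(SAMPLE_LINES)
print(result)   # → ['disk full', 'timeout']
```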

Decision reference

Property	List comprehension `[ ]`	Generator expression `( )`
Return type	`list`	`generator`
Evaluation	Eager — all elements computed immediately	Lazy — computed on demand, one at a time
Memory	O(n) — holds all elements	O(1) — holds iterator state only
Re-iterable	✓ Yes	✗ No — exhausted after one pass
Indexing / slicing	✓ Yes	✗ No
`len()`	✓ Yes	✗ No
Short-circuit with `any/all`	✗ No — fully evaluated before call	✓ Yes
Outermost iterable bound	At construction (eager)	At construction (eager) — inner are lazy
Parenthesis elision	N/A	Only when sole function argument
Best for	Multiple passes, indexing, sorting, known small data	Large data, pipelines, single-pass aggregation
Object size (n=1 000 000)	~8.4 MB	~208 bytes

Choosing between them

Use a list comprehension when: the result must be indexed, sliced, or reversed; the collection will be iterated more than once; you need len(); or the dataset is small enough that the cost of materialising it is negligible and a concrete list makes the code clearer.

Use a generator expression when: you're doing a single-pass aggregation (sum, max, any, all); you're constructing a lazy pipeline and want to avoid intermediate lists; the source data is large or unbounded; or you want short-circuit evaluation with any/all.

When in doubt and the result is consumed immediately by a single builtin, prefer the generator expression. If the result needs to be passed around, inspected, or consumed more than once, materialise it to a list.