The two forms are syntactically nearly identical. A list comprehension is bracketed with [] and always returns a list. A generator expression is bracketed with () (the parentheses can be omitted when it is the sole argument to a function call) and returns a generator object.
```python
result = [x ** 2 for x in range(1_000_000) if x % 2 == 0]  # list: all 500,000 ints are in memory now
result = (x ** 2 for x in range(1_000_000) if x % 2 == 0)  # generator: nothing computed yet
```
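A quick way to confirm the two return types, sketched with the standard types and collections.abc modules; squares_list and squares_gen are illustrative names:

```python
import collections.abc
import types

squares_list = [x ** 2 for x in range(5)]
squares_gen = (x ** 2 for x in range(5))

print(type(squares_list).__name__)                         # → list
print(type(squares_gen).__name__)                          # → generator
print(isinstance(squares_gen, types.GeneratorType))        # → True
print(isinstance(squares_gen, collections.abc.Iterator))   # → True
print(isinstance(squares_list, collections.abc.Iterator))  # → False: a list is iterable, not an iterator
```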
Parenthesis elision applies only when the generator is the sole argument:
```python
# OK: the generator is the only argument, so the call's parentheses serve double duty
total = sum(x ** 2 for x in range(10))

# Second argument present: explicit parentheses are required
# sorted(x ** 2 for x in range(10), reverse=True)           # ✗ SyntaxError
ranked = sorted((x ** 2 for x in range(10)), reverse=True)  # ✓
```
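The unparenthesised form is rejected by the compiler itself, before any code runs. A small sketch using the built-in compile() to show this; the exact error message differs between Python versions, so the example only checks that SyntaxError is raised:

```python
source = "sorted(x ** 2 for x in range(10), reverse=True)"

try:
    compile(source, "<demo>", "eval")
    rejected = False
except SyntaxError:
    rejected = True  # raised at compile time; nothing was evaluated

print(rejected)  # → True

fixed = "sorted((x ** 2 for x in range(10)), reverse=True)"
print(eval(compile(fixed, "<demo>", "eval")))  # → [81, 64, 49, 36, 25, 16, 9, 4, 1, 0]
```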
This is the central difference. A list comprehension evaluates eagerly: every element is computed, and a reference to it is stored in the list's contiguous array, before control returns to the caller. A generator expression is lazy: it stores only its execution state (a reference to the code object, the current position, and its local variables, including the iterator over the source) and produces values one at a time as it is iterated.
```python
import sys

n = 1_000_000
lc = [x * 2 for x in range(n)]
ge = (x * 2 for x in range(n))

print(sys.getsizeof(lc))  # roughly 8-9 million bytes on 64-bit CPython (8 bytes per reference plus over-allocation); grows linearly with n
print(sys.getsizeof(ge))  # a few hundred bytes at most; constant regardless of n
```
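sys.getsizeof reports only the shallow size of an object. For the generator, that is the generator object itself (in CPython 3.11+ this includes its embedded frame), not the values it will produce; those values are never all in memory simultaneously. For the list, it is the array of references; the int objects the list points to are not counted, so the list's real footprint is larger still.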
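The eager/lazy distinction is easiest to see by logging when each element is computed. A minimal sketch; square and calls are illustrative names, not part of any library:

```python
calls = []

def square(x):
    calls.append(x)  # record every evaluation
    return x * x

lc = [square(x) for x in range(5)]
print(calls)     # → [0, 1, 2, 3, 4]: every element computed up front

calls.clear()
ge = (square(x) for x in range(5))
print(calls)     # → []: nothing computed yet
print(next(ge))  # → 0
print(next(ge))  # → 1
print(calls)     # → [0, 1]: only the elements actually requested
```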
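A generator expression is compiled, together with the surrounding code, into a separate code object named <genexpr>. Each time the expression is evaluated, CPython creates a generator object with its own frame: a PyFrameObject before 3.11, and a more compact _PyInterpreterFrame embedded in the generator from 3.11 on. That frame holds the generator's execution state: a reference to the code object, the offset of the last executed instruction, and the local variables, including the iterator over the outermost iterable.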
```python
import inspect

g = (x ** 2 for x in range(5))
print(g.gi_code.co_name)             # → <genexpr>
print(inspect.getgeneratorstate(g))  # → GEN_CREATED: the frame exists, but no bytecode has run yet
print(next(g))                       # → 0
print(g.gi_frame.f_lasti)            # a positive bytecode offset where the frame is suspended (exact value varies by version)
print(inspect.getgeneratorstate(g))  # → GEN_SUSPENDED
```
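To round out the lifecycle, a short sketch of what happens once the generator is exhausted: inspect reports it as closed and the frame is released, which is also why it cannot be restarted:

```python
import inspect

g = (x ** 2 for x in range(2))
print(inspect.getgeneratorstate(g))  # → GEN_CREATED
print(list(g))                       # → [0, 1]
print(inspect.getgeneratorstate(g))  # → GEN_CLOSED
print(g.gi_frame)                    # → None: the frame is released once the generator finishes
```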
A subtle but critical point: in a generator expression, the outermost for clause's iterable is evaluated, and iter() is called on it, immediately when the expression is created, not when iteration begins. Inner iterables, conditions, and the element expression are evaluated lazily.
```python
data = [1, 2, 3]
g = (x * 10 for x in data)  # iter(data) is called here; no values computed yet
data.append(4)              # mutate the list before consuming g
print(list(g))              # → [10, 20, 30, 40]: g iterates the live list, not a snapshot

# Compare: the list comprehension runs to completion at construction
data2 = [1, 2, 3]
lc = [x * 10 for x in data2]
data2.append(4)
print(lc)                   # → [10, 20, 30]
```
A generator expression holds a reference to the iterable, not a copy. Mutating the source between creation and exhaustion affects what the generator yields. A list comprehension does not have this exposure because evaluation is complete before you can mutate anything.
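The split between the eagerly evaluated outermost iterable and the lazily evaluated inner ones can be observed directly. In this sketch, outer and inner are hypothetical helpers that log when they are called:

```python
log = []

def outer():
    log.append("outer")
    return range(2)

def inner():
    log.append("inner")
    return range(2)

g = ((x, y) for x in outer() for y in inner())
print(log)      # → ['outer']: the outermost iterable was evaluated at creation

print(next(g))  # → (0, 0)
print(log)      # → ['outer', 'inner']: the inner iterable is evaluated when iteration reaches it

# Because iter() is applied to the outermost iterable up front, errors surface immediately
try:
    bad = (x for x in 42)
except TypeError:
    print("TypeError at creation")  # → TypeError at creation
```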
A generator is a stateful iterator. Once exhausted it yields nothing — there is no rewind. A list can be iterated any number of times.
```python
g = (x ** 2 for x in range(3))
list(g)   # → [0, 1, 4]
list(g)   # → []: exhausted

lc = [x ** 2 for x in range(3)]
list(lc)  # → [0, 1, 4]
list(lc)  # → [0, 1, 4]: re-iterable
```
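The reason there is no rewind is that a generator is its own iterator, whereas each iter() call on a list returns a fresh list iterator. If you want a lazy source you can traverse more than once, one option, sketched here with a hypothetical squares() helper, is a function that builds a new generator on every call:

```python
g = (x ** 2 for x in range(3))
lc = [x ** 2 for x in range(3)]

print(iter(g) is g)    # → True: no separate iterator to restart
print(iter(lc) is lc)  # → False: a new list_iterator each time

def squares(n):
    # Each call returns a brand-new generator
    return (x ** 2 for x in range(n))

print(list(squares(3)), list(squares(3)))  # → [0, 1, 4] [0, 1, 4]
```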
Which form performs better depends on what you measure: construction time, peak memory, and per-element overhead each tell a different story.
```python
import tracemalloc

N = 10_000_000
tracemalloc.start()

# List approach: two full lists are in memory simultaneously
evens = [x for x in range(N) if x % 2 == 0]
squares = [x ** 2 for x in evens]
total_lc = sum(squares)
peak_lc = tracemalloc.get_traced_memory()[1]

del evens, squares        # free the intermediate lists
tracemalloc.reset_peak()  # Python 3.9+: start a fresh peak measurement

# Generator approach: no intermediates
total_ge = sum(x ** 2 for x in range(N) if x % 2 == 0)
peak_ge = tracemalloc.get_traced_memory()[1]
tracemalloc.stop()

print(peak_lc / 1e6)  # several hundred MB on 64-bit CPython; exact figure varies
print(peak_ge / 1e6)  # well under 1 MB
```
Generator expressions pay a per-element overhead: each next() resumes the frame, executes bytecode, and suspends again. If you iterate a collection multiple times, need random access, or pass the result to code that expects a sequence, a list comprehension is usually faster overall, because it pays the construction cost once up front instead of recomputing values on every pass.
```python
import timeit

setup = "data = range(1000)"

# Build once, consume twice
lc_stmt = """
lc = [x**2 for x in data]
s = sum(lc) + sum(lc)
"""

# Build twice (an exhausted generator cannot be reused)
ge_stmt = """
s = sum(x**2 for x in data) + sum(x**2 for x in data)
"""

print(timeit.timeit(lc_stmt, setup=setup, number=10_000))  # e.g. ≈ 0.45 s (machine-dependent)
print(timeit.timeit(ge_stmt, setup=setup, number=10_000))  # e.g. ≈ 0.80 s (machine-dependent)
```
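For single-pass aggregation (sum, max, any, all), generator expressions are usually close to list comprehensions in speed (a list comprehension is often slightly faster on small inputs) while using a fraction of the memory. str.join is an exception: CPython converts its argument to a list internally anyway, so passing it a list comprehension is typically a little faster. For anything that requires the collection to exist (slicing, indexing, len(), repeated passes) you need a list; even sorted(), which accepts any iterable, builds one internally.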
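A short check that sorted() accepts a generator but hands back a list, consuming the generator in the process:

```python
gen = (x % 7 for x in range(10))
ordered = sorted(gen)          # consumes the whole generator and returns a new list
print(type(ordered).__name__)  # → list
print(ordered)                 # → [0, 0, 1, 1, 2, 2, 3, 4, 5, 6]
print(list(gen))               # → []: sorted() exhausted the generator
```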
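In Python 3, neither form leaks its iteration variable into the enclosing scope. (Since Python 3.12, list comprehensions are inlined rather than run as a separate function, per PEP 709, but their iteration variables remain isolated; generator expressions still get their own frame.)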
```python
x = "outer"

lc = [x for x in range(3)]
print(x)  # → outer: x unchanged

ge = (x for x in range(3))
print(x)  # → outer: x unchanged
```
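However, a generator expression captures variables from the enclosing scope by reference, not by value: every name other than the outermost iterable is looked up when the generator runs, not when it is created. This late binding can produce surprising results in loops: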
```python
# Common mistake: each generator looks up `i` when it runs, not when it is created
gens = [(x + i for x in range(3)) for i in range(3)]
print([list(g) for g in gens])   # → [[2, 3, 4], [2, 3, 4], [2, 3, 4]]: i is 2 by the time they run

# Same trap with an explicit loop
gens2 = []
for i in range(3):
    gens2.append((x + i for x in range(3)))
print([list(g) for g in gens2])  # → [[2, 3, 4], [2, 3, 4], [2, 3, 4]]: all see i == 2

# Fix: put the value in the OUTERMOST iterable, which is evaluated at creation time
gens3 = []
for i in range(3):
    gens3.append((x + j for j in [i] for x in range(3)))  # [i] evaluated now, pinning j
print([list(g) for g in gens3])  # → [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
```
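Another common way to pin the value, shown here as a sketch with a hypothetical make_offset_gen() helper, is a small factory function: each call binds i as its own parameter, so every generator sees a separate variable:

```python
def make_offset_gen(i):
    # i is a parameter, so each call gets its own binding
    return (x + i for x in range(3))

gens = [make_offset_gen(i) for i in range(3)]
print([list(g) for g in gens])  # → [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
```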
Several builtins short-circuit. With a generator expression they stop iteration as soon as the answer is known, saving work. A list comprehension gives them no opportunity — evaluation is already complete.
```python
def expensive(x):
    print(f"computing {x}")
    return x % 2 == 0

# List: computes all 10 values before any() checks the first one
result = any([expensive(x) for x in range(10)])  # prints "computing 0" through "computing 9"

# Generator: any() stops after the first True (x = 0 here)
result = any(expensive(x) for x in range(10))    # prints only "computing 0"
```
The same applies to all(), to next() (typically called with a default, as in next(genexpr, None), to fetch the first match), and to itertools.islice. min(), max(), and sum() never short-circuit: they consume every element regardless.
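A sketch of the first-match idiom and of islice, with a call counter to show where scanning stops; is_big and calls are illustrative names:

```python
from itertools import islice

calls = 0

def is_big(x):
    global calls
    calls += 1  # count how many elements the predicate inspected
    return x > 1_000

# First match, or None if nothing matches; scanning stops at the first hit
first = next((x for x in range(10_000_000) if is_big(x)), None)
print(first, calls)  # → 1001 1002

# No match: the default is returned instead of raising StopIteration
missing = next((x for x in range(10) if x > 100), None)
print(missing)       # → None

# islice pulls only the first three items from an otherwise large generator
prefix = list(islice((x * x for x in range(10_000_000)), 3))
print(prefix)        # → [0, 1, 4]
```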
A generator object implements __iter__ and __next__ but not __len__, __getitem__, or __contains__. This matters when passing results to APIs that check for these:
```python
g = (x for x in range(5))

# len(g)        # TypeError: object of type 'generator' has no len()
# g[2]          # TypeError: 'generator' object is not subscriptable
print(3 in g)   # → True, but 0, 1, 2 and 3 are consumed in the process
print(list(g))  # → [4]: only what was left after the membership test
# reversed(g)   # TypeError: 'generator' object is not reversible

# If you need these operations, materialise to a list explicitly:
result = list(x for x in range(5))  # same result as [x for x in range(5)]
```
Using in on a generator performs a linear scan that advances the iterator. After a successful membership test, every element up to and including the match has been consumed and cannot be recovered; after an unsuccessful one, the generator is exhausted. Never use in on a generator you intend to iterate afterwards.
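APIs that require a sequence often check the collections.abc interfaces. This sketch shows how each form registers against them, and what an unsuccessful membership test does to a generator:

```python
from collections.abc import Container, Iterator, Sequence, Sized

lc = [x for x in range(5)]
ge = (x for x in range(5))

print(isinstance(lc, Sequence), isinstance(ge, Sequence))    # → True False
print(isinstance(lc, Sized), isinstance(ge, Sized))          # → True False
print(isinstance(lc, Container), isinstance(ge, Container))  # → True False
print(isinstance(lc, Iterator), isinstance(ge, Iterator))    # → False True

g = (x for x in range(5))
print(99 in g)  # → False
print(list(g))  # → []: the failed search exhausted the generator
```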
Both forms support multiple for and if clauses. Clauses nest left to right, with the first for as the outermost loop, exactly as in the equivalent nested for statements.
```python
# Equivalent to:
# for row in matrix:
#     for val in row:
#         if val > 0:
#             yield val
matrix = [[1, -2, 3], [-4, 5, 6]]
pos = [val for row in matrix for val in row if val > 0]  # → [1, 3, 5, 6]
```
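To check the equivalence, the sketch below writes the nested loops out as a generator function (positives is an illustrative name) and compares. The last lines show why clause order matters: a later clause may use names bound by an earlier one.

```python
matrix = [[1, -2, 3], [-4, 5, 6]]

def positives(m):
    # The explicit nested-loop form that the comprehension's clauses mirror
    for row in m:
        for val in row:
            if val > 0:
                yield val

comp = [val for row in matrix for val in row if val > 0]
print(comp)                     # → [1, 3, 5, 6]
print(list(positives(matrix)))  # → [1, 3, 5, 6]

# The inner clause refers to i, which the outer clause binds
pairs = [(i, j) for i in range(3) for j in range(i)]
print(pairs)                    # → [(1, 0), (2, 0), (2, 1)]
```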
For multi-stage pipelines, chaining generator expressions avoids all intermediate allocations:
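```python
import csv

def process_log(path):
    seen = set()  # messages already emitted
    with open(path, newline="") as f:  # newline="" as the csv module docs recommend
        rows = csv.DictReader(f)                             # iterator over dict rows
        errors = (r for r in rows if r["level"] == "ERROR")  # filter
        msgs = (r["message"].strip() for r in errors)        # transform
        # dedup: set.add() returns None, so `not seen.add(m)` is always True and records m
        unique = (m for m in msgs if m not in seen and not seen.add(m))
        return list(unique)  # materialise only at the end, while the file is still open
```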
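Because every stage only needs an iterable, the same pipeline can be exercised without a file. The sketch below is a hypothetical variant, error_messages, that accepts any iterable of CSV lines; the level and message columns follow the example above, and the sample rows are invented for illustration:

```python
import csv
import io

def error_messages(lines):
    # Hypothetical variant of process_log that takes any iterable of CSV lines
    seen = set()
    rows = csv.DictReader(lines)
    errors = (r for r in rows if r["level"] == "ERROR")
    msgs = (r["message"].strip() for r in errors)
    unique = (m for m in msgs if m not in seen and not seen.add(m))
    return list(unique)

sample = io.StringIO(
    "level,message\n"
    "INFO,started\n"
    "ERROR, disk full \n"
    "ERROR,timeout\n"
    "ERROR,disk full\n"
)
result = error_messages(sample)
print(result)  # → ['disk full', 'timeout']
```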
| Property | List comprehension [ ] | Generator expression ( ) |
|---|---|---|
| Return type | list | generator |
| Evaluation | Eager: all elements computed immediately | Lazy: computed on demand, one at a time |
| Memory | O(n): holds all elements | O(1): holds iterator state only |
| Re-iterable | ✓ Yes | ✗ No: exhausted after one pass |
| Indexing / slicing | ✓ Yes | ✗ No |
| len() | ✓ Yes | ✗ No |
| Short-circuit with any/all | ✗ No: fully evaluated before the call | ✓ Yes |
| When iterables are evaluated | All at construction | Outermost at construction; inner ones lazily |
| Parenthesis elision | N/A | Only when sole function argument |
| Best for | Multiple passes, indexing, sorting, known small data | Large data, pipelines, single-pass aggregation |
| sys.getsizeof (n = 1,000,000) | Roughly 8-9 MB (reference array only) | A few hundred bytes, constant in n |
Use a list comprehension when: the result must be indexed, sliced, or reversed; the collection will be iterated more than once; you need len(); or the dataset is small enough that materialisation cost is negligible and clarity benefits from having a concrete list.
Use a generator expression when: you're doing a single-pass aggregation (sum, max, any, all); you're constructing a lazy pipeline and want to avoid intermediate lists; the source data is large or unbounded; or you want short-circuit evaluation with any/all.
When in doubt and the result is consumed immediately by a single builtin, prefer the generator expression. If the result needs to be passed around, inspected, or consumed more than once, materialise it to a list.
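For an unbounded source, a list comprehension is not even an option, since it would never finish building. A sketch using itertools.count as the infinite source, with islice deciding how much of the lazy pipeline is actually evaluated:

```python
from itertools import count, islice

# count() never ends, so [n * n for n in count()] would run until memory is exhausted
squares = (n * n for n in count())
odd_squares = (s for s in squares if s % 2 == 1)  # still lazy: nothing computed yet
first_five = list(islice(odd_squares, 5))
print(first_five)  # → [1, 9, 25, 49, 81]
```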