Understanding PostgreSQL Join Optimization: A Deep Dive
- Deepak Mahto

The Challenge: Why Multi-Table Joins Matter at Scale
When PostgreSQL runs at scale, a few expensive queries can bring down your entire service. OpenAI experienced this with ChatGPT:
A single query joining 12 tables caused multiple high-severity production incidents.
Complex multi-table joins are a common scaling bottleneck: they can spike planning time, consume excessive resources, and create unpredictable performance. Understanding how PostgreSQL's optimizer handles join order, and knowing when to intervene, isn't just about speed; it's about system stability and preventing outages.
Let's dive into how the optimizer actually works and how to handle queries with large table counts. But first, we need to understand the fundamentals of how PostgreSQL thinks about joins.
The Foundation: Why Join Order Matters
Imagine you're at a party and you need to find three people: Alice, Bob, and Charlie. You know Alice knows Bob, and Bob knows Charlie. Now, you could approach this several ways:
Find Alice first, ask her where Bob is, then ask Bob where Charlie is
Find Bob first, then track down both Alice and Charlie
Find Alice and Charlie separately, then somehow figure out if Bob connects them
Which sounds most efficient? Obviously the first one, right? Well, that's exactly the kind of decision PostgreSQL has to make every single time you write a query with multiple tables.
Let's look at a real SQL example:
SELECT *
FROM orders, customers, products
WHERE orders.customer_id = customers.id
AND orders.product_id = products.id;
To you and me, this looks simple. But to PostgreSQL, this is a puzzle with multiple solutions. It could:
Join orders with customers first, then add products
Join orders with products first, then add customers
Join customers with products first (terrible idea!), then somehow match with orders
Here's the thing that blows most people's minds:
these different approaches can produce vastly different performance. We're not talking about 10% faster or slower. We're talking about queries that run in milliseconds versus queries that might take minutes.
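You don't have to take this on faith: EXPLAIN shows the join order the planner actually picked. A minimal sketch, reusing the hypothetical orders/customers/products tables from above; the plan tree reads from the innermost join outward, so the deepest pair of tables is joined first.
EXPLAIN
SELECT *
FROM orders, customers, products
WHERE orders.customer_id = customers.id
AND orders.product_id = products.id;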
The Exponential Problem: Why More Tables = Big Trouble
Now, let me show you why PostgreSQL starts to struggle as you add more tables to your joins.
Look at how the number of possible join orders explodes:
| Tables | Join Combinations |
|--------|-------------------|
| 3      | 5                 |
| 4      | 15                |
| 5      | 52                |
| 6      | 203               |
| 10     | 115,975           |
| 12     | ~4.2 million      |
Each time you add one more table to your query, the number of possibilities doesn't just increase; it explodes.
Think about what this means in practice:
Small queries (3-5 tables): PostgreSQL can check every possibility in milliseconds. No problem.
Medium queries (6-8 tables): Planning takes longer, but it's still manageable; we're talking a few extra milliseconds at most.
Large queries (10+ tables): Now we're in trouble. Checking that many possibilities can take seconds on some hardware.
Very large queries (15+ tables): The numbers become so astronomical that exhaustive checking is no longer practical.
This is exactly why OpenAI's 12-table query caused production incidents. At 12 tables, you're looking at over four million possibilities. PostgreSQL can't check them all, so it has to make heuristic decisions about join order. And if those decisions are wrong, your query performance becomes unpredictable: sometimes fast, sometimes catastrophically slow.
Reference: The join enumeration logic is in PostgreSQL's source code at src/backend/optimizer/path/joinrels.c (https://github.com/postgres/postgres/blob/master/src/backend/optimizer/path/joinrels.c)
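You can measure this cost directly: EXPLAIN plans a query without executing it, and the SUMMARY option makes it print how long planning took. A small sketch, again using the hypothetical tables from earlier; the "Planning Time:" line is the number that grows as you add tables.
EXPLAIN (SUMMARY)
SELECT *
FROM orders, customers, products
WHERE orders.customer_id = customers.id
AND orders.product_id = products.id;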
So what does PostgreSQL do about this explosion of possibilities?
How PostgreSQL Handles It: From Exhaustive Search to GEQO
This is where it gets really interesting. PostgreSQL uses something called GEQO (Genetic Query Optimizer), and it's as cool as it sounds.
When you have a small number of tables (up to about 11 by default), PostgreSQL does what we call an "exhaustive search." It literally checks every possible join order, calculates the cost of each one, and picks the winner. It's like checking every route on Google Maps before your road trip.
But once you reach that threshold of 12 tables, PostgreSQL switches gears completely. It activates GEQO, which uses principles inspired by biological evolution. Instead of checking every possibility, it:
Generates several random join orders (like different "genetic" combinations)
Tests them and sees which ones perform better
Creates new "offspring" combinations by mixing the best performers
Repeats this process for a while
Picks the best one it found
It's not guaranteed to find THE absolute best plan, but it finds a pretty good one in a reasonable amount of time. Think of it like this: you might not find the single fastest route on your road trip, but you'll find a route that's way better than random, and you'll find it before you die of old age.
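GEQO also has a few tuning knobs of its own beyond the threshold; these are standard PostgreSQL parameters you can inspect at any time:
SHOW geqo;        -- is the genetic optimizer enabled at all (default: on)
SHOW geqo_effort; -- 1-10 trade-off between planning time and plan quality (default: 5)
SHOW geqo_seed;   -- fix this to make GEQO's random plan choices reproducible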
The GEQO Threshold Parameter
This magic number where PostgreSQL switches from exhaustive search to genetic optimization is controlled by a parameter called geqo_threshold. By default, it's set to 12.
Why 12? This default was set based on hardware from the early 2000s. At that point, with 12 tables you're looking at roughly 4.2 million possible join orders, and exhaustive search was becoming impractically slow. With modern hardware and improved planning algorithms, you can often raise this threshold safely, but validate it with your own testing first.
You can check it like this:
SHOW geqo_threshold;
If you're regularly dealing with complex queries and want exhaustive search to keep running longer before GEQO kicks in, you can increase this threshold:
SET geqo_threshold = 16; -- or higher, based on testing
However, a word of caution: while raising the threshold can help PostgreSQL find better plans, it's often a sign that you should review the query itself.
Remember the lesson from OpenAI: complex multi-table joins are inherently risky at scale. You understand your data and business logic better than any optimizer. Sometimes the best solution is restructuring the query, breaking it into smaller parts, or moving join logic to the application layer rather than asking PostgreSQL to solve an increasingly impossible optimization problem.
Rule of thumb: If you're hitting GEQO frequently or considering raising geqo_threshold above 18-20, take a step back and ask: "Should this query be structured differently?"
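When you do experiment, scope the change to a single transaction rather than the whole session or server; a sketch of that pattern, with an illustrative threshold value and the hypothetical tables from earlier:
BEGIN;
SET LOCAL geqo_threshold = 16; -- reverts automatically at COMMIT or ROLLBACK
EXPLAIN
SELECT *
FROM orders, customers, products
WHERE orders.customer_id = customers.id
AND orders.product_id = products.id;
ROLLBACK;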
Taking Control: The Collapse Limit Parameters
Here's where things get practical. You can actually help PostgreSQL out by being explicit about your join order and controlling how it plans your queries.
Understanding Query Flattening
Before we dive into the parameters, you need to understand what "collapsing" or "flattening" means.
When you write explicit JOINs or use subqueries, PostgreSQL has a choice:
Flatten/merge them into one big planning problem (more freedom, better plans, slower planning)
Keep them separate and respect your structure (less freedom, faster planning, potentially suboptimal)
PostgreSQL uses two parameters to control this behavior, and they work identically but apply to different parts of your query.
join_collapse_limit - Controls explicit JOIN syntax
from_collapse_limit - Controls subqueries and views
Both default to 8, and both do exactly the same thing:
If your query has ≤ 8 items:
PostgreSQL ignores your structure completely
It flattens everything as if you wrote: FROM a, b, c, d, e, f, g, h
It has full freedom to explore all possible join orders
Your explicit JOIN syntax or subqueries are just syntax—they don't constrain anything
If your query has > 8 items:
PostgreSQL respects the structure you created
It does NOT flatten everything into one planning problem
It creates an exhaustive plan within the constraints of your structure
Your JOIN groupings or subquery boundaries matter and reduce the search space
Why 8 specifically? This default, too, was set based on older hardware. With 8 tables you already have thousands of possible join orders, but that count is still manageable with exhaustive search. The default of 8 is set lower than geqo_threshold (12) to create a safety buffer and give you some control before queries get too complex.
Critical insight: The collapse limit is about whether to merge/flatten your query structure, not about which algorithm to use. Even with 10 explicit JOINs (above the limit of 8), PostgreSQL still uses exhaustive search as long as you're below geqo_threshold (12).
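Because the two collapse limits and the GEQO threshold are independent settings, it helps to look at all three together; one convenient way is the pg_settings catalog view:
SELECT name, setting, boot_val
FROM pg_settings
WHERE name IN ('join_collapse_limit', 'from_collapse_limit', 'geqo_threshold');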
Examples: How Collapse Limits Work
Let's look at three logically identical queries:
-- Query 1: Old-school implicit joins
SELECT *
FROM a, b, c
WHERE a.id = b.id AND b.ref = c.id;
-- Query 2: Explicit CROSS JOINs (but still free-for-all)
SELECT *
FROM a
CROSS JOIN b
CROSS JOIN c
WHERE a.id = b.id AND b.ref = c.id;
-- Query 3: Explicit JOIN with structure
SELECT *
FROM a
JOIN (b JOIN c ON (b.ref = c.id))
ON (a.id = b.id);
With default settings (collapse_limit=8):
All three queries are treated identically
PostgreSQL flattens Query 3 to look like Query 1
Full freedom to reorder
With collapse_limit=1:
Query 1 and 2 still have full freedom
Query 3's structure is respected - PostgreSQL will join b to c first, then add a
Your explicit JOIN order is enforced
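You can watch the enforcement happen with EXPLAIN; a quick sketch using the hypothetical a/b/c tables from Query 3 (lower the limit, inspect the plan, reset):
SET join_collapse_limit = 1;
EXPLAIN
SELECT *
FROM a
JOIN (b JOIN c ON (b.ref = c.id))
ON (a.id = b.id);
-- The plan now joins b to c first, exactly as written.
RESET join_collapse_limit;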
Important distinction: The collapse limits ONLY apply to explicit JOIN syntax and subqueries. If you write a comma-separated FROM list like FROM a, b, c, d, e, f, g, h, i, j, the join_collapse_limit doesn't apply at all; PostgreSQL always has full freedom to reorder those tables (unless GEQO kicks in).
The Subquery Case
Let's see how from_collapse_limit works with subqueries:
CREATE VIEW sales_summary AS
SELECT *
FROM orders, products, customers
WHERE orders.product_id = products.id
AND orders.customer_id = customers.id;
Now you use it in a query:
SELECT *
FROM regions, stores, sales_summary
WHERE regions.id = stores.region_id
AND stores.id = sales_summary.store_id;
What do you think happens? PostgreSQL doesn't actually execute the view separately. Instead, it "flattens" everything into one big query:
SELECT *
FROM regions, stores, orders, products, customers
WHERE regions.id = stores.region_id
AND stores.id = orders.store_id
AND orders.product_id = products.id
AND orders.customer_id = customers.id;
With from_collapse_limit=8:
Total would be 5 tables (< 8)
PostgreSQL DOES flatten the view
Plans as one big problem
Better overall plan but slower planning
With from_collapse_limit=4:
Total would be 5 tables (> 4)
PostgreSQL keeps the view separate
Plans the view independently, then joins to regions and stores
Faster planning but potentially suboptimal execution plans.
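To see both behaviours side by side, rerun the same query under each limit and compare the plans; a sketch using the hypothetical sales_summary view from above:
-- Default: the view's tables are pulled up and planned together with regions and stores.
SET from_collapse_limit = 8;
EXPLAIN
SELECT *
FROM regions, stores, sales_summary
WHERE regions.id = stores.region_id
AND stores.id = sales_summary.store_id;
-- Lowered: the view is planned on its own and joined to the rest as a single unit.
SET from_collapse_limit = 4;
EXPLAIN
SELECT *
FROM regions, stores, sales_summary
WHERE regions.id = stores.region_id
AND stores.id = sales_summary.store_id;
RESET from_collapse_limit;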
The Golden Rule
Rule of thumb: Always set from_collapse_limit and join_collapse_limit to the same value. This ensures consistent behavior between explicit JOINs and subqueries; your query structure will be treated predictably regardless of how you write it.
Seeing It In Action: Explain Plan Examples
Let's see this in action with a real example. Notice how the plan changes when we adjust collapse limits.
Initial setup - check current settings:
postgres=# \dconfig *collapse*
List of configuration parameters
┌─────────────────────┬───────┐
│ Parameter │ Value │
├─────────────────────┼───────┤
│ from_collapse_limit │ 8 │
│ join_collapse_limit │ 8 │
└─────────────────────┴───────┘
Query with 12 tables, split into two subqueries and joined:
EXPLAIN SELECT base.id
FROM (
-- Subquery 1: First 6 tables
SELECT t1.id FROM t1
JOIN t2 USING(id) JOIN t3 USING(id) JOIN t4 USING(id)
JOIN t5 USING(id) JOIN t6 USING(id)
) base
JOIN (
-- Subquery 2: Next 6 tables
SELECT t7.id FROM t7
JOIN t8 USING(id) JOIN t9 USING(id) JOIN t10 USING(id)
JOIN t11 USING(id) JOIN t12 USING(id)
) extension ON base.id = extension.id;
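If you want to reproduce this experiment locally, the twelve tables only need an id column; a minimal throwaway setup (the names t1 through t12 match the query above, and the row count is arbitrary):
DO $$
BEGIN
  FOR i IN 1..12 LOOP
    -- one tiny table per iteration: t1 (id int), t2 (id int), ...
    EXECUTE format('CREATE TABLE IF NOT EXISTS t%s (id int PRIMARY KEY)', i);
    EXECUTE format('INSERT INTO t%s SELECT g FROM generate_series(1, 1000) g ON CONFLICT DO NOTHING', i);
  END LOOP;
END $$;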
Result with Default Settings (collapse=8, geqo=12)
Analysis:
Total tables: 12
Each subquery has 6 tables (< 8), so they get flattened internally
But combining both subqueries would create 12 tables (> 8)
PostgreSQL keeps the two subqueries separate
GEQO never kicks in: each planning problem (6 tables per subquery, 2 relations at the top level) stays below the threshold of 12
Result: Joins are NOT flattened across subqueries - faster planning, potentially suboptimal execution
Now Increase Collapse Limits
SET from_collapse_limit = 16;
SET join_collapse_limit = 16;
postgres=# \dconfig *collapse*
List of configuration parameters
┌─────────────────────┬───────┐
│ Parameter │ Value │
├─────────────────────┼───────┤
│ from_collapse_limit │ 16 │
│ join_collapse_limit │ 16 │
└─────────────────────┴───────┘
Run the same EXPLAIN:

Result with Higher Settings (collapse=16, geqo=12)
Analysis:
12 is within from_collapse_limit (16), so PostgreSQL DOES flatten both subqueries
Plans as one problem with all 12 tables flattened
The flattened list has 12 relations, which reaches geqo_threshold (12), so GEQO kicks in
Result: Joins are flattened - slower planning, better overall plan (sees the whole picture)
Key Insight from This Example
Collapse limits control whether PostgreSQL merges/flattens different parts of your query:
Below limit: Merge everything → full freedom → more planning time → better plans
Above limit: Keep structure separate → constrained search → less planning time → potentially suboptimal
GEQO threshold controls which algorithm is used:
Below threshold: Exhaustive search → optimal plan (within constraints)
At/above threshold: Genetic algorithm → good-enough plan
Modern Approach: Recommendations and Strategies
Understanding the Historical Context
The current defaults were set based on hardware from the early 2000s:
In 2003, planning a query with 8+ tables already took noticeably long
GEQO at that time was "really bad" according to PostgreSQL core developers
They wanted to avoid triggering GEQO unnecessarily
Conservative values were safer for average workloads
Historical context from PostgreSQL mailing lists (2009): Core developers discussed raising these values but wanted evidence. Production users reported successfully running with collapse_limit = 20 and GEQO disabled entirely without issues.
The defaults haven't changed because they remain safe for the widest range of workloads, but modern systems can handle much higher values.
Modern Hardware Recommendations
With modern hardware, you can safely move beyond the default settings.
One detailed analysis of Cloud SQL for PostgreSQL Enterprise Plus explored how much higher values than the old defaults modern instances can handle.
Recommended settings for modern systems (based on internal testing):
SET geqo_threshold = 20-24;      -- vs default 12
SET join_collapse_limit = 16-18; -- vs default 8
SET from_collapse_limit = 16-18; -- vs default 8
Source: Medium article "How join_collapse_limit and geqo_threshold in Cloud SQL for PostgreSQL Enterprise Plus can help to optimize complex queries" (https://medium.com/google-cloud/how-join-collapse-limit-and-geqo-threshold-can-help-to-optimize-complex-queries-fc7be9fec5c3)
Critical rule: ALWAYS keep join_collapse_limit < geqo_threshold. If you set collapse_limit ≥ geqo_threshold, you might accidentally trigger GEQO when you didn't want it, resulting in suboptimal plans.
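A quick sanity check that the rule holds on a given instance (just a convenience query; current_setting() returns text, hence the casts):
SELECT current_setting('join_collapse_limit')::int < current_setting('geqo_threshold')::int AS collapse_below_geqo;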
Why these higher values work now:
PostgreSQL's planning algorithms have improved significantly since the early 2000s
Modern CPUs are much faster
Memory context handling is more efficient
GEQO itself has gotten much better
Strategy 1: Balanced Modern Approach
If you have modern hardware and want to go beyond default settings:
SET join_collapse_limit = 12;
SET from_collapse_limit = 12;
SET geqo_threshold = 18;
This gives you:
Better utilization of modern hardware
More flexibility for the optimizer with medium-sized queries
Still avoids GEQO until you hit 18 tables
Predictable behavior (explicit JOINs and subqueries treated similarly)
Strategy 2: Full Manual Control
Force strict join order respect:
SET join_collapse_limit = 1;
SET from_collapse_limit = 1;
Now you're in full control. PostgreSQL will:
Execute your JOINs in exactly the order you write them
Never flatten subqueries
This is great when:
You've tested different join orders and found the winner
Planning time is a bigger problem than execution time
You need consistent, predictable plans
But remember: with great power comes great responsibility. You're now the optimizer, so don't mess it up. Apply it at the session level or for specific queries where possible; don't make it the default for your whole database.
Going back to the OpenAI example: their 12-table query was hitting the GEQO threshold. With join_collapse_limit = 1, they could have locked in a tested join order, making planning instant and execution predictable, potentially avoiding those high-severity incidents.
Strategy 3: Long-Running Analytics
For complex analytical queries that run overnight or in batch processes:
SET geqo_threshold = 24; -- Use exhaustive search even longer
SET join_collapse_limit = 18; -- More flexibility
SET from_collapse_limit = 18;
You're telling PostgreSQL: "Take your time, find the best plan. I can wait. This query runs once a day and takes 10 minutes anyway, so spending extra seconds on planning is fine."
Putting This Into Practice: Monitoring Your Queries
Now that you understand the parameters, how do you know when to use them?
Monitoring in Production
In production, keep an eye on queries to find slow planners:
-- Requires the pg_stat_statements extension; the plan-time columns are only
-- populated when pg_stat_statements.track_planning is enabled.
SELECT query,
       mean_plan_time,
       mean_exec_time,
       calls
FROM pg_stat_statements
WHERE query LIKE '%JOIN%'
ORDER BY mean_plan_time DESC
LIMIT 20;
If you see queries where planning time dominates execution time, those are candidates for join order control.
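For an individual suspect query, EXPLAIN ANALYZE gives the planning-versus-execution breakdown directly; a sketch with the hypothetical tables used earlier:
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM orders, customers, products
WHERE orders.customer_id = customers.id
AND orders.product_id = products.id;
-- Compare the "Planning Time:" and "Execution Time:" lines at the bottom;
-- when planning dominates, join order control is worth considering.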
The Bottom Line: Your Action Plan
Here's your practical roadmap based on query complexity:
1. For most queries (< 8 tables):
Let PostgreSQL do its thing
The defaults work great
No intervention needed
2. For medium queries (8-12 tables):
Monitor planning time with EXPLAIN ANALYZE
If planning is slow, consider raising collapse limits to 12-16
Test thoroughly before changing defaults
3. For large queries (12+ tables):
First, ask: Should this be restructured? Can I break it into smaller queries?
If restructuring isn't possible:
Test explicit JOIN order with different arrangements
Use join_collapse_limit=1 to lock in the best order you find
Use EXPLAIN ANALYZE religiously to verify improvements
Remember: You're taking control away from the optimizer make sure you know what you're doing
4. For modern hardware (general recommendation):
Start with: collapse_limit=12, geqo_threshold=18
Adjust based on your specific workload
Always keep collapse_limit < geqo_threshold
Monitor query performance after changes
5. Critical checks:
Run EXPLAIN ANALYZE to find planning vs execution time breakdown
Keep statistics updated with ANALYZE on your tables
Document any forced join orders with comments explaining why
Review ORMs and generated SQL; they often create inefficient multi-table joins
Remember: PostgreSQL's optimizer is incredibly sophisticated. Most of the time, you should let it do its job. But when you're joining 12+ tables and planning takes longer than execution, or when you're experiencing unpredictable performance like OpenAI did, it's time to step in and help guide the way.
Now go forth and optimize! And may your queries be ever swift and your join orders ever wise.




