
Understanding PostgreSQL Join Optimization: A Deep Dive

Estimated read time: 15-18 minutes


The Challenge: Why Multi-Table Joins Matter at Scale


When PostgreSQL runs at scale, a few expensive queries can bring down your entire service. OpenAI experienced this with ChatGPT:

A single query joining 12 tables caused multiple high-severity production incidents.

Complex multi-table joins are a common scaling bottleneck: they can spike planning time, consume excessive resources, and create unpredictable performance. Understanding how PostgreSQL's optimizer handles join order, and knowing when to intervene, isn't just about speed; it's about system stability and preventing outages.


Let's dive into how the optimizer actually works and how to handle queries with large table counts. But first, we need to understand the fundamentals of how PostgreSQL thinks about joins.


The Foundation: Why Join Order Matters


Imagine you're at a party and you need to find three people: Alice, Bob, and Charlie. You know Alice knows Bob, and Bob knows Charlie. Now, you could approach this several ways:

  1. Find Alice first, ask her where Bob is, then ask Bob where Charlie is

  2. Find Bob first, then track down both Alice and Charlie

  3. Find Alice and Charlie separately, then somehow figure out if Bob connects them


Which sounds most efficient? Obviously the first one, right? Well, that's exactly the kind of decision PostgreSQL has to make every single time you write a query with multiple tables.


Let's look at a real SQL example:

SELECT * 
FROM orders, customers, products 
WHERE orders.customer_id = customers.id 
  AND orders.product_id = products.id;

To you and me, this looks simple. But to PostgreSQL, this is a puzzle with multiple solutions. It could:

  • Join orders with customers first, then add products

  • Join orders with products first, then add customers

  • Join customers with products first (terrible idea!), then somehow match with orders

Here's the thing that blows most people's minds: these different approaches can produce vastly different performance. We're not talking about 10% faster or slower. We're talking about queries that run in milliseconds versus queries that might take minutes.
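
You don't have to guess which order the planner picked. Running EXPLAIN on the query above shows the join tree; a minimal sketch, assuming the three example tables exist:

EXPLAIN
SELECT *
FROM orders, customers, products
WHERE orders.customer_id = customers.id
  AND orders.product_id = products.id;

-- The plan reads inside-out: the most deeply indented pair of tables
-- is joined first, and each level up adds one more table.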

The Exponential Problem: Why More Tables = Big Trouble

Now, let me show you why PostgreSQL starts to struggle as you add more tables to your joins.

Look at how the number of possible join orders explodes:

Tables | Join Combinations
-------|------------------
3      | 5
4      | 15
5      | 52
6      | 203
10     | 115,975
12     | ~4.2 million

Each time you add one more table to your query, the number of possibilities doesn't just increase; it explodes.
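
The figures in the table above follow the Bell numbers, so you can recompute them yourself. Here's a sanity-check sketch using a recursive CTE that builds the Bell triangle (this just reproduces the table; it's not anything PostgreSQL does internally):

WITH RECURSIVE bell(n, tri) AS (
  -- tri holds row n of the Bell triangle; its last element is the
  -- "join combinations" figure for n tables in the table above
  SELECT 1, ARRAY[1::bigint]
  UNION ALL
  SELECT n + 1,
         ARRAY(
           -- the next row starts with the previous row's last element;
           -- each following entry adds one more element of the previous row
           SELECT tri[cardinality(tri)]
                  + COALESCE((SELECT sum(tri[j])
                              FROM generate_series(1, k - 1) AS j), 0)::bigint
           FROM generate_series(1, cardinality(tri) + 1) AS k
           ORDER BY k
         )
  FROM bell
  WHERE n < 12
)
SELECT n AS tables, tri[cardinality(tri)] AS join_combinations
FROM bell;
-- For n = 12 this returns 4,213,597: the "4.2 million" in the table.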


Think about what this means in practice:

  • Small queries (3-5 tables): PostgreSQL can check every possibility in milliseconds. No problem.

  • Medium queries (6-8 tables): Planning takes longer, but it's still manageable; we're talking a few extra milliseconds at most.

  • Large queries (10+ tables): Now we're in trouble. Checking that many possibilities can take whole seconds on some hardware.

  • Very large queries (15+ tables): The numbers become so astronomical that exhaustive checking is simply impractical.


This is exactly why OpenAI's 12-table query caused production incidents. At 12 tables you're looking at roughly 4.2 million possibilities. PostgreSQL can't check them all, so it has to take shortcuts when evaluating join orders. And when those shortcuts go wrong, your query performance becomes unpredictable: sometimes fast, sometimes catastrophically slow.


Reference: The join enumeration logic is in PostgreSQL's source code at src/backend/optimizer/path/joinrels.c (https://github.com/postgres/postgres/blob/master/src/backend/optimizer/path/joinrels.c)


So what does PostgreSQL do about this explosion of possibilities?


How PostgreSQL Handles It: From Exhaustive Search to GEQO


This is where it gets really interesting. PostgreSQL uses something called GEQO (Genetic Query Optimizer), and it's as cool as it sounds.


When you have a small number of tables (up to 11 by default), PostgreSQL does what we call an "exhaustive search." It literally checks every possible join order, calculates the cost of each one, and picks the winner. It's like checking every route on Google Maps before your road trip.

But when you hit that threshold of 12 tables, PostgreSQL switches gears completely. It activates GEQO, which uses principles inspired by biological evolution. Instead of checking every possibility, it:

  1. Generates several random join orders (like different "genetic" combinations)

  2. Tests them and sees which ones perform better

  3. Creates new "offspring" combinations by mixing the best performers

  4. Repeats this process for a set number of generations

  5. Picks the best one it found

It's not guaranteed to find THE absolute best plan, but it finds a pretty good one in a reasonable amount of time. Think of it like this: you might not find the single fastest route on your road trip, but you'll find a route that's way better than random, and you'll find it before you die of old age.
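
Beyond the threshold, GEQO exposes a few knobs of its own. These are all real parameters; the specific values below are just illustrative:

SHOW geqo;            -- master switch for the genetic optimizer, on by default
SET geqo_effort = 7;  -- 1..10 (default 5): higher trades planning time for plan quality
SET geqo_seed = 0.5;  -- 0..1 (default 0): fixing the seed makes GEQO's random choices,
                      -- and therefore its chosen plans, reproducible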


The GEQO Threshold Parameter

This magic number where PostgreSQL switches from exhaustive search to genetic optimization is controlled by a parameter called geqo_threshold. By default, it's set to 12.


Why 12? This default was set based on hardware from the early 2000s. At 12 tables you're looking at roughly 4.2 million possible join orders, and exhaustive search started becoming impractically slow. With modern hardware and improved planning algorithms, you can often raise this threshold safely, but only after testing against your own workload.


You can check it like this:

SHOW geqo_threshold;
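
On a stock installation this returns the default:

 geqo_threshold
----------------
 12
(1 row)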

If you're regularly dealing with complex queries and want exhaustive search to run longer before GEQO kicks in, you can increase this threshold:

SET geqo_threshold = 16;  -- or higher based on testing
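
Rather than changing it globally, you can scope the change. SET LOCAL confines it to one transaction, and ALTER ROLE persists it for a single user (reporting_user here is a hypothetical role name):

BEGIN;
SET LOCAL geqo_threshold = 16;  -- applies only inside this transaction
-- ... run the complex query here ...
COMMIT;

ALTER ROLE reporting_user SET geqo_threshold = 16;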

However, a word of caution: While raising the threshold can help PostgreSQL find better plans, it's often a sign that you should review the query itself.


Remember the lesson from OpenAI: complex multi-table joins are inherently risky at scale. You understand your data and business logic better than any optimizer. Sometimes the best solution is restructuring your query, breaking it into smaller parts, or moving join logic to the application layer, rather than asking PostgreSQL to solve an increasingly impossible optimization problem.


Rule of thumb: If you're hitting GEQO frequently or considering raising geqo_threshold above 18-20, take a step back and ask: "Should this query be structured differently?"


Taking Control: The Collapse Limit Parameters


Here's where things get practical. You can actually help PostgreSQL out by being explicit about your join order and controlling how it plans your queries.

Understanding Query Flattening

Before we dive into the parameters, you need to understand what "collapsing" or "flattening" means.

When you write explicit JOINs or use subqueries, PostgreSQL has a choice:

  • Flatten/merge them into one big planning problem (more freedom, better plans, slower planning)

  • Keep them separate and respect your structure (less freedom, faster planning, potentially suboptimal)


PostgreSQL uses two parameters to control this behavior, and they work identically but apply to different parts of your query.


join_collapse_limit - controls explicit JOIN syntax
from_collapse_limit - controls subqueries and views

Both default to 8, and both do exactly the same thing:


If your query has ≤ 8 items:

  • PostgreSQL ignores your structure completely

  • It flattens everything as if you wrote: FROM a, b, c, d, e, f, g, h

  • It has full freedom to explore all possible join orders

  • Your explicit JOIN syntax or subqueries are just syntax—they don't constrain anything


If your query has > 8 items:

  • PostgreSQL respects the structure you created

  • It does NOT flatten everything into one planning problem

  • It creates an exhaustive plan within the constraints of your structure

  • Your JOIN groupings or subquery boundaries matter and reduce the search space


Why 8 specifically? This too was set based on older hardware. With 8 tables you're looking at thousands of possible join orders (4,140 by the table's counting above), which is still manageable with exhaustive search. The default of 8 is set lower than geqo_threshold (12) to create a safety buffer and give you some control before queries get too complex.


Critical insight: The collapse limit is about whether to merge/flatten your query structure, not about which algorithm to use. Even with 10 explicit JOINs (above the limit of 8), PostgreSQL still uses exhaustive search as long as you're below geqo_threshold (12).
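
To see where your session stands relative to both mechanisms, you can inspect all three knobs in one query (boot_val is the compiled-in default):

SELECT name, setting, boot_val
FROM pg_settings
WHERE name IN ('join_collapse_limit', 'from_collapse_limit', 'geqo_threshold');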


Examples: How Collapse Limits Work

Let's look at three logically identical queries:


-- Query 1: Old-school implicit joins
SELECT * 
FROM a, b, c 
WHERE a.id = b.id AND b.ref = c.id;

-- Query 2: Explicit CROSS JOINs (but still free-for-all)
SELECT * 
FROM a 
CROSS JOIN b 
CROSS JOIN c 
WHERE a.id = b.id AND b.ref = c.id;

-- Query 3: Explicit JOIN with structure
SELECT * 
FROM a 
JOIN (b JOIN c ON (b.ref = c.id)) 
  ON (a.id = b.id);

With default settings (collapse_limit=8):

  • All three queries are treated identically

  • PostgreSQL flattens Query 3 to look like Query 1

  • Full freedom to reorder


With collapse_limit=1:

  • Query 1 and 2 still have full freedom

  • Query 3's structure is respected - PostgreSQL will join b to c first, then add a

  • Your explicit JOIN order is enforced (a per-query sketch follows below)
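
A per-query sketch of that enforcement, reusing Query 3 from above:

BEGIN;
SET LOCAL join_collapse_limit = 1;  -- planner must follow the written JOIN order
EXPLAIN
SELECT *
FROM a
JOIN (b JOIN c ON (b.ref = c.id))
  ON (a.id = b.id);                 -- will join b to c first, then add a
COMMIT;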


Important distinction: The collapse limits ONLY apply to explicit JOIN syntax and subqueries. If you write comma-separated FROM lists like FROM a, b, c, d, e, f, g, h, i, j, the join_collapse_limit doesn't apply at all: PostgreSQL always has full freedom to reorder those tables (unless GEQO kicks in).


The Subquery Case

Let's see how from_collapse_limit works with subqueries:

CREATE VIEW sales_summary AS
SELECT * 
FROM orders, products, customers
WHERE orders.product_id = products.id 
  AND orders.customer_id = customers.id;

Now you use it in a query:

SELECT * 
FROM regions, stores, sales_summary
WHERE regions.id = stores.region_id 
  AND stores.id = sales_summary.store_id;

What do you think happens? PostgreSQL doesn't actually execute the view separately. Instead, it "flattens" everything into one big query:

SELECT * 
FROM regions, stores, orders, products, customers
WHERE regions.id = stores.region_id 
  AND stores.id = orders.store_id
  AND orders.product_id = products.id 
  AND orders.customer_id = customers.id;

With from_collapse_limit=8:

  • Total would be 5 tables (< 8)

  • PostgreSQL DOES flatten the view

  • Plans as one big problem

  • Better overall plan but slower planning


With from_collapse_limit=4:

  • Total would be 5 tables (> 4)

  • PostgreSQL keeps the view separate

  • Plans the view independently, then joins to regions and stores

  • Faster planning but potentially suboptimal execution plans.


The Golden Rule

Rule of thumb: Always set from_collapse_limit and join_collapse_limit to the same value. This ensures consistent behavior between explicit JOINs and subqueries: your query structure will be treated predictably regardless of how you write it.
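
In practice that just means changing them together, whatever value you settle on (10 here is illustrative):

SET from_collapse_limit = 10;
SET join_collapse_limit = 10;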

Seeing It In Action: Explain Plan Examples


Let's see this in action with a real example. Notice how the plan changes when we adjust collapse limits.

Initial setup - check current settings:

postgres=# \dconfig *collapse*
List of configuration parameters
┌─────────────────────┬───────┐
│      Parameter      │ Value │
├─────────────────────┼───────┤
│ from_collapse_limit │ 8     │
│ join_collapse_limit │ 8     │
└─────────────────────┴───────┘
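
If you want to reproduce this demo, the twelve tables can be created with a quick PL/pgSQL loop (a sketch; any twelve joinable tables with an id column would do):

DO $$
BEGIN
  FOR i IN 1..12 LOOP
    EXECUTE format('CREATE TABLE t%s (id int)', i);
  END LOOP;
END $$;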

Query with 12 tables split into subqueries and join:

EXPLAIN SELECT base.id
FROM (
  -- Subquery 1: First 6 tables
  SELECT t1.id FROM t1
  JOIN t2 USING(id) JOIN t3 USING(id) JOIN t4 USING(id)
  JOIN t5 USING(id) JOIN t6 USING(id)
) base
JOIN (
  -- Subquery 2: Next 6 tables
  SELECT t7.id FROM t7
  JOIN t8 USING(id) JOIN t9 USING(id) JOIN t10 USING(id)
  JOIN t11 USING(id) JOIN t12 USING(id)
) extension ON base.id = extension.id;

[Figure: EXPLAIN output rendered as a plan diagram with hash joins and costs. The highlighted callout reads "Joins are not flattened" - collapse did not kick in.]

Result with Default Settings (collapse=8, geqo=12)

Analysis:

  • Total tables: 12

  • Each subquery has 6 tables (< 8), so they get flattened internally

  • But combining both subqueries would create 12 tables (> 8)

  • PostgreSQL keeps the two subqueries separate

  • GEQO does not kick in here: each separate planning problem (6 tables per subquery, 2 items at the top level) stays below geqo_threshold

  • Result: Joins are NOT flattened across subqueries - faster planning, potentially suboptimal execution


Now Increase Collapse Limits

SET from_collapse_limit = 16;
SET join_collapse_limit = 16;

postgres=# \dconfig *collapse*
List of configuration parameters
┌─────────────────────┬───────┐
│      Parameter      │ Value │
├─────────────────────┼───────┤
│ from_collapse_limit │ 16    │
│ join_collapse_limit │ 16    │
└─────────────────────┴───────┘

Run the same EXPLAIN:

[Figure: the same query's plan diagram, now showing the joins flattened into a single tree of hash joins - "Flatten Joins".]

Result with Higher Settings (collapse=16, geqo=12)

Analysis:

  • 12 < from_collapse_limit(16), so PostgreSQL DOES flatten both subqueries

  • Plans as one problem with all 12 tables flattened

  • Since 12 = geqo_threshold, GEQO kicks in

  • Result: Joins are flattened - slower planning, better overall plan (sees the whole picture)


Key Insight from This Example

Collapse limits control whether PostgreSQL merges/flattens different parts of your query:

  • Below limit: Merge everything → full freedom → more planning time → better plans

  • Above limit: Keep structure separate → constrained search → less planning time → potentially suboptimal


GEQO threshold controls which algorithm is used:

  • Below threshold: Exhaustive search → optimal plan (within constraints)

  • At/above threshold: Genetic algorithm → good-enough plan


Modern Approach: Recommendations and Strategies


Understanding the Historical Context


The current defaults were set based on hardware from the early 2000s:

  • In 2003, planning a query with 8+ tables took noticeably long

  • GEQO at that time was "really bad" according to PostgreSQL core developers

  • They wanted to avoid triggering GEQO unnecessarily

  • Conservative values were safer for average workloads

Historical context from PostgreSQL mailing lists (2009): Core developers discussed raising these values but wanted evidence. Production users reported successfully running with collapse_limit = 20 and GEQO disabled entirely without issues.

The defaults haven't changed because they're still safe for typical workloads, but modern systems can handle much higher values.


Modern Hardware Recommendations


With modern hardware, you can go well beyond the default settings.

One detailed analysis explored how Cloud SQL for PostgreSQL Enterprise Plus can handle much higher values than the old defaults.

Recommended settings for modern systems (based on internal testing):

SET geqo_threshold = 20;          -- vs default 12; test values in the 20-24 range
SET join_collapse_limit = 16;     -- vs default 8; test values in the 16-18 range
SET from_collapse_limit = 16;     -- vs default 8; test values in the 16-18 range

Source: Medium article "How join_collapse_limit and geqo_threshold in Cloud SQL for PostgreSQL Enterprise Plus can help to optimize complex queries" (https://medium.com/google-cloud/how-join-collapse-limit-and-geqo-threshold-can-help-to-optimize-complex-queries-fc7be9fec5c3)


Critical rule: ALWAYS keep join_collapse_limit < geqo_threshold. If you set collapse_limit ≥ geqo_threshold, you might accidentally trigger GEQO when you didn't want it, resulting in suboptimal plans.
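
A quick way to observe the pitfall, assuming an otherwise default configuration:

SET join_collapse_limit = 16;  -- planner may now flatten up to 16 items...
SHOW geqo_threshold;           -- ...but this is still 12, so a 14-table query
                               -- that gets fully flattened is handed to GEQO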


Why these higher values work now:

  • PostgreSQL's planning algorithms have improved significantly since the early 2000s

  • Modern CPUs are much faster

  • Memory context handling is more efficient

  • GEQO itself has gotten much better


Strategy 1: Balanced Modern Approach

If you have modern hardware and want to go beyond default settings:

SET join_collapse_limit = 12;
SET from_collapse_limit = 12;
SET geqo_threshold = 18;

This gives you:

  • Better utilization of modern hardware

  • More flexibility for the optimizer with medium-sized queries

  • Still avoids GEQO until you hit 18 tables

  • Predictable behavior (explicit JOINs and subqueries treated similarly)


Strategy 2: Full Manual Control

Force strict join order respect:

SET join_collapse_limit = 1;
SET from_collapse_limit = 1;

Now you're in full control. PostgreSQL will:

  • Execute your JOINs in exactly the order you write them

  • Never flatten subqueries

This is great when:

  • You've tested different join orders and found the winner

  • Planning time is a bigger problem than execution time

  • You need consistent, predictable plans

But remember: With great power comes great responsibility. You're now the optimizer, so don't mess it up. Use it per session or per query where possible; don't make it the default for your whole database.


Going back to the OpenAI example: Their 12-table query was hitting the GEQO threshold. With join_collapse_limit=1, they could have locked in a tested join order, making planning instant and execution predictable, potentially avoiding those high-severity incidents.


Strategy 3: Long-Running Analytics

For complex analytical queries that run overnight or in batch processes:

SET geqo_threshold = 24;            -- Use exhaustive search for even larger queries
SET join_collapse_limit = 18;       -- More flexibility
SET from_collapse_limit = 18;

You're telling PostgreSQL: "Take your time, find the best plan. I can wait. This query runs once a day and takes 10 minutes anyway, so spending extra seconds on planning is fine."
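
If those jobs run in a dedicated database or under a dedicated role, you can persist the settings there instead of repeating them per session (analytics and batch_etl are hypothetical names):

ALTER DATABASE analytics SET geqo_threshold = 24;
ALTER ROLE batch_etl SET join_collapse_limit = 18;
ALTER ROLE batch_etl SET from_collapse_limit = 18;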

Putting This Into Practice: Monitoring Your Queries

Now that you understand the parameters, how do you know when to use them?


Monitoring in Production

In production, keep an eye on pg_stat_statements to spot queries with expensive planning (the plan-time columns are filled in only when pg_stat_statements.track_planning = on, available since PostgreSQL 13):

SELECT query,
       mean_plan_time,
       mean_exec_time,
       calls
FROM pg_stat_statements
WHERE query ILIKE '%join%'
ORDER BY mean_plan_time DESC
LIMIT 20;

If you see queries where planning time dominates execution time, those are candidates for join order control.
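
EXPLAIN ANALYZE reports both numbers at the bottom of its output, so the comparison is straightforward (the query itself is illustrative):

EXPLAIN ANALYZE
SELECT *
FROM orders
JOIN customers ON orders.customer_id = customers.id;

-- Look for these two summary lines at the end of the output:
--   Planning Time: ... ms
--   Execution Time: ... ms
-- If Planning Time rivals or exceeds Execution Time, join order control
-- is worth testing.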


The Bottom Line: Your Action Plan

Here's your practical roadmap based on query complexity:


1. For most queries (< 8 tables):

  • Let PostgreSQL do its thing

  • The defaults work great

  • No intervention needed


2. For medium queries (8-12 tables):

  • Monitor planning time with EXPLAIN ANALYZE

  • If planning is slow, consider raising collapse limits to 12-16

  • Test thoroughly before changing defaults


3. For large queries (12+ tables):

  • First, ask: Should this be restructured? Can I break it into smaller queries?

  • If restructuring isn't possible:

    • Test explicit JOIN order with different arrangements

    • Use join_collapse_limit=1 to lock in the best order you find

    • Use EXPLAIN ANALYZE religiously to verify improvements

  • Remember: You're taking control away from the optimizer, so make sure you know what you're doing


4. For modern hardware (general recommendation):

  • Start with: collapse_limit=12, geqo_threshold=18

  • Adjust based on your specific workload

  • Always keep collapse_limit < geqo_threshold

  • Monitor query performance after changes


5. Critical checks:

  • Run EXPLAIN ANALYZE to find planning vs execution time breakdown

  • Keep statistics updated with ANALYZE on your tables

  • Document any forced join orders with comments explaining why

  • Review ORMs and the SQL they generate; they often create inefficient multi-table joins


Remember: PostgreSQL's optimizer is incredibly sophisticated. Most of the time, you should let it do its job. But when you're joining 12+ tables and planning takes longer than execution, or when you're experiencing unpredictable performance like OpenAI did, it's time to step in and help guide the way.

Now go forth and optimize! And may your queries be ever swift and your join orders ever wise.

[Figure: "Join Parameter Comparison Chart" - a decision flowchart comparing the parameter settings discussed above.]
