PostgreSQL: Query Optimization for Mere Humans | by Eyal Trabelsi | Dec, 2024


We will use a simple query as an example: we want to count the number of users that have a Twitter handle, i.e. whose twitter value is not an empty string.

EXPLAIN ANALYZE
SELECT COUNT(*) FROM users WHERE twitter != '';
We can see the execution plan returned by the EXPLAIN ANALYZE command

It looks cryptic at first, and it’s even longer than our query. And that is for a small example; real-world execution plans can be overwhelming if you don’t focus 😭.

But it does provide useful information. We can see that the query execution took 1.27 seconds, while the query planning took only 0.4 milliseconds (negligible time).

We can see the time the query planning and execution took
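
To compare the two timings without reading them off by eye, they can be pulled out of the plan text. The following is a minimal sketch: the sample text is a hypothetical tail of an EXPLAIN ANALYZE output, with values rounded from the numbers above, and the function name is an assumption made for illustration.

```python
import re

# Hypothetical tail of an EXPLAIN ANALYZE text output (values rounded from the article).
SAMPLE_OUTPUT = """\
Planning Time: 0.400 ms
Execution Time: 1271.200 ms
"""

def parse_timings(explain_text):
    """Return the planning and execution times in milliseconds."""
    # Match case-insensitively, since the capitalization of "Time" may vary by version.
    pattern = re.compile(
        r"^\s*(Planning|Execution) Time:\s*([\d.]+) ms",
        re.IGNORECASE | re.MULTILINE,
    )
    return {kind.lower(): float(value) for kind, value in pattern.findall(explain_text)}

timings = parse_timings(SAMPLE_OUTPUT)
planning_share = timings["planning"] / (timings["planning"] + timings["execution"])
print(f"planning: {timings['planning']} ms, execution: {timings['execution']} ms")
print(f"planning share of total: {planning_share:.4%}")
```

When the planning share is this small, the execution plan is where the effort should go.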

The execution plan is structured as an inverted tree. In the next figure, you can see that the execution plan is divided into nodes, each of which represents a different operation, whether it’s an Aggregation or a Scan.
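
The tree shape is easier to see in code. Here is a minimal sketch that assumes the plan was requested with EXPLAIN (FORMAT JSON), where each node carries a "Node Type" and its children sit under "Plans". The plan below is a hypothetical, trimmed example, not the one from the figure.

```python
# Hypothetical, trimmed plan in the shape of EXPLAIN (FORMAT JSON) output.
# The node types are illustrative and not copied from the article's figure.
plan = {
    "Node Type": "Aggregate",
    "Plans": [
        {"Node Type": "Seq Scan", "Relation Name": "users"},
    ],
}

def render_tree(node, depth=0):
    """Return one indented line per node, parents before children."""
    lines = ["  " * depth + node["Node Type"]]
    for child in node.get("Plans", []):
        lines.extend(render_tree(child, depth + 1))
    return lines

tree_lines = render_tree(plan)
print("\n".join(tree_lines))
```

The root is the last operation to finish, and the leaves (usually scans) are where the data comes from.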

There are many kinds of node operations: Scan related (‘Seq Scan’, ‘Index Only Scan’, etc.), Join related (‘Hash Join’, ‘Nested Loop’, etc.), Aggregation related (‘GroupAggregate’, ‘Aggregate’, etc.) and others (‘Limit’, ‘Sort’, ‘Materialize’, etc.). Fortunately, you don’t need to remember any of this.

Pro Tip #3 💃: Focus is key, look only at the nodes that are problematic.

Pro Tip #4 💃: Cheat! For the problematic nodes, look up what they mean in the EXPLAIN glossary.

Now, let’s drill down into how we know which node is the problematic one.

There is a lot of information we can see on each node

Let’s drill down to what those metrics actually mean.

  • Actual Loops: the number of times the Aggregate node was executed is 1. To get the total time and rows, multiply the actual time and rows by the loops value.
  • Actual Rows: the actual number of rows produced by the Aggregate node is 1 (a per-loop average, and here loops is 1).
  • Plan Rows: the estimated number of rows produced by the Aggregate node is 1. The estimate can be off, depending on the statistics.
  • Actual Startup Time: the time it took the Aggregate node to return its first row is 1271.157 ms (cumulative, so it includes the previous operations).
  • Startup Cost: arbitrary units that represent the estimated time to return the first row; for the Aggregate node it is 845110 (cumulative, so it includes the previous operations).
  • Actual Total Time: the time it took the Aggregate node to return all of its rows is 1271.158 ms (a per-loop average, and here loops is 1; cumulative, so it includes the previous operations).
  • Total Cost: arbitrary units that represent the estimated time to return all the rows; for the Aggregate node it is 845110 (cumulative).
  • Plan Width: the estimated average size of the rows produced by the Aggregate node is 8 bytes.
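
Because the actual times are cumulative, a node's own share has to be computed by subtracting its children. The following is a rough sketch over a JSON-style plan: the Aggregate numbers come from the list above, while the child Seq Scan and its timing are hypothetical. It ignores parallel workers, where this subtraction is only an approximation.

```python
def inclusive_time(node):
    """Time spent in the node and everything beneath it, across all loops (ms)."""
    return node["Actual Total Time"] * node["Actual Loops"]

def exclusive_time(node):
    """Time spent in the node itself, with its direct children subtracted (ms)."""
    children = node.get("Plans", [])
    return inclusive_time(node) - sum(inclusive_time(child) for child in children)

aggregate = {
    "Node Type": "Aggregate",
    "Actual Total Time": 1271.158,  # from the article
    "Actual Loops": 1,              # from the article
    "Plans": [
        # Hypothetical child node; its numbers are made up for illustration.
        {"Node Type": "Seq Scan", "Actual Total Time": 1200.0, "Actual Loops": 1},
    ],
}

print(f"Aggregate inclusive: {inclusive_time(aggregate):.3f} ms")
print(f"Aggregate exclusive: {exclusive_time(aggregate):.3f} ms")
```

A node with a large inclusive time but a small exclusive time is usually not the culprit; the cost is coming from one of its children.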

Pro Tip #5 💃: Be wary of loops: remember to multiply by loops when you care about Actual Rows and Actual Total Time.
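
As a small illustration of that multiplication, here is a sketch with two nodes: the Aggregate node from above (loops is 1, so nothing changes) and a hypothetical inner node that ran 3 times. The helper name and the second node's numbers are assumptions.

```python
def totals(node):
    """Scale the per-loop averages up to totals across all loops."""
    loops = node["Actual Loops"]
    return {
        "rows": node["Actual Rows"] * loops,
        "time_ms": node["Actual Total Time"] * loops,
    }

# Values from the article: a single loop, so totals equal the reported numbers.
aggregate = {"Actual Rows": 1, "Actual Total Time": 1271.158, "Actual Loops": 1}

# Hypothetical node executed 3 times, e.g. the inner side of a Nested Loop.
inner_scan = {"Actual Rows": 100, "Actual Total Time": 2.5, "Actual Loops": 3}

aggregate_totals = totals(aggregate)
inner_totals = totals(inner_scan)
print(aggregate_totals)
print(inner_totals)
```

A node that looks cheap per loop can dominate the query once its loop count is taken into account.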

In the next section, we will drill into a practical example.
