Are SQL Database Joins So Bad?

Author: Szymon Lipiński
Published at: 2009-11-12
Categories:

Recently there have been endless discussions on many forums claiming that database joins are very bad, very inefficient, terrible, and best avoided. Generally that's not true, and I'll try to show it here (using tables with 10 million records each, as was suggested on one forum).

First let’s create a simple table:

create table t1 (
    id serial primary key,
    i integer not null,
    t text not null
);
Now let's fill it with 10 million records by running this insert 10 times:
insert into t1 (i,t)
select generate_series(1,1000000), '';
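As a side note, the same 10 million rows could also be loaded with a single statement (an equivalent alternative, not what was actually run above):

```sql
-- one statement instead of ten runs of the smaller insert
insert into t1 (i, t)
select generate_series(1, 10000000), '';
```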
And now, just to make the table bigger (as some claim that joins on small tables are fast but on big tables are bad):
update t1 set t = i::text;
And now let's add another table:
create table t2 (
    id serial primary key,
    t1_id integer not null references t1
);
and fill it:
insert into t2(t1_id) select id from t1;
Now the tables look like this:
mtest=> \d t1
                         Table "public.t1"
 Column |  Type   |                    Modifiers
--------+---------+-------------------------------------------------
 id     | integer | not null default nextval('t1_id_seq'::regclass)
 i      | integer | not null
 t      | text    | not null
Indexes:
    "t1_pkey" PRIMARY KEY, btree (id)
Referenced by:
    TABLE "t2" CONSTRAINT "t2_t1_id_fkey" FOREIGN KEY (t1_id) REFERENCES t1(id)

mtest=> \d t2
                         Table "public.t2"
 Column |  Type   |                    Modifiers
--------+---------+-------------------------------------------------
 id     | integer | not null default nextval('t2_id_seq'::regclass)
 t1_id  | integer | not null
Indexes:
    "t2_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "t2_t1_id_fkey" FOREIGN KEY (t1_id) REFERENCES t1(id)
How many records are there?
mtest=> select count(*) from t1;

  count
----------
 10000000
(1 row)

Time: 17340,848 ms

mtest=> select count(*) from t2;

  count
----------
 10000000
(1 row)

Time: 3522,489 ms
Let's check the table sizes:
mtest=>
  select pg_size_pretty(pg_relation_size('t1'));

 pg_size_pretty
----------------
 1204 MB
(1 row)

mtest=>
  select pg_size_pretty(pg_relation_size('t2'));

 pg_size_pretty
----------------
 346 MB
(1 row)
But the tables with their indices are, of course, a little bit bigger:
mtest=>
  select pg_size_pretty(pg_total_relation_size('t1'));

pg_size_pretty
----------------
1847 MB
(1 row)

mtest=>
  select pg_size_pretty(pg_total_relation_size('t2'));

pg_size_pretty
----------------
560 MB
(1 row)
Let's check the execution plan; both tables have 10 million records:
explain
  select *
  from t1
  left join t2 on t1.id = t2.t1_id
  order by t1.id desc
  limit 100;


                                                QUERY PLAN
    ---------------------------------------------------------------------------------------------------
     Limit  (cost=1443640.26..1443645.31 rows=100 width=17)
       ->  Merge Left Join  (cost=1443640.26..2558613.62 rows=22085541 width=17)
             Merge Cond: (t1.id = t2.t1_id)
             ->  Index Scan Backward using t1_pkey on t1  (cost=0.00..814767.66 rows=22085541 width=9)
             ->  Materialize  (cost=1443640.26..1568639.98 rows=9999977 width=8)
                   ->  Sort  (cost=1443640.26..1468640.21 rows=9999977 width=8)
                         Sort Key: t2.t1_id
                         ->  Seq Scan on t2  (cost=0.00..144247.77 rows=9999977 width=8)
    (8 rows)
As you can see there is that terrible sequential scan on the t2 table. That's not good. Let's see how long it takes to run this query:
explain analyze
  select *
  from t1
  left join t2 on t1.id = t2.t1_id
  order by t1.id desc
  limit 100;


                                                                      QUERY PLAN
    -----------------------------------------------------------------------------------------------------------------------------------------------
     Limit  (cost=1443640.26..1443645.31 rows=100 width=17) (actual time=34728.242..34729.371 rows=100 loops=1)
       ->  Merge Left Join  (cost=1443640.26..2558613.62 rows=22085541 width=17) (actual time=34728.237..34729.186 rows=100 loops=1)
             Merge Cond: (t1.id = t2.t1_id)
             ->  Index Scan Backward using t1_pkey on t1  (cost=0.00..814767.66 rows=22085541 width=9) (actual time=0.045..0.180 rows=100 loops=1)
             ->  Materialize  (cost=1443640.26..1568639.98 rows=9999977 width=8) (actual time=34728.167..34728.653 rows=100 loops=1)
                   ->  Sort  (cost=1443640.26..1468640.21 rows=9999977 width=8) (actual time=34728.154..34728.287 rows=100 loops=1)
                         Sort Key: t2.t1_id
                         Sort Method:  external merge  Disk: 175872kB
                         ->  Seq Scan on t2  (cost=0.00..144247.77 rows=9999977 width=8) (actual time=0.058..12331.185 rows=10000000 loops=1)
     Total runtime: 34794.735 ms
    (10 rows)
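The "Sort Method: external merge Disk: 175872kB" line means the sort did not fit in memory and spilled to disk. As a side note (not something done in this test), giving the session more sort memory can help such plans; the value below is only an illustrative guess:

```sql
-- allow this session more memory for sorts (illustrative value)
set work_mem = '256MB';
```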
34 seconds is not so good. Let's add an index on the t1_id column of the t2 table so that the index can be used in the join condition:
create index i_t2_t1_id on t2(t1_id);
Checking the index size:
mtest=>
  select pg_size_pretty(pg_total_relation_size('i_t2_t1_id'));

 pg_size_pretty
----------------
 214 MB
(1 row)
and the whole t2 table size:
mtest=>
  select pg_size_pretty(pg_total_relation_size('t2'));

 pg_size_pretty
----------------
 774 MB
(1 row)
Let's check the execution plan once again with the new index:
explain
select *
from t1
left join t2 on t1.id = t2.t1_id
order by t1.id desc
limit 100;


                                                QUERY PLAN
    ---------------------------------------------------------------------------------------------------
     Limit  (cost=0.00..883.30 rows=100 width=17)
       ->  Nested Loop Left Join  (cost=0.00..195082023.35 rows=22085541 width=17)
             ->  Index Scan Backward using t1_pkey on t1  (cost=0.00..814767.66 rows=22085541 width=9)
             ->  Index Scan using i_t2_t1_id on t2  (cost=0.00..8.78 rows=1 width=8)
                   Index Cond: (t1.id = t2.t1_id)
    (5 rows)
Looks quite nice: the execution plan has no sequential scans and uses the new index. Let's check the real timing for this query:
explain analyze
  select *
  from t1
  left join t2 on t1.id = t2.t1_id
  order by t1.id desc
  limit 100;


                                                                      QUERY PLAN
    -----------------------------------------------------------------------------------------------------------------------------------------------
     Limit  (cost=0.00..883.30 rows=100 width=17) (actual time=0.177..2.111 rows=100 loops=1)
       ->  Nested Loop Left Join  (cost=0.00..195082023.35 rows=22085541 width=17) (actual time=0.173..1.863 rows=100 loops=1)
             ->  Index Scan Backward using t1_pkey on t1  (cost=0.00..814767.66 rows=22085541 width=9) (actual time=0.032..0.214 rows=100 loops=1)
             ->  Index Scan using i_t2_t1_id on t2  (cost=0.00..8.78 rows=1 width=8) (actual time=0.009..0.011 rows=1 loops=100)
                   Index Cond: (t1.id = t2.t1_id)
     Total runtime: 2.352 ms
    (6 rows)
Nice, but this data could have been cached, so let's check some data further from the very beginning:
explain analyze
  select *
  from t1
  left join t2 on t1.id = t2.t1_id
  order by t1.id desc
  limit 100
  offset 100000;



                                                                         QUERY PLAN
    ----------------------------------------------------------------------------------------------------------------------------------------------------
     Limit  (cost=885952.01..886837.96 rows=100 width=17) (actual time=1170.478..1171.741 rows=100 loops=1)
       ->  Nested Loop Left Join  (cost=0.00..88595200.54 rows=10000000 width=17) (actual time=0.060..1089.453 rows=100100 loops=1)
             ->  Index Scan Backward using t1_pkey on t1  (cost=0.00..633484.54 rows=10000000 width=9) (actual time=0.031..136.602 rows=100100 loops=1)
             ->  Index Scan using i_t2_t1_id on t2  (cost=0.00..8.78 rows=1 width=8) (actual time=0.004..0.005 rows=1 loops=100100)
                   Index Cond: (t1.id = t2.t1_id)
     Total runtime: 1171.944 ms
    (6 rows)
Over 1 second. That can be a lot of time, but... this is only my notebook: it is not very fast, other programs are running in the background, it has much less RAM than a normal database server, a slower hard disk, and much less memory available in the database config than on a normal server. On a production database this query could be much faster.

In reality, joins on tables with millions of records are not a big problem.

Another, usually funny, argument is that the NoSQL movement is getting stronger and stronger, and more and more companies just pick non-SQL databases so they can avoid all those terrible joins. In fact joins are not that terrible; joins are quite useful, but you need a good DBA (I think I know one :) ) who can tune the database internals or write the query in a good way. Unfortunately, more and more programmers think they can use an SQL database without knowing SQL, just because they use ORMs. Because of those ORMs they often turn off thinking, and that's sad.
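As a side note, the cost of a large OFFSET (scanning and discarding 100,000 rows) can often be avoided with keyset pagination; a sketch, assuming the application remembers the last id it displayed (the value below is just an example):

```sql
-- fetch the next page after the last seen id, instead of using OFFSET
select *
from t1
left join t2 on t1.id = t2.t1_id
where t1.id < 9900000   -- last id from the previous page (example value)
order by t1.id desc
limit 100;
```

This lets the planner start the backward index scan right at the page boundary instead of walking through all the skipped rows.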
The comments are disabled. If you want to write something to me, you can use e.g. Twitter.