Are SQL Database Joins So Bad?

Author: Szymon Lipiński
Published at: 2009-11-12
Categories:

Recently there have been endless discussions on many forums claiming that database joins are very bad, very inefficient, terrible, and best avoided. Generally that's not true, and I'll try to show it here (using tables with 10 million records each, as was suggested on one forum).

First let’s create a simple table:

create table t1 (
    id serial primary key,
    i integer not null,
    t text not null
);
Now let's fill it with 10 million records by running this insert 10 times:
insert into t1 (i,t)
select generate_series(1,1000000), '';
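As a side note, the same 10 million rows could also be loaded with a single statement (an equivalent alternative, not what was actually run above):

```sql
-- one statement instead of ten runs of the smaller insert
insert into t1 (i, t)
select generate_series(1, 10000000), '';
```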
And now, just to make the table bigger (as some claim that joins on small tables are fast but on big tables are bad):
update t1 set t = i::text;
And now let's add another table:
create table t2 (
    id serial primary key,
    t1_id integer not null references t1
);
and fill it:
insert into t2(t1_id) select id from t1;
Now the tables look like this:
mtest=> \d t1
                         Table "public.t1"
 Column |  Type   |                    Modifiers
--------+---------+-------------------------------------------------
 id     | integer | not null default nextval('t1_id_seq'::regclass)
 i      | integer | not null
 t      | text    | not null
Indexes:
    "t1_pkey" PRIMARY KEY, btree (id)
Referenced by:
    TABLE "t2" CONSTRAINT "t2_t1_id_fkey" FOREIGN KEY (t1_id) REFERENCES t1(id)

mtest=> \d t2
                         Table "public.t2"
 Column |  Type   |                    Modifiers
--------+---------+-------------------------------------------------
 id     | integer | not null default nextval('t2_id_seq'::regclass)
 t1_id  | integer | not null
Indexes:
    "t2_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "t2_t1_id_fkey" FOREIGN KEY (t1_id) REFERENCES t1(id)
How many records are there?
mtest=> select count(*) from t1;

  count
----------
 10000000
(1 row)

Time: 17340,848 ms

mtest=> select count(*) from t2;

  count
----------
 10000000
(1 row)

Time: 3522,489 ms
Let's check the table sizes:
mtest=>
  select pg_size_pretty(pg_relation_size('t1'));

 pg_size_pretty
----------------
 1204 MB
(1 row)

mtest=>
  select pg_size_pretty(pg_relation_size('t2'));

 pg_size_pretty
----------------
 346 MB
(1 row)
But the tables with their indices are, of course, a little bit bigger:
mtest=>
  select pg_size_pretty(pg_total_relation_size('t1'));

pg_size_pretty
----------------
1847 MB
(1 row)

mtest=>
  select pg_size_pretty(pg_total_relation_size('t2'));

pg_size_pretty
----------------
560 MB
(1 row)
Let's check the execution plan; both tables have 10 million records:
explain
  select *
  from t1
  left join t2 on t1.id = t2.t1_id
  order by t1.id desc
  limit 100;


                                                QUERY PLAN
    ---------------------------------------------------------------------------------------------------
     Limit  (cost=1443640.26..1443645.31 rows=100 width=17)
       ->  Merge Left Join  (cost=1443640.26..2558613.62 rows=22085541 width=17)
             Merge Cond: (t1.id = t2.t1_id)
             ->  Index Scan Backward using t1_pkey on t1  (cost=0.00..814767.66 rows=22085541 width=9)
             ->  Materialize  (cost=1443640.26..1568639.98 rows=9999977 width=8)
                   ->  Sort  (cost=1443640.26..1468640.21 rows=9999977 width=8)
                         Sort Key: t2.t1_id
                         ->  Seq Scan on t2  (cost=0.00..144247.77 rows=9999977 width=8)
    (8 rows)
As you can see there is that terrible sequential scan on the t2 table. That's not good. Let's see how long it takes to run this query:
explain analyze
  select *
  from t1
  left join t2 on t1.id = t2.t1_id
  order by t1.id desc
  limit 100;


                                                                      QUERY PLAN
    -----------------------------------------------------------------------------------------------------------------------------------------------
     Limit  (cost=1443640.26..1443645.31 rows=100 width=17) (actual time=34728.242..34729.371 rows=100 loops=1)
       ->  Merge Left Join  (cost=1443640.26..2558613.62 rows=22085541 width=17) (actual time=34728.237..34729.186 rows=100 loops=1)
             Merge Cond: (t1.id = t2.t1_id)
             ->  Index Scan Backward using t1_pkey on t1  (cost=0.00..814767.66 rows=22085541 width=9) (actual time=0.045..0.180 rows=100 loops=1)
             ->  Materialize  (cost=1443640.26..1568639.98 rows=9999977 width=8) (actual time=34728.167..34728.653 rows=100 loops=1)
                   ->  Sort  (cost=1443640.26..1468640.21 rows=9999977 width=8) (actual time=34728.154..34728.287 rows=100 loops=1)
                         Sort Key: t2.t1_id
                         Sort Method:  external merge  Disk: 175872kB
                         ->  Seq Scan on t2  (cost=0.00..144247.77 rows=9999977 width=8) (actual time=0.058..12331.185 rows=10000000 loops=1)
     Total runtime: 34794.735 ms
    (10 rows)
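The "Sort Method: external merge Disk: 175872kB" line means the sort did not fit in memory and spilled to disk. As a side note (not something done in this test), giving the session more sort memory can help such plans; the value below is only an illustrative guess:

```sql
-- allow this session more memory for sorts (illustrative value)
set work_mem = '256MB';
```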
34 seconds is not so good. Let's add an index on the t1_id column of the t2 table so that the index can be used in the join condition:
create index i_t2_t1_id on t2(t1_id);
Checking the index size:
mtest=>
  select pg_size_pretty(pg_total_relation_size('i_t2_t1_id'));

 pg_size_pretty
----------------
 214 MB
(1 row)
and the whole t2 table size:
mtest=>
  select pg_size_pretty(pg_total_relation_size('t2'));

 pg_size_pretty
----------------
 774 MB
(1 row)
Let's check the execution plan once again with the new index:
explain
select *
from t1
left join t2 on t1.id = t2.t1_id
order by t1.id desc
limit 100;


                                                QUERY PLAN
    ---------------------------------------------------------------------------------------------------
     Limit  (cost=0.00..883.30 rows=100 width=17)
       ->  Nested Loop Left Join  (cost=0.00..195082023.35 rows=22085541 width=17)
             ->  Index Scan Backward using t1_pkey on t1  (cost=0.00..814767.66 rows=22085541 width=9)
             ->  Index Scan using i_t2_t1_id on t2  (cost=0.00..8.78 rows=1 width=8)
                   Index Cond: (t1.id = t2.t1_id)
    (5 rows)
Looks quite nice: the execution plan has no sequential scans and uses the new index. Let's check the real timing for this query:
explain analyze
  select *
  from t1
  left join t2 on t1.id = t2.t1_id
  order by t1.id desc
  limit 100;


                                                                      QUERY PLAN
    -----------------------------------------------------------------------------------------------------------------------------------------------
     Limit  (cost=0.00..883.30 rows=100 width=17) (actual time=0.177..2.111 rows=100 loops=1)
       ->  Nested Loop Left Join  (cost=0.00..195082023.35 rows=22085541 width=17) (actual time=0.173..1.863 rows=100 loops=1)
             ->  Index Scan Backward using t1_pkey on t1  (cost=0.00..814767.66 rows=22085541 width=9) (actual time=0.032..0.214 rows=100 loops=1)
             ->  Index Scan using i_t2_t1_id on t2  (cost=0.00..8.78 rows=1 width=8) (actual time=0.009..0.011 rows=1 loops=100)
                   Index Cond: (t1.id = t2.t1_id)
     Total runtime: 2.352 ms
    (6 rows)
Nice, but this data could have been cached, so let's check some data further from the very beginning:
explain analyze
  select *
  from t1
  left join t2 on t1.id = t2.t1_id
  order by t1.id desc
  limit 100
  offset 100000;



                                                                         QUERY PLAN
    ----------------------------------------------------------------------------------------------------------------------------------------------------
     Limit  (cost=885952.01..886837.96 rows=100 width=17) (actual time=1170.478..1171.741 rows=100 loops=1)
       ->  Nested Loop Left Join  (cost=0.00..88595200.54 rows=10000000 width=17) (actual time=0.060..1089.453 rows=100100 loops=1)
             ->  Index Scan Backward using t1_pkey on t1  (cost=0.00..633484.54 rows=10000000 width=9) (actual time=0.031..136.602 rows=100100 loops=1)
             ->  Index Scan using i_t2_t1_id on t2  (cost=0.00..8.78 rows=1 width=8) (actual time=0.004..0.005 rows=1 loops=100100)
                   Index Cond: (t1.id = t2.t1_id)
     Total runtime: 1171.944 ms
    (6 rows)
Over 1 second. That can be a lot of time, but... this is only my notebook: it is not very fast, other programs are running in the background, it has much less RAM than a normal database server, a slower hard disk, and much less memory available in the database config than on a normal server. On a production database this query could be much faster.

In reality, joins on tables with millions of records are not a big problem.

Another, usually funny, argument is that the NoSQL movement is getting stronger and stronger, and more and more companies just pick non-SQL databases so they can avoid all those terrible joins. In fact joins are not that terrible; joins are quite useful, but you need a good DBA (I think I know one :) ) who can tune the database internals or write the query in a good way. Unfortunately, more and more programmers think they can use an SQL database without knowing SQL, just because they use ORMs. Because of those ORMs they often turn off thinking, and that's sad.
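As a side note, the cost of a large OFFSET (scanning and discarding 100,000 rows) can often be avoided with keyset pagination; a sketch, assuming the application remembers the last id it displayed (the value below is just an example):

```sql
-- fetch the next page after the last seen id, instead of using OFFSET
select *
from t1
left join t2 on t1.id = t2.t1_id
where t1.id < 9900000   -- last id from the previous page (example value)
order by t1.id desc
limit 100;
```

This lets the planner start the backward index scan right at the page boundary instead of walking through all the skipped rows.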
The comments are disabled. If you want to write something to me, you can use e.g. Twitter.