The Worst Database Design

Tue, Aug 25, 2009

by Szymon Lipiński

tags: database

In this blog post I will describe the worst database design I have ever seen in a production database.

Let’s create tables like these:

CREATE TABLE A
(
    ID PRIMARY KEY
    ...
);

CREATE TABLE B
(
    ID PRIMARY KEY
    ...
);

The whole idea is that (due to some reasons that I don’t remember) there is a one-to-one relation between those two tables. It means that A.id = B.id.

The one-to-one relation is not quite normal and generally should be avoided. Avoided means only: “don’t do that unless you know what you’re doing”.

The first table looked like this:

CREATE TABLE A
(
    ID SERIAL PRIMARY KEY
)

So all values for the id column are taken from a sequence. That’s quite normal. The obvious solution was to insert into the second table the id taken from the first table’s sequence. Making all that in one transaction would look like this:

BEGIN;
    INSERT INTO A(...) VALUES (...);
    INSERT INTO B(id, ...) VALUES (1, ...);
COMMIT;

If the rollback occurs, then there will be a gap in the numeration. It is not such a big problem. That’s quite normal with sequences, as they are not transactional. Not transactional means that after a rollback the number taken from a sequence won’t go back. So the sequence is only incremented.

The problem with the proposed and normally used solution was that the table looked like this:

CREATE TABLE A
(
    ID SERIAL PRIMARY KEY
    ...
);

CREATE TABLE B
(
    ID SERIAL PRIMARY KEY
    ...
);

So here we’ve got two sequences. Two quite different not transactional sequences. Normally application makes queries like:

BEGIN;
    INSERT INTO A(...) VALUES (...);
    INSERT INTO B(...) VALUES (...);
COMMIT;

So A.id is taken from the A-sequence and B.id is taken from the B-sequence. What if the second insert fails? Then the A-sequence has been incremented and the B-sequence has not.

Let’s look at the sequences state after some inserts:

A-sequence	B-sequence
1	1
2	2
3	3
ERROR	-
4	3
ERROR	-
5	4

So after some time we can see that there is one inconsitency in the whole table: the row where A.id=4, all others are OK. This error can be found after years. During that time there can be more and more such mistakes.

The effect is that the data becomes more and more inconsistent, and suddenly the whole database becomes full of garbage instead of data. Usually there is no way to repair the data after months of such mistakes.

If you don’t want to desing a proper database from the very beginning, then good luck with cleaning the garbage. I’m sure the management will be happy when the clients will get wrong invoices due to wrong data.