An Error in A Query Removed All Data
There Was an Error in a Good Query
After applying a data migration we had some duplicated rows in a table. I wrote a simple query for finding the duplicates and removing them from the table. All went fine. Or at least I thought so.
This kind of operations should always be done in a transaction, so the changes can be rolled back on error. The general workflow I use looks like this:
BEGIN;
DELETE FROM x...
SELECT * FROM x...
...
This leaves an open transaction, so I can check all the data without modifying them permanently in the database.
An Example of The Problem
I cannot show you the production code. This is an artificial example, but the problem is the same.
The Data Structure
There is just one table. The column some_id
is a foreign key. For this example code I don’t want to add a new table, so it’s just an integer here.
CREATE TABLE test(
id SERIAL PRIMARY KEY,
some_id INTEGER NOT NULL,
word TEXT NOT NULL
);
And the table looks like this:
Table "public.test"
Column | Type | Nullable | Default
---------+---------+----------+----------------------------------
id | integer | not null | nextval('test_id_seq'::regclass)
some_id | integer | not null |
word | text | not null |
Indexes:
"test_pkey" PRIMARY KEY, btree (id)
Let’s insert some data:
INSERT INTO test(some_id, word)
VALUES
(1, 'aaa'),
(1, 'bbb'),
(1, 'ccc'),
(1, 'ddd'),
(1, 'aaa'),
(2, 'bbb'),
(2, 'bbb'),
(1, 'ccc'),
(2, 'ccc'),
(3, 'ccc')
;
And the data is:
SELECT * FROM test ORDER BY some_id, word;
id | some_id | word
----+---------+------
5 | 1 | aaa
1 | 1 | aaa
2 | 1 | bbb
8 | 1 | ccc
3 | 1 | ccc
4 | 1 | ddd
6 | 2 | bbb
7 | 2 | bbb
9 | 2 | ccc
10 | 3 | ccc
(10 rows)
The duplicates can be hard to see here, so let’s find them:
SELECT some_id, array_agg(word ORDER BY word)
FROM test
GROUP BY some_id
ORDER BY some_id;
some_id | array_agg
---------+---------------------------
1 | {aaa,aaa,bbb,ccc,ccc,ddd}
2 | {bbb,bbb,ccc}
3 | {ccc}
(3 rows)
The Constraint
According to the business logic, there can be only one row with a given word
for each some_id
. To ensure that the data is consistent, we need a database constraint:
ALTER TABLE test
ADD CONSTRAINT unique_some_id_word
UNIQUE(some_id, word);
However trying to create this constraint on the table test
with the current data ends with:
=> ALTER TABLE test ADD CONSTRAINT unique_some_id_word UNIQUE(some_id, word);
ERROR: could not create unique index "unique_some_id_word"
DETAIL: Key (some_id, word)=(2, bbb) is duplicated.
Find The Duplicates
The simplest way would be to use the rank
windowing function.
SELECT
id,
RANK() OVER (PARTITION BY some_id, word ORDER BY id) r
FROM test;
The above query divides the whole table data into small windows. Each of them will have the same values of the pair (some_id, word)
. The rows inside each window are sorted by the id
(which doesn’t really matter, it’s just to have consistent results). Then each row in the window gets a unique increasing number, which is stored in the r
column.
id | r
----+---
1 | 1
5 | 2
2 | 1
3 | 1
8 | 2
4 | 1
6 | 1
7 | 2
9 | 1
10 | 1
(10 rows)
If any of the rows has r
value bigger than 1
then it’s a duplicate.
SELECT id
FROM (
SELECT
id,
RANK() OVER (PARTITION BY some_id, word ORDER BY id) r
FROM test
) x
WHERE x.r > 1;
The rows we need to delete are:
id
----
5
8
7
(3 rows)
Removing The Duplicates
In the previous query we got all the ids to remove. I thought that deleting the rows would be easy:
DELETE FROM test
WHERE id IN (
SELECT id
FROM (
SELECT
id,
RANK() OVER (PARTITION BY some_id, word ORDER BY id) r
FROM test
) x
WHERE x.r > 1
);
DELETE 3
And after that I got:
SELECT some_id, array_agg(word ORDER BY word)
FROM test
GROUP BY some_id
ORDER BY some_id;
some_id | array_agg
---------+-------------------
1 | {aaa,bbb,ccc,ddd}
2 | {bbb,ccc}
3 | {ccc}
(3 rows)
So it’s what I wanted.
The Terrible Mistake
In the original query I made a mistake. I used word
instead of id
.
DELETE FROM test
WHERE id IN (
SELECT id
FROM (
SELECT
word,
RANK() OVER (PARTITION BY some_id, word ORDER BY id) r
FROM test
) x
WHERE x.r > 1
);
And it was a little bit devastating:
DELETE 10
=> SELECT * FROM test;
id | some_id | word
----+---------+------
(0 rows)
Yey, so I just REMOVED ALL THE DATA.
The Problem
This looks weird, but let’s go from the beginning. The internal query is fine and returns:
SELECT
word,
RANK() OVER (PARTITION BY some_id, word ORDER BY id) r
FROM test;
word | r
------+---
aaa | 1
aaa | 2
bbb | 1
ccc | 1
ccc | 2
ddd | 1
bbb | 1
bbb | 2
ccc | 1
ccc | 1
(10 rows)
However the next query has a huge error inside:
SELECT id
FROM (
SELECT
word,
RANK() OVER (PARTITION BY some_id, word ORDER BY id) r
FROM test
) x
WHERE x.r > 1
);
ERROR: column "id" does not exist
LINE 1: SELECT id
^
How does the external query work? Well, it’s possible to use the external table columns in a subquery. This way a query like this works:
UPDATE x SET a = (SELECT y.a FROM y WHERE y.some_id = a.id);
This is what happened here:
DELETE FROM test
WHERE id IN ( -- <= this is test.id
SELECT id -- <= this is also test.id
FROM (
SELECT
word,
RANK() OVER (PARTITION BY some_id, word ORDER BY id) r
FROM test
) x
WHERE x.r > 1
);
Let’s just list all the ids we have in the test
table:
SELECT id
FROM test
ORDER BY id;
id
----
1
2
3
4
5
6
7
8
9
10
(10 rows)
Let’s also check what we really get when trying to delete the rows:
SELECT id
FROM test
WHERE id IN ( -- <= this is test.id
SELECT id -- <= this is also test.id
FROM (
SELECT
word,
RANK() OVER (PARTITION BY some_id, word ORDER BY id) r
FROM test
) x
WHERE x.r > 1
);
id
----
1
2
3
4
5
6
7
8
9
10
(10 rows)
In both queries we get the same thing.
The Summary
I don’t think it’s a bug. It’s a valid query, which has tremendously terrific side effect. It removes all the data, just because of an error in a subquery, which then was silently replaced with something else.
Another reason for testing code before running that on production.