Three Value Logic in Sql

published at 17 Aug, 2018 by Szymon LipiƄski tags: postgresq sql

SQL has a little bit different logic system than normally is taught at schools. The simple two value logic has only the values: true and false. Additionally there are logical operators like or, and, not. There are also operators returning these values like =, <>, in.

All that combined allows for some complicated searches like:

SELECT * 
FROM tab
WHERE a = 'b' AND (x <> 'c' OR y in ('a', 'b'));

An example from the last bug I fixed:

SELECT *
FROM x
WHERE id = :id1 AND id <> :id2;

The values of :id1 and :id2 are passed by a function using a complicated logic. Generally I wanted to have results like this:

WITH DATA AS (SELECT 1 AS id)
SELECT id
FROM data
WHERE id = 1 AND id <> 2;

 id 
----
  1
(1 row)

For some values of the :id2 argument the query returned no rows at all, even that there was a row with id=1.

The two-valued logic in databases is generally useless. In SQL there is something more: there is NULL. This is also called a three-valued logic. The rules are a little bit different and sometimes it’s too easy to make mess.

The Boolean Values

In the two-valued logic we have only true and false. Then the results of some operations are:

 a | b | not a | a and b | a or b | a = b | a <> b 
---+---+-------+---------+--------+-------+--------
 t | t | f     | t       | t      | t     | f
 t | f | f     | f       | t      | f     | t
 f | t | t     | f       | t      | f     | t
 f | f | t     | f       | f      | t     | f

As you can see, all that seems to be quite intuitive or at least it looks like it was taught at school.

In the three-valued logic there is also NULL. The NULL means that there is no data, there is no information. It’s like when you have a table with clients and you ask for the client’s age. When someone responses, then you can insert the value in the column. When you have no response, then you can use NULL to indicate that there is no data.

The logic operators behave now a little bit differently. It’s easy to say that age > 20 when you have some data. What should be the result of this operation when the age has no value, when there is NULL inside? If you have no value, the only sensible value of age > 20 is unknown, so this will return NULL as well.

The simple operations for the three-valued logic are:

   a    |   b    | not a  | a and b | a or b | a = b  | a <> b 
--------+--------+--------+---------+--------+--------+--------
 t      | t      | f      | t       | t      | t      | f
 t      | f      | f      | f       | t      | f      | t
 t      | [NULL] | f      | [NULL]  | t      | [NULL] | [NULL]
 f      | t      | t      | f       | t      | f      | t
 f      | f      | t      | f       | f      | t      | f
 f      | [NULL] | t      | f       | [NULL] | [NULL] | [NULL]
 [NULL] | t      | [NULL] | [NULL]  | t      | [NULL] | [NULL]
 [NULL] | f      | [NULL] | f       | [NULL] | [NULL] | [NULL]
 [NULL] | [NULL] | [NULL] | [NULL]  | [NULL] | [NULL] | [NULL]

As you can see, every operator returns NULL when there is NULL in one of the arguments. And this is one part of the bug in the query.

How Where Works

Let’s analyze how the where clause works. I will make a couple of queries, some will return a row, some not.

A Simple Query

This just returns one row.

WITH data AS (SELECT 1 AS id) 
SELECT id 
FROM data;

 id 
----
  1
(1 row)

A Query With Where

This one also returns one row, as there is a row where id=1.

WITH data AS (SELECT 1 AS id) 
SELECT id 
FROM data
WHERE id = 1;

 id 
----
  1
(1 row)

This will also work here:

WITH data AS (SELECT 1 AS id) 
SELECT id 
FROM data
WHERE id = 1 AND id <> 2;

 id 
----
  1
(1 row)

The Bug

The problem with the original query was that sometimes we were passing :id2 = NULL. The query then looked like this:

WITH data AS (SELECT 1 AS id) 
SELECT id 
FROM data
WHERE id = 1 AND id <> null;

 id 
----
(0 rows)

The Problem

The main problem with the WHERE clause is that a row will be in the query result only and only if the WHERE returns true for the row. In the three-valued logic world there is also false and null. For them the row will not be in the result set.

Let’s check the logic values:

SELECT 1 = 1 "1 = 1", 1 <> 2 "1 <> 2";
 1 = 1 | 1 <> 2 
-------+--------
 t     | t

So for non-null input we get true twice. Then we get

SELECT true AND true "true and true";
 true and true 
---------------
 t

So this query returns a row:

WITH data AS (SELECT 1 AS id) 
SELECT id 
FROM data
WHERE id = 1 AND id <> 2;

 id 
----
  1
(1 row)

For the :id2 = null we get:

SELECT 1 = 1 "1 = 1", 1 <> null "1 <> null";
 1 = 1 | 1 <> null 
-------+-----------
 t     | [NULL]

which means:

SELECT true AND null "true and null";
 true and null 
---------------
 [NULL]

The where must return true for a row to it have in the query result set. Here we have null that’s why this query returns no row:

WITH data AS (SELECT 1 AS id) 
SELECT id 
FROM data
WHERE id = 1 AND id <> null;

 id 
----
(0 rows)

The Fix

The fix was quite obvious, just use a different operator which returns only true and false even if the input is null. The operator is is distinct from:

   a    |   b    | a = b  | a <> b | a is distinct from b 
--------+--------+--------+--------+----------------------
 t      | t      | t      | f      | f
 t      | f      | f      | t      | t
 t      | [NULL] | [NULL] | [NULL] | t
 f      | t      | f      | t      | t
 f      | f      | t      | f      | f
 f      | [NULL] | [NULL] | [NULL] | t
 [NULL] | t      | [NULL] | [NULL] | t
 [NULL] | f      | [NULL] | [NULL] | t
 [NULL] | [NULL] | [NULL] | [NULL] | f
(9 rows)

So now the query looks like this:

WITH data AS (SELECT 1 AS id) 
SELECT id 
FROM data
WHERE id = 1 AND id is distinct from null;

 id 
----
  1

Summary

I made this quite a simple mistake when I was writing the original queries: I assumed that there will be passed only some valid identifiers as the query parameters. After some more development, the code became more complicated and the nulls silently slipped in. The problem was found only because we had lots of tests checking the returned data.

Two advices for the future queries:

  • Write good tests for all the functions getting data from a database.
  • Use is distinct from instead of <> every time you deal with the query parameters that you get from the outside world.