How To Calculate Length of Overlapping Ranges in PostgreSQL

The Problem

PostgreSQL has many interesting features for building custom logic inside the database to ensure the data is correct. It is also possible to build logic for simpler processing data.

In this post I will show how to use PostgreSQL to create an easy to use custom aggregate to calculate the length of a collection of overlapping ranges.

Why Proper Types Matter and Duck Typing Not

Many Python programmers repeat that looking like a duck is much more important than if it’s really a duck. So, if you have an object passed to a function, then the class of the object is not important. The only important thing is if the passed object has a specific function, which you want to call.

If it walks like a duck and it quacks like a duck, then it must be a duck

https://en.wikipedia.org/wiki/Duck_typing

What if it’s not? In the Wikipedia there is a more historical term, a Duck Test:

If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.

https://en.wikipedia.org/wiki/Duck_test

The word must is a very bad assumption. What if it’s not? Will the program blow up on production. The word probably is a better assumption, but it’s also a very significant source of problems in many programs I observed. Sometimes something looking like a duck is not a duck at all. It’s not even similar to a duck.

A string is not a number and a number is not a rectangle. So how come that in programming people claim these are the same things? Simple things will be simple. Unfortunately, every simple program tends to go in the direction of a much more twisted beast, right after you think about adding a new feature or two. Too often after such a change the result was having twisted ducks with four legs but without a head. The program was fine in most cases. Debugging was hard. It was much easier to start writing that from scratch. And then you will need to add a feature or two…

Should you care? Well, you always should. Unless you don’t care about the programs you are writing.

Why I Like Kotlin

For the last couple of months I’ve been slowly going from Python to Kotlin. Kotlin is a statically typed language running on JVM. During the last years I was rather avoiding all the JVM stuff. Now I’m slowly starting to appreciate this.

I always disliked Java for the huge memory consumption, garbage collector, the terrible inconsistency and quite stupid design decisions. Then I slowly switched to Python. It was great at the beginning. However, when going deeper and deeper, and the projects were getting bigger and bigger, it turned out that it’s just slowing down the whole development.

Three Value Logic in Sql

SQL has a little bit different logic system than normally is taught at schools. The simple two value logic has only the values: true and false. Additionally there are logical operators like or, and, not. There are also operators returning these values like =, <>, in.

All that combined allows for some complicated searches like:

SELECT * 
FROM tab
WHERE a = 'b' AND (x <> 'c' OR y in ('a', 'b'));

An example from the last bug I fixed:

SELECT *
FROM x
WHERE id = :id1 AND id <> :id2;

The values of :id1 and :id2 are passed by a function using a complicated logic. Generally I wanted to have results like this:

WITH DATA AS (SELECT 1 AS id)
SELECT id
FROM data
WHERE id = 1 AND id <> 2;

 id 
----
  1
(1 row)

For some values of the :id2 argument the query returned no rows at all, even that there was a row with id=1.

The two-valued logic in databases is generally useless. In SQL there is something more: there is NULL. This is also called a three-valued logic. The rules are a little bit different and sometimes it’s too easy to make mess.

The Boolean Values

In the two-valued logic we have only true and false. Then the results of some operations are:

An Error in A Query Removed All Data

There Was an Error in a Good Query

After applying a data migration we had some duplicated rows in a table. I wrote a simple query for finding the duplicates and removing them from the table. All went fine. Or at least I thought so.

This kind of operations should always be done in a transaction, so the changes can be rolled back on error. The general workflow I use looks like this:

BEGIN;

DELETE FROM x...

SELECT * FROM x...

...

This leaves an open transaction, so I can check all the data without modifying them permanently in the database.