Problem with random() in PostgreSQL Subselect

by Szymon LipiĆski

In a simple project I work sometimes on, I needed to generate a lot of random data in a PostgreSQL database. I’ve got a couple of functions for that, but all of them are based on the simple `random()` function.

The `random()` function returns value from the range `[0, 1)`:

``````    # select random();
random
-------------------
0.583591205533594
(1 row)
``````

So let’s generate the random number ten times.

``````    # SELECT random() FROM generate_series(1, 10);
random
-------------------
0.639965275302529
0.284068592358381
0.770143453031778
0.366915424354374
0.132617546245456
0.925832690205425
0.610237999819219
0.536387065425515
0.386185770388693
0.878687811084092
(10 rows)
``````

The function `generate_series(A, B)` generates all integer numbers from `A` to `B`.

Everything looks good so far.

What about using the `random()` function many times for the same row?

``````    # SELECT
random(),
random()
FROM generate_series(1, 10);

random       |       random
-------------------+---------------------
0.820922990795225 |   0.223581083118916
0.501614489126951 |   0.785008593462408
0.265057329554111 |   0.952001279685646
0.207879517227411 |   0.785138573497534
0.640192751307040 |   0.006678979378193
0.368729779031128 |   0.977644741069525
0.948430337011814 |   0.536845378577709
0.620261657517403 |   0.683078715577722
0.114159550517797 |   0.869934193789959
0.813398573081940 |   0.236363218165934
(10 rows)
``````

This looks good too.

Unfortunately I wanted something else. I wanted to use a couple of functions using `random()` for one row, but some of them in subselects.

What about using the `random()` in a subselect?

``````    # SELECT
random(),
(SELECT random())
FROM generate_series(1, 10);

random       |     ?column?
-------------------+------------------
0.550964544992894 | 0.45336384838447
0.520431810524315 | 0.45336384838447
0.321107998024672 | 0.45336384838447
0.820279272738844 | 0.45336384838447
0.653049356769770 | 0.45336384838447
0.246940688695759 | 0.45336384838447
0.430517273023725 | 0.45336384838447
0.189436422195286 | 0.45336384838447
0.633126459084451 | 0.45336384838447
0.309205084107816 | 0.45336384838447
(10 rows)
``````

And this is the problem. Normally the `random()` function is called for each row, so I can get ten different random numbers. When used in the subquery, it is evaluated only once. That’s not what I needed. I tried many different things, including making subsubqueries or changing my function volatility type.

It turned out that PostgreSQL can treat such a scalar subquery, which doesn’t have any outer dependencies as a stable one, so it can be evaluated only once. It is evaluated only once even if the function used in the subquery is marked as volatile.

The solution to this problem was creating some outer dependency such as:

``````    # SELECT
random(),
(SELECT random() WHERE g = g)
FROM generate_series(1, 10) g;

random        |      ?column?
----------------------+---------------------
0.0103594129905105 |   0.856707542203367
0.8108195732347670 |   0.795368006452918
0.1217648717574780 |   0.762820852920413
0.0032475236803293 |   0.906903445255011
0.4030136042274530 |   0.009926503524184
0.2756332247518000 |   0.380658345762640
0.9583568405359980 |   0.812478603329509
0.0009200032800436 |   0.641435556113720
0.9266381538473070 |   0.870854197070003
0.4548341291956600 |   0.163001372478902
(10 rows)
``````

Now it works properly, although it is a little bit strange. Fortunately TDD is your real friend and such errors can easily be caught. One more reason to test everything properly to avoid such surprises.