# Problem with random() in PostgreSQL Subselect

Published at: 2011-08-04
Categories:
Tags:

In the simple project I work sometimes on, I needed to generate a lot of random data in a PostgreSQL database. I’ve got a couple of functions for that, but all of them are based on the simple `random()` function.

The `random()` returns value from the range `[0, 1)`:

``````    # select random();
random
-------------------
0.583591205533594
(1 row)
``````

So let’s generate the random number ten times.

``````    # SELECT random() FROM generate_series(1, 10);
random
-------------------
0.639965275302529
0.284068592358381
0.770143453031778
0.366915424354374
0.132617546245456
0.925832690205425
0.610237999819219
0.536387065425515
0.386185770388693
0.878687811084092
(10 rows)
``````

The function `generate_series(A, B)` generates all integer numbers from `A` to `B`.

Everything looks good so far.

What about using the `random()` function many times for the same row?

``````    # SELECT
random(),
random()
FROM generate_series(1, 10);

random       |       random
-------------------+---------------------
0.820922990795225 |   0.223581083118916
0.501614489126951 |   0.785008593462408
0.265057329554111 |   0.952001279685646
0.207879517227411 |   0.785138573497534
0.64019275130704  |   0.00667897937819362
0.368729779031128 |   0.977644741069525
0.948430337011814 |   0.536845378577709
0.620261657517403 |   0.683078715577722
0.114159550517797 |   0.869934193789959
0.81339857308194  |   0.236363218165934
(10 rows)
``````

This looks good too.

Unfortunately I wanted something else. I wanted to use a couple of functions using `random()`` for one row, but some of them in subselects.

What about using the `random()` in a subselect?

``````    # SELECT
random(),
(SELECT random())
FROM generate_series(1, 10);

random       |     ?column?
-------------------+------------------
0.550964544992894 | 0.45336384838447
0.520431810524315 | 0.45336384838447
0.321107998024672 | 0.45336384838447
0.820279272738844 | 0.45336384838447
0.65304935676977  | 0.45336384838447
0.246940688695759 | 0.45336384838447
0.430517273023725 | 0.45336384838447
0.189436422195286 | 0.45336384838447
0.633126459084451 | 0.45336384838447
0.309205084107816 | 0.45336384838447
(10 rows)
``````

And this is the problem. Normally the `random()` function is called for each row, so I can get ten different random numbers. When used in the subquery, it is evaluated only once. That’s not what I needed. I tried many different things, including making subsubqueries or changing my function volatility classification.

It turned out that PostgreSQL can treat such a scalar subquery which doesn’t have any outer dependences as a stable one, so it can be evaluated only once. It is evaluated only once even if the function used in the subquery is marked as volatile.

The solution to this problem was creating some outer dependency such as:

``````    # SELECT
random(),
(SELECT random() WHERE g = g)
FROM generate_series(1, 10) g;

random        |      ?column?
----------------------+---------------------
0.0103594129905105 |   0.856707542203367
0.810819573234767  |   0.795368006452918
0.121764871757478  |   0.762820852920413
0.0032475236803293 |   0.906903445255011
0.403013604227453  |   0.009926503524184
0.2756332247518    |   0.38065834576264
0.958356840535998  |   0.812478603329509
0.0009200032800436 |   0.64143555611372
0.926638153847307  |   0.870854197070003
0.45483412919566   |   0.163001372478902
(10 rows)
``````

Now it works properly, although it is a little bit strange. Fortunately TDD is your real friend, and such errors can easily be catched. One more reason to test everything properly to avoid such surprises.

The comments are disabled. If you want to write something to me, you can use e.g. Twitter.