# Problem with random() in PostgreSQL Subselect

Published at: 2011-08-04
Categories:
Tags:

In the simple project I work sometimes on, I needed to generate a lot of random data in a PostgreSQL database. I’ve got a couple of functions for that, but all of them are based on the simple random() function.

The random() returns value from the range [0, 1):

# select random();
random
-------------------
0.583591205533594
(1 row)

So let’s generate the random number ten times.

# SELECT random() FROM generate_series(1, 10);
random
-------------------
0.639965275302529
0.284068592358381
0.770143453031778
0.366915424354374
0.132617546245456
0.925832690205425
0.610237999819219
0.536387065425515
0.386185770388693
0.878687811084092
(10 rows)

The function generate_series(A, B) generates all integer numbers from A to B.

Everything looks good so far.

What about using the random() function many times for the same row?

# SELECT
random(),
random()
FROM generate_series(1, 10);

random       |       random
-------------------+---------------------
0.820922990795225 |   0.223581083118916
0.501614489126951 |   0.785008593462408
0.265057329554111 |   0.952001279685646
0.207879517227411 |   0.785138573497534
0.64019275130704  |   0.00667897937819362
0.368729779031128 |   0.977644741069525
0.948430337011814 |   0.536845378577709
0.620261657517403 |   0.683078715577722
0.114159550517797 |   0.869934193789959
0.81339857308194  |   0.236363218165934
(10 rows)

This looks good too.

Unfortunately I wanted something else. I wanted to use a couple of functions using `random()`` for one row, but some of them in subselects.

What about using the random() in a subselect?

# SELECT
random(),
(SELECT random())
FROM generate_series(1, 10);

random       |     ?column?
-------------------+------------------
0.550964544992894 | 0.45336384838447
0.520431810524315 | 0.45336384838447
0.321107998024672 | 0.45336384838447
0.820279272738844 | 0.45336384838447
0.65304935676977  | 0.45336384838447
0.246940688695759 | 0.45336384838447
0.430517273023725 | 0.45336384838447
0.189436422195286 | 0.45336384838447
0.633126459084451 | 0.45336384838447
0.309205084107816 | 0.45336384838447
(10 rows)

And this is the problem. Normally the random() function is called for each row, so I can get ten different random numbers. When used in the subquery, it is evaluated only once. That’s not what I needed. I tried many different things, including making subsubqueries or changing my function volatility classification.

It turned out that PostgreSQL can treat such a scalar subquery which doesn’t have any outer dependences as a stable one, so it can be evaluated only once. It is evaluated only once even if the function used in the subquery is marked as volatile.

The solution to this problem was creating some outer dependency such as:

# SELECT
random(),
(SELECT random() WHERE g = g)
FROM generate_series(1, 10) g;

random        |      ?column?
----------------------+---------------------
0.0103594129905105 |   0.856707542203367
0.810819573234767  |   0.795368006452918
0.121764871757478  |   0.762820852920413
0.0032475236803293 |   0.906903445255011
0.403013604227453  |   0.009926503524184
0.2756332247518    |   0.38065834576264
0.958356840535998  |   0.812478603329509
0.0009200032800436 |   0.64143555611372
0.926638153847307  |   0.870854197070003
0.45483412919566   |   0.163001372478902
(10 rows)

Now it works properly, although it is a little bit strange. Fortunately TDD is your real friend, and such errors can easily be catched. One more reason to test everything properly to avoid such surprises.

Similar Posts:
The comments are disabled. If you want to write something to me, you can use e.g. Twitter.