Problem with random() in PostgreSQL Subselect

Thu, Aug 4, 2011

by Szymon Lipiński

tags: database postgresql programming

In a simple project I work sometimes on, I needed to generate a lot of random data in a PostgreSQL database. I’ve got a couple of functions for that, but all of them are based on the simple random() function.

The random() function returns value from the range [0, 1):

    # select random();
          random
    -------------------
     0.583591205533594
    (1 row)

So let’s generate the random number ten times.

    # SELECT random() FROM generate_series(1, 10);
          random
    -------------------
     0.639965275302529
     0.284068592358381
     0.770143453031778
     0.366915424354374
     0.132617546245456
     0.925832690205425
     0.610237999819219
     0.536387065425515
     0.386185770388693
     0.878687811084092
    (10 rows)

The function generate_series(A, B) generates all integer numbers from A to B.

Everything looks good so far.

What about using the random() function many times for the same row?

    # SELECT
              random(),
              random() 
      FROM generate_series(1, 10);

          random       |       random
    -------------------+---------------------
     0.820922990795225 |   0.223581083118916
     0.501614489126951 |   0.785008593462408
     0.265057329554111 |   0.952001279685646
     0.207879517227411 |   0.785138573497534
     0.640192751307040 |   0.006678979378193
     0.368729779031128 |   0.977644741069525
     0.948430337011814 |   0.536845378577709
     0.620261657517403 |   0.683078715577722
     0.114159550517797 |   0.869934193789959
     0.813398573081940 |   0.236363218165934
    (10 rows)

This looks good too.

Unfortunately I wanted something else. I wanted to use a couple of functions using random() for one row, but some of them in subselects.

What about using the random() in a subselect?

    # SELECT 
              random(), 
              (SELECT random()) 
        FROM generate_series(1, 10);

          random       |     ?column?
    -------------------+------------------
     0.550964544992894 | 0.45336384838447
     0.520431810524315 | 0.45336384838447
     0.321107998024672 | 0.45336384838447
     0.820279272738844 | 0.45336384838447
     0.653049356769770 | 0.45336384838447
     0.246940688695759 | 0.45336384838447
     0.430517273023725 | 0.45336384838447
     0.189436422195286 | 0.45336384838447
     0.633126459084451 | 0.45336384838447
     0.309205084107816 | 0.45336384838447
    (10 rows)

And this is the problem. Normally the random() function is called for each row, so I can get ten different random numbers. When used in the subquery, it is evaluated only once. That’s not what I needed. I tried many different things, including making subsubqueries or changing my function volatility type.

It turned out that PostgreSQL can treat such a scalar subquery, which doesn’t have any outer dependencies as a stable one, so it can be evaluated only once. It is evaluated only once even if the function used in the subquery is marked as volatile.

The solution to this problem was creating some outer dependency such as:

    # SELECT 
              random(), 
              (SELECT random() WHERE g = g) 
        FROM generate_series(1, 10) g;

            random        |      ?column?
    ----------------------+---------------------
       0.0103594129905105 |   0.856707542203367
       0.8108195732347670 |   0.795368006452918
       0.1217648717574780 |   0.762820852920413
       0.0032475236803293 |   0.906903445255011
       0.4030136042274530 |   0.009926503524184
       0.2756332247518000 |   0.380658345762640
       0.9583568405359980 |   0.812478603329509
       0.0009200032800436 |   0.641435556113720
       0.9266381538473070 |   0.870854197070003
       0.4548341291956600 |   0.163001372478902
    (10 rows)

Now it works properly, although it is a little bit strange. Fortunately TDD is your real friend and such errors can easily be caught. One more reason to test everything properly to avoid such surprises.