# Common Problem with random(min, max)

Published at: 2011-04-28
Categories:
Tags:

There is a common problem with many homemade random functions. All languages which I know have some kind of
`random()` function. This function returns a floating point number within the range `[0.0, 1.0)` with uniform distribution of values.

The random generators can have different distribution of the values, but in this entry I will just write about the uniformly distributed generator.

A uniform distribution means that each number can be returned with the same probability.

In `PostgreSQL` there is just the `random()` function. There is no function like `random(min, max)` which would return an integer value from the range `[min, max]`. Once upon a time I had to create such a function for some funny algorithm. The obvoius way used on many blogs to create that is something like:

``````min + (min - max) * random()
``````

… and it will return a number from the range `[min, max)`… sounds good.

There are just two errors:

• the range is `[min; max)`, but I wanted `[min; max]`
• the distribution is not uniform

Let’s check what the real problem is. A simple `SQL` function which you can implement in `PostgreSQL`, that returns such a random number is:

``````CREATE OR REPLACE FUNCTION
RETURNS INTEGER AS
\$\$
SELECT (\$1 + (\$2 - \$1) * random())::INTEGER;
\$\$ LANGUAGE SQL;
``````

I will also create a table for the data:

``````CREATE TABLE data_bad ( value INTEGER );
``````

and I will fill that with a sample:

``````INSERT INTO data_bad(value)
FROM generate_series(1,10*1000*1000);
``````

So my range is `[1; 10]` and there are 10M of numbers;

Let’s check the distribution then.

``````SELECT value "VALUE", count(*) "COUNT"
GROUP BY value
ORDER BY value;

VALUE   COUNT
------- -----------
1       555,979
2       1,111,342
3       1,111,332
4       1,110,525
5       1,112,093
6       1,110,441
7       1,112,215
8       1,109,548
9       1,111,165
10      555,360
``````

## What is the problem?

The main problem is the convertion to `INTEGER`. It works like this:

``````select
(generate_series(0, 10) / 10.0)::numeric(10,1)
"VALUE",
(generate_series(0, 10) / 10.0)::integer
"ROUNDED TO INTEGER";

VALUE   ROUNDED TO INTEGER
------- --------------------
0.0     0
0.1     0
0.2     0
0.3     0
0.4     0
0.5     1
0.6     1
0.7     1
0.8     1
0.9     1
1.0     1
``````

The sql function returns data from the range `[1, 10]`, but the distribution is not uniform. The value `1` is made from all numbers from the range `[1.0, 1.5)`, while the number `2` is from the range `[1.5, 2.5)`… the number `10` is from the range `[9.5, 10.0)`. The result is that we should get the values `1` and `10` less often than the rest.

## The Solution

Solution is quite simple… let’s just modify the formula to something like this:

``````floor(min + (max - min + 1) * random)
``````

se we take a random floating point value from the range `[1, 11)`, and then we get the floor of that. This way the output value of `1` is taken from the range `[1, 2)` and so on. All of the ranges have equal length, so the distribution should be uniform (it depends on the uniform distribution of the `random()` function of course).

The rounding from floating point to integer for this formula looks like this:

``````select
(generate_series(0, 10) / 10.0)::numeric(10,1)
"VALUE",
(generate_series(0, 10) / 10.0)::integer
"ROUNDED TO INTEGER",
floor(generate_series(0, 10) / 10.0)
"FLOORED";

VALUE   ROUNDED TO INTEGER   FLOORED
------- -------------------- ---------
0.0     0                    0
0.1     0                    0
0.2     0                    0
0.3     0                    0
0.4     0                    0
0.5     1                    0
0.6     1                    0
0.7     1                    0
0.8     1                    0
0.9     1                    0
1.0     1                    0
``````

Let’s check the generator:

``````CREATE TABLE data_good ( value INTEGER );
CREATE OR REPLACE FUNCTION
random_range_good(INTEGER, INTEGER) RETURNS INTEGER
AS \$\$
SELECT floor((\$1 + (\$2 - \$1 + 1) * random()))::INTEGER;
\$\$ LANGUAGE SQL;
INSERT INTO data_good(value)
SELECT random_range_good(1,10)
FROM generate_series(1,10*1000*1000);
SELECT
value "VALUE", count(*) "COUNT"
FROM data_good
GROUP BY value
ORDER BY value;

VALUE      COUNT
---------- ---------------
1          1000532
2          998310
3          999804
4          1000242
5          999987
6          998818
7          1000644
8          999577
9          1000642
10         1001444
``````

## The Chart

Let’s just compare the distributions:

It looks like the distribution is OK now.

## The Conclusion

• My function works.
• I really have no idea why people usually don’t test those “simple and easy” functions, even the random ones.
Similar Posts:
The comments are disabled. If you want to write something to me, you can use e.g. Twitter.