# Random String in PostgreSQL

published at 30 May, 2011 by Szymon Lipiński

Last time I wrote about common problem with the random function. PostgreSQL has only the `random()`. There is no function like `random(from, to)` or `random(length)`, which returns a random text of the given length.

This time I just needed a function which returns a random text of the given length. Then it can be used like this: `SELECT random_text(42)`. The random text will be used as a slug in a web application, so I need only numbers and letters and let’s also have only uppercase.

Let’s create a `PL/pgSQL` procedure for returning a random text with the given length.

## A Naive Solution

There is a very common solution to that problem: take an array of characters and choose randomly from them. Something like this:

``````CREATE OR REPLACE FUNCTION random_text_simple(length INTEGER)
RETURNS TEXT
LANGUAGE PLPGSQL
AS \$\$
DECLARE
possible_chars TEXT := '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
output TEXT := '';
i INT4;
pos INT4;
BEGIN

FOR i IN 1..length LOOP
pos := random_range(1, length(possible_chars));
output := output || substr(possible_chars, pos, 1);
END LOOP;

RETURN output;
END;
\$\$;``````

Looks great. The only thing missing there is the function `random_range`.

The function I use looks like this:

``````CREATE OR REPLACE FUNCTION random_range(INTEGER, INTEGER)
RETURNS INTEGER
LANGUAGE SQL
AS \$\$
SELECT (\$1 + FLOOR((\$2 - \$1 + 1) * random() ))::INTEGER;
\$\$;``````

The idea behind this function is described in one of my older posts: common problem with random min max.

Let’s check the efficiency of the function. How much time does it take to run it 1k times with different parameters:

``````  param                              time [s]
---------------------------------- -------------------------------------
1                                0.013
2                                0.013
10                                0.052
50                                0.115
100                                0.223
1,000                                2.224
10,000                               39.885
30,000                              217.746``````

One question. Does that look OK? Maybe not OK? If not OK, then why is that bad? Times are bad… but compared to what?

## A One Step Better Solution

How to improve the function? My first idea was to get rid of the variable `pos` and inline it into the next line. After such a change the function is:

``````CREATE OR REPLACE FUNCTION random_text_simple_1(length INTEGER)
RETURNS TEXT
LANGUAGE PLPGSQL
AS \$\$
DECLARE
possible_chars TEXT := '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
output TEXT := '';
i INT4;
BEGIN

FOR i IN 1..length LOOP
output := output || substr(possible_chars, random_range(1, length(possible_chars)), 1);
END LOOP;

RETURN output;
END;
\$\$;``````

``````  param                time [s]               previous [s]
-------------------- ---------------------- ----------------------------
1                0.010                  0.013
2                0.011                  0.013
10                0.024                  0.052
50                0.087                  0.115
100                0.168                  0.223
1,000                1.677                  2.224
10,000               34.677                 39.885
30,000              199.434                217.746``````

Looks a little bit better. Not bad, but could we do some more about that?

## A One Step Better Solution

My second try was to calculate the length of the string before the loop. Maybe `PL/pgSQL` can optimize that, maybe not, I will check.

Here is the function and timings:

``````CREATE OR REPLACE FUNCTION random_text_simple_2(length INTEGER) RETURNS TEXT LANGUAGE PLPGSQL
AS \$\$
DECLARE
possible_chars TEXT := '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
output TEXT := '';
i INT4;
chars_size INTEGER := length(possible_chars);
BEGIN
FOR i IN 1..length LOOP
output := output || substr(possible_chars,
random_range(1, chars_size), 1);
END LOOP;
RETURN output;
END; \$\$;

param                time [s]               previous [s]
-------------------- ---------------------- ----------------------------
1                 0.011                  0.010
2                 0.012                  0.011
10                 0.023                  0.024
50                 0.081                  0.087
100                 0.153                  0.168
1,000                 1.496                  1.677
10,000                32.353                 34.677
30,000               193.639                199.434``````

This doesn’t seem to be a big gain compared to the second one.

## A One Step Better Solution

Another idea was to inline the `random_range` function.

``````CREATE OR REPLACE FUNCTION random_text_simple_3(length INTEGER)
RETURNS TEXT
LANGUAGE PLPGSQL
AS \$\$
DECLARE
possible_chars TEXT := '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
output TEXT := '';
i INT4;
chars_size INTEGER;
BEGIN
chars_size := length(possible_chars);
FOR i IN 1..length LOOP
output := output || substr(
possible_chars,
(1 + FLOOR((chars_size - \$1 + 1) * random() ))::INTEGER, 1);
END LOOP;
RETURN output;
END; \$\$;

param                time [s]               previous [s]
-------------------- ---------------------- ----------------------------
1                0.011                   0.011
2                0.012                   0.012
10                0.023                   0.023
50                0.069                   0.081
100                0.127                   0.153
1,000                1.172                   1.496
10,000               11.672                  32.353
30,000               35.028                 193.639``````

Wow. I’ve never thought that inlining such a simple function would give me so huge difference.

## The Best Solution… So Far

While I was thinking about the cons of `PL/pgSQL`, I thought about writing the code in C and just use a C library.

Then I thought about using something strange like `md5(random()::TEXT)`. The whole idea behind this solution is to use two functions written in C. There will be some `PL/pgSQL` glue code, but the whole logic would be written in C.

The algorithm is:

• Calculate the number of MD5 sums to get at least `length` chars.
• Call `md5(random())` code and concatenate the results.
• Use `substring()` and `upper()` for the final result.

The function for the above algorithm:

``````CREATE OR REPLACE FUNCTION random_text_md5(length INTEGER)
RETURNS TEXT
LANGUAGE PLPGSQL
AS \$\$
DECLARE
-- how many md5's we need to have at least length chars
loop_count INTEGER := CEIL(length / 32.);
output TEXT := ''; -- the result text
i INT4; -- loop counter
BEGIN
FOR i IN 1..loop_count LOOP
output := output || md5(random()::TEXT);
END LOOP;
-- get the substring for the exact number of characters
-- and upper them up
RETURN upper(substring(output, length));
END; \$\$;``````

Nothing special, it just works. What about the timing? The timing will be a little bit later.

## The SQL Version

Pavel wrote a very interesting comment about using a pure SQL function. I’ve changed his function a little bit and the new function looks like this:

``````CREATE OR REPLACE FUNCTION random_string_pavel(int)
RETURNS text
AS \$\$
SELECT array_to_string(
ARRAY (
SELECT substring(
'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'
FROM (random() *26)::int FOR 1)
FROM generate_series(1, \$1) ), '' )
\$\$ LANGUAGE sql;``````

The timings for the new function were added to the below table. The chart has been updated.

Pavel posted three other versions of that function. The code is in the comment below, so I won’t place it here. I’ve run the tests once again for all the functions. The below data and charts have been updated.

My `md5` version is still the best.

## Combined md5 And SQL

Pavel also suggested combining the `md5` version and the `buildstr03`. I’ve created the below version:

``````CREATE OR REPLACE FUNCTION random_text_md5_v2(INTEGER)
RETURNS TEXT
LANGUAGE SQL
AS \$\$
select upper(
substring(
(SELECT string_agg(md5(random()::TEXT), '')
FROM generate_series(
1,
CEIL(\$1 / 32.)::integer)
), 1, \$1) );
\$\$;``````

This is the best solution so far, thanks Pavel. It is twice faster than the old `md5` version.

## Final Results

length of text simple simple_1 simple_2 simple_3 md5 md5_v2
1 0.013 0.010 0.011 0.011 0.023 0.039
2 0.013 0.011 0.012 0.012 0.018 0.033
10 0.030 0.024 0.023 0.023 0.017 0.034
50 0.115 0.087 0.081 0.069 0.020 0.036
100 0.223 0.168 0.153 0.127 0.025 0.041
1,000 2.244 1.677 1.496 1.172 0.106 0.098
10,000 39.885 34.677 32.353 11.672 1.462 0.672
30,000 217.746 199.434 193.639 35.028 7.448 1.962

And the chart for this.

## Final Notes

• `PL/pgSQL` is not very efficient, but can be easily improved
• functions are not inlined and calls are expensive (but inlining by copying is not the best idea)
• the fastest way is to use some functions written in C
• I think the fastest way would be to use some function written in C
• Interesting thing is that the last two functions scale linearly when the text size increases.
• Pavel’s suggestions with the `SQL` function has the same timings as my `random_test_simple_3`
• Another Pavel’s suggestion resulted in the best version `random_text_md5_v2`
• The best solution `random_text_md5_v2` is 12 times faster than the best `PL/pgSQL` solution.