This story is a just another example of what I call
Using varchar(255) for everything won’t make you a sorcerer.
Every database has many different types. They are there on purpose, not by accident. I’ve seen that too many times: a programmer (most companies don’t have DBAs for that) creates a database strsucture using only ONE type. Sometimes they use more… but hardly ever they use all those nice features they database engine provides.
This is a common pattern:
Pay thousands of dollars for the database licence – use it like a CSV file.
The UUID Type
PostgreSQL has a very nice type UUID. This can used for storing UUIDs. Uniquness of UUIDs is another nice topic to write about, but not this time.
How would you store the UUID in a database?
In PostgreSQL solutions are maily two:
How do you think: which one is better?
Of course the UUID type isn’t there just for fun. How do you think, would anybody do that just for fun and that all those people just accepted the patch for this?
A Simple Test
Let’s create two similar tables:
CREATE TABLE u ( uuid uuid PRIMARY KEY, payload character(300) ); CREATE TABLE t ( uuid text PRIMARY KEY, payload character(300) );
Payload is here only to have the rows much bigger.
Fill the first table with randomly generated UUIDs:
INSERT INTO u SELECT uuid_generate_v4() FROM generate_series(1,1000*1000);
and fill the same uuids to another table:
INSERT INTO t(uuid) SELECT uuid FROM u;
Let’s update the statistics for the tables:
vacuum analyze u; vacuum analyze t;
What about size of the tables?
SELECT pg_size_pretty(pg_total_relation_size('u')); pg_size_pretty 82 MB (1 row) SELECT pg_size_pretty(pg_total_relation_size('t')); pg_size_pretty 138 MB (1 row)
Compare The Speed Difference
Let’s check the speed of queries. Test procedure is simple:
- get 1000 of UUIDs from the table
- get 1000 of random UUIDs
- query for those 2000 UUIDs each table
- for better results: make the above step 200 times
It turned out that the time difference is really insignificant. Those two tables are almost equally fast.
So, What is The Difference?
- Size matters. In case of the text table, I wasted 68% more space. It means 68% more RAM space wasted when PostgreSQL loads the file pages into memory. More memory wasted = slower database.
- With the TEXT type I can insert data like
bbbbwhich are invalid UUIDs. That can hurt the application when you read the data, pass the read value from database to an UUID class constructor. Suddenly you will get a strange error.
- For getting the data from the database, you will have many different formats available in the TEXT mode. In UUID mode, you get only one output format. To get consistent output format, you have to make by hand conversions - too much overhead.
UUID type gives you cure to all those probles for free. You can have many different input formats and only one output format. So there is no need of hand made checks, conversions. You will get no errors when converting the database UUID value in the application to a UUID object. You have really smaller memory requirements for storing the value (yes, you will notice the difference when you have a lot of UUIDs and a database which doesn’t fit in the available RAM).
And now a couple of nice links: