Are Long Running Transactions Bad?
This transaction has been open since July 2, 2001.
We call it “Edward.”
– http://thebuild.com/presentations/10-ways-to-kill-performance.pdf page 33
Some time ago I was asked about long running transactions. Is that really so bad and how long should a transaction run? I will try to reply that question now, as it returns all the time. The answer is not so trivial, as usually when you ask about databases.
SQL Transactions
It will be a very simple introduction to database transactions. I assume that the transactions operate in the “normal” way. It means that the transactions can see only the consistent state of database from the moment of the start of the transaction. Further description of all types of transactions can be found in the excellent PostgreSQL documentation.
Database transactions are used for the all-or-nothing principle. You can perform many different operations in a transaction. If you end the transaction using the COMMIT command, then all the changes are stored in the database. If you use the ROLLBACK command, then none of the changes is stored. It is impossible to load only a part of the transaction changes to the database.
For me the most important and the most nice PostgreSQL transactions feature is that it has full transactional DDL. You can create an index in a transaction, run many queries and notice that the index is used. You could then rollback the transaction and the index will just magically disappear. What’s more other transactions won’t ever see that index.
The same is for almost all other PostgreSQL operations. You could just create a bunch of tables in a transaction, if you have made an error somewhere, the transaction will be rolled back. You could just fix the error and run that once again and don’t think about the tables which could be created earlier.
Why Long Transactions are Bad
Too Much Old Garbage Here
Normally transaction can see a snapshot of the data from the beginning of the transaction. If it can see the data, the data should exist in the database. It means that if there is a row with some data, you start two transactions, and one of them changes the row, the second transaction should still see the first version. This causes the need of storing multiple versions of the same row in the database.
In PostgreSQL there is the autovacuum process (you could easily run that also by hand) which clears unused data. This cleared unused space could be later used for new data, so the database could stay at the same size for a long time.
Imagine that you have a very long running transaction, let’s take just two weeks. That transaction started two weeks ago. What does it mean for the database? It means that this transaction should see the database state as it was two weeks ago. Autovacuum shouldn’t clear anything for the last two weeks, as it could be needed here. The database size is growing and growing. The indices are bigger and bigger. The overall database performance drops silently.
Transaction Number Wraparound
Each transaction in the database gets its unique number. The transaction number generator is cyclic and one day you could get the warning:
WARNING: some databases have not been vacuumed in over 2 billion transactions
DETAIL: You may have already suffered transaction-wraparound data loss.
Warnings are usually totally ignored… until the day comes. The day when suddenly the database shuts down with an error. The error says that the transaction-wraparound will occur right now and that could destroy some data, so to prevent data loss, database will just shut down. After starting the database everything will work normally.
The only thing you’ve probably not noticed is that the long running (or rather most probably long hanging) transaction is not there any more. Now vacuum can clean old row versions marking part of the data files as reusable. Unfortunately that won’t decrease the database on disk size.
You can decrease the data file size with full autovacuum. It locks the whole table for reading, so no client can use the table. This can be really destructive on production servers.
Locks
Transaction can hold a lock. When a transaction updates a row, it locks it. Another transaction, which wants to update or delete the same row, will hung until the lock is released. Such automatic locks are released only at the end of the transaction (it doesn’t matter if the transaction commits or rolls back). A long transaction which locks something can damage the whole efficiency as all other transactions could wait for this lock to be released.
How Short a Transaction Should Be?
This cannot be simply answered. It depends. This is my favourite answer on most questions about databases. It just depends. Sometimes there is a huge operation where the transaction lasts for a couple of days. That is OK, but only and only if it is needed.
Howver, if you have normal operations on a database and normal transactions last for a couple of seconds or minutes. And suddenly such a normal transaction hangs for a week, you can be sure there was an error in the application. The transaction should be removed from database and most probably you will have a really huge efficiency problem in the next couple of weeks.