Basic Mistakes in Database Testing
There is a huge number of books about TDD. Those books usually concentrate on using tests for units of work. Units of work are understood in many different ways, usually it means a class.
Those books usually say:
Write a lot of tests, make code in such a way thats the tests pass.
All external resources should be mocked so you can test only the one unit.
That’s cool and there is nothing wrong in this TDD approach. Unfortunately in many cases all of the testing stops at this moment. There are some queries (written by hand or generated by some ORMs) but they are not tested, usually.
Some programmers test those using integration tests - tests which connect to a real database and perform real queries. This usually means testing only the happy path.
I’ve got my ORM – it takes care of everything.
I don’t need to test it.
I don’t need to think.
Database is usually the most valuable asset a company owns. Applications are rewritten from time to time. Old applications are thrown away, new applications are installed.
Nobody throws away the database full of data, when changing the applications.
The database is carefully migrated. There are systems where many different applications use the same database at the same time. That’s why it is important to have a good database model. A model full of constraints. That’s why why databases should be treated with care. You really don’t want to destroy the data consistency as this could cost your company too much.
This article is about the forgotten realm of the database testing. The realm of the integration tests using real databases. It really doesn’t matter what kind of database engine you are using. You can use PostgreSQL, MySQL, Oracle or even those funny noSQL databases like MongoDB. The rules described below can be applied to all kinds of databases and all kinds of applications. Maybe not all of them, as noSQL databases cannot enforce data integrity.
Your application usually consists of many different parts. There is some put here any language you like code, some configuration files, some SQL queries, some external systems. Testing an application means testing each part separately (only because that makes finding bugs easier) and testing how all the parts cooperate. Database is one of suchs parts and you should test it thoroughly.
Not Testing The Database
The first and the most terrible mistake ever made. Not testing database at all. You write some code using a database. You even make unit tests for those classes using some mocked database connection.
What about the integration tests? Tests should test the application using the production environment.
The only idea behind tests is to be sure that application works after deploying to production.
If you don’t test it on the production database, you don’t really know if it works. Your mocked connection allows you to send any query you want and it simply doesn’t check it. It just returns the data you want to return.
Not creating integration tests means you don’t really test your application.
Not Testing The Database Schema
Most teams I’ve ever observed used to have some kind of integrations tests. Usually a happy path tests: there is an ORM, we persist object, ORM does the work, that’s cool, I don’t need to do anywhing else.
I’ve never seen a team which tests the database schema. Imagine you need to create an index on the database because there is some slow query on production. You want to have this index and you want to be sure that next time you deploy that application on the new client’s environment, the index is there. Why not write a simple test to check the index existence?
There is much more things to test beside the indexes:
- the primary keys
- the foreign keys
- the checks – to be sure that column ‘price’ doesn’t have negative values
- the uniqueness of some columns - you really don’t want to have two users with the same login
Tests Don’t Use The Production Engine
When you develop an application, you can choose from a wide collection of databases. Usually you choose the one you think is the best depending on what the team knows best. It can also be chosen by management (sometimes using some strange reasoning). Sometimes the applications use different database engines at the same time. Sometimes applications are prepared to use different engines, so customer, who buys the application, can choose database he wants.
The chosen database engine really doesn’t matter for making goods tests – however it can make it harder to write good tests.
Programmers are lazy, so they want their tests to run very fast. They don’t want to wait for the results for too long. That’s why many teams use some faster in memory databases like HSQLDB. They are much faster because they are stored only in RAM and don’t touch any drives while operating. Running them is also much simpler: just add a library to your project.
There is one more rule often forgotten:
Tests should use the same database engine that is used on production.
Many programmers use some other engine. A common explanaition is simple: “our database is too slow, we should use some in memory database engine”. This is not a good idea. This way you test the other engine, not your production one, so in fact you don’t have your application tested.
I had this problem once. We had to optimize queries by settings a session variable after connecting to the database. That application used only the Oracle database on production. After setting this variable the tests failed. All the tests. It turned out that I cannot set this variable in HSQLDB, as it simply didn’t exist. The tests were using HSQLDB, the production was using Oracle. So I had to write a terrible code: check the database engine after connection and set the variable or not.
Even if you don’t have any problems with mixing engines, remember that when you test on other database engine, than you have on production, you just don’t have the application tested at all.
Tests Don’t Prepare The Database
Testing has a common, well defined flow. It is very simple:
- Prepare the environment
- Run a test
- Check test result
- Go back to the first point
Try doing something opposite and it can hurt you.
Can you see that there is no tidying after the test?
This is very important: tests need to prepare the environment before, not after the test. You cannot be sure that the test will be able to clean everything. Application could exit with an error, network connection could broke, or application could blow up (e.g. with an Out Of Memory Exception). It shouldn’t matter how the test ends, it really matters that the test is running with the same environment which is prepared for each test.
I noticed this mistake once: there were plenty of integration tests which were cleaning all the changes after each test. Many programmers were running those tests using debugger and were stopping it in the middle after spotting a bug. Any other test run after that had an unpredictable and random result. It was running on the environment changed by the previous test and wasn’t clearing the whole environment for itself.
Tests Prepare The Database And Don’t Check That
In the previous part I wrote a lot about preparing a database. I want to add one more thing. Preparing the database is not enough. When you prepare the database by clearing some tables, loading fixtures and so on… there is one thing left.
Check the database state after preparing.
You really need to be sure that you’ve prepared all that correctly. This could save your time later, when there is a bug caused by some data that was left and wasn’t cleared.
This should be a part of the pre-test database preparation.
Not Testing The Creation Scripts
Each application needs to have some kind of installation procedure. There is always the first time you deploy the database.
Programmers tend to change the database by hand using some ad-hoc DDL queries. They don’t write them down later or forget about them. They don’t update the installation scripts. Too many teams I saw don’t use any versioning scripts (e.g. migrations in Ruby on Rails or Liquibase in the Java world).
The best way to test that is to recreate the database before running test suite. You don’t have to do that before each test starts. Just run it once, before running all tests.
Not Testing Foreign Keys
Foreign keys are one of the basic ways of providing database consistency. In a good relational database schema you should always have the keys. If you don’t have, well, it could be a sign that you have a really big problem. This depends on the data model, but usually the lack of the foreign keys is a smell of a really bad design.
Testing foreign keys is simple. Just add some rows to a table without adding appropriate rows in referenced table first. You should get an error. Then you should delete rows from the referenced table, you could get error, or not (that depends on the key definition). You should check the expected behaviour whatever it is.
Not Testing The Default Values
In a good database design you should define some sensible default values. Usually those default values could be nulls. Those should be tested as well. You cannot assume that only your application will be changing data in this database.
- What if somebody wants to create a quick fix and updates some data using adhoc SQL queries?
- What if some day someone starts another application which can change the data and the new application won’t be using your ORM mappings or your DAO classes?
If you have stupid default values, or wrong default values, you could destroy the data and that could be even worse than a simple application bug. The bugs can be fixed. The broken data can sometimes be not fixable.
Not Testing Constraints
There are more constraints in a database than just primary and foreign keys. You could have columns which are unique or not null. You can constraint column to only small set of values. You could want to be sure that the price will never be lower than 0.
A good database schema should have many constraints. You should test them too. If you want your price column to have only positive values, what happens when you try to insert there $-1? Why not test that?
You cannot assume that only your well tested application will be using the data. These checks are your last line of defense against those bugs. Why not test if they work correctly?
Multiple Tests Can Change the Same Database
Usually the database tests change the database. You can run multiple tests symultaneously, but you have to be sure that none of the tests influence others. You must be sure that if one test writes something to the database, another test won’t read that.
Usually it is too easy to mess that up, so my small advice: avoid running multiple tests at the same time. This also means that you shouldn’t run the same test suites from multiple machines.
When you have many developers, who want to run tests, each of them should have a separate database. If you have some kind of read only database, that’s OK, multiple machines can use such a database at the same time. But if you allow for a situation where all programmers are using the same database for testing, you can have really unpredictable test results.
What does a programmer do when he spots an error in a test? Well, good programmer tries to fix that. If the test fails only because of another programmer who was running his tests on the same database, this is just a waste of programmer’s time to fix them.
There is No Big Red Button
Good programmers are lazy. Good programmers are more and more frustrated if you order them to repeat the same task all the time. Good programmers automate repeatable things.
In every project you have to deploy something on testing environment. How much time does it take? Do you really want to waste your programmer time for redeploying application and loading databases all the time?
That’s why each project should have The Big Red Button. The button a programmer can push, make some coffee, get back to work, and after a couple of minutes the programmer knows that the Big Red Button finished the job and everything is ready.
The Big Red Button is a real time saver. You may claim that it is too slow to automate all of that. Well, in fact it is not. It is quite opposite. It is like claiming that writing tests slows you down and that you could go much faster if you could just write code. Till the first bug. Till the first change which could change the behavior of some other part of the program.
It is slower at the very beginning but the more complicated the project becomes, the more time you save because of the existing tests or the Buttons. All sorts of Big Red Buttons – you could use them to deploy applications, run tests, and all other staff like that.
Sometimes Jenkins (aka Hudson) is used for that. Yes, this is a great software for such a Big Red Button stuff. The only thing is that each programmer should have his own set of jobs to deploy all the stuff on his own environment, where she can play with that not influencing others and not being influenced as well.
There are many tools for database testing. You could write simple integration tests for testing the whole schema. For PostgreSQL there is pgTAP, which can be integrated with Jenkins using the TAP plugin, so Jenkins can have a job which tests if a database (including the production database) is OK.