Skip to content


Why Ruby on Rails Migrations Don’t Work

Ruby on Rails Migrations

Migrations in Ruby on Rails provide some nice faetures for manipulating database structure. It is used mainly for two reasons:

  • this is a nice way of keeping database structure changes
  • you don’t need to know SQL
  • the generated SQL code can be used for many database engines

The first point is OK (this is a must, you really should keep such a track of database changes). The second would be nice…. but that simply doesn’t work. I don’t think that not knowing SQL is such a huge advantage. Knowing SQL is really a need when you write some database software using relational databases.

Migration Example

A simple example of a migration. One migration in one file, all placed in the db/migration directory in the Rails project.

class SampleMigration < ActiveRecord::Migration
  def self.up

  end
  def self.down

  end
end

The whole migration is in one class with two functions: up and down. Function up is used for migrating in the up direction, and down in the opposite. In the ideal world the down function should roll back all changes created by the up function.

Unfortunatelly that’s usually impossible (OK, that’s possible but only in really simple migrations and only sometimes), but about that later on.

Here is just a more complex example of a migration:

class SampleMigration < ActiveRecord::Migration
  def self.up
    create_table :people do |t|
      t.column :name, :string
    end
  end
  def self.down
    drop_table :people
  end
end

The up function creates a table named ‘people’ with one colum ‘name’ of a type ’string’. Additionally migration adds some more, but nevermind.

The down function just drops that table.

That looks quite nice and simple but of course creating or dropping a table is the simplest task. Normally you’ve got to manage more complicated tasks such as creating and fillind a column on a big table or create a new column from two existing and dropping those existing columns after that.

Problems with migrations.

No foreign keys

There are no functions for easy creating foreign keys. You can always add a foreign key using below migration but this is a real pain as you have to write your own SQL together with some Ruby code. I thought that migrations are for writing no SQL. If I have to write my own SQL for such a simple things then I’d rather do them on my own from the very beginning using just plan SQL.

class ExampleMigration < ActiveRecord::Migration
  def self.up
    create_table :products do |t|
      t.references :category
    end
    #add a foreign key
    execute <<-SQL ALTER TABLE products ADD CONSTRAINT fk_products_categories FOREIGN KEY (category_id) REFERENCES categories(id) SQL
add_column :users, :home_page_url, :string
rename_column :users, :email, :email_address
  end
  def self.down
    rename_column :users, :email_address, :email
    remove_column :users, :home_page_url
    execute "ALTER TABLE products DROP FOREIGN KEY fk_products_categories"
    drop_table :products
  end
end

No checks

The same problem is with checks. All the Rails folks agree that all database checks have to be done in AR models and not in the database. Another myth. There is nothing wrong in creating check in Rails models, that really helps writing information about wrong data to the user. But not checking all data in database is a problem similar to checking form data only using Javascript and omiting the check at the server side.

Some time ago I read a story about such an approach, all data validations were made using the model layer, no checks, unique indices or foreign keys in database. One day they had to give access to the database to some external company so they could make their own modules. They of course used just database with some different programming language so they had a problem with using the application model layer and all the checks. The same time the main company realised that their importer is rather slow so to speed it up, they just wrote some complicated SQL queries instead using AR classes. All were fine, the external module was working hard, the importer was faster… and after some time they realised that all data in the database are not valid. It was just garbage, not proper data. But unfortunately it was too late :)

No stored procedures

There is no api for creating a stored procedure or for using that. You just write a simple SQL and can run it using execute from the migration. Maybe not a big problem.

Funny unique constraints

Unique constraints are usually (as I saw in almost all migration examples that I found) set only in the AR classes and all of them are checked only in the model layer. No unique indexes in databases because they are not needed.

That can lead to some great problems. Let’s see this procedure, we have a table users where the login column is of course unique.

The procedure of adding a new user to database looks like this:

  1. Check if there is such a user
  2. If there is not, add it

Look good, isn’t it? But what if we’ve got two web page users who just want to have the same login? AR objects would do it this way:

  1. User A: Check if there is such a user (found that there is not)
  2. User B: Check if there is such a user (found that there is not)
  3. User A: add user with login ’simon’
  4. User B: add user with login ’simon’

And suddenly there are two user records with the same login (from the moment one can say that the database sucks… but more professional would be: sucks a lot… data is garbage so throw it away).

I can see only two ways of improving that without the unique index:

  1. Lock the whole users table before the step 1… (what causes great efficiensy problems).
  2. Make some other locks for inserting a record to this table (what causes some not so nice hacks around inserting a record).

What’s more it’s known behaviour, as we can read in the official Ruby on Rails guide:

„Validations such as validates_uniqueness_of are one way in which models can enforce data integrity. The :dependent option on associations allows models to automatically destroy child objects when the parent is destroyed. Like anything which operates at the application level these cannot guarantee referential integrity and so some people augment them with foreign key constraints.”
7. Active Record and Referential Integrity

This is where I fully agree, adding validate_uniqueness_of in the model can help to write nice error message for the user, but there is not other efficient and easy to set way of ensuring that all data in a column is unique other than creating a unique index.

No partial indices

In PostgreSQL where are partial and functional indices. Such nice things as:

CREATE INDEX i_users_active ON users (login) WHERE is_active = true;
CREATE INDEX i_users_active_lower ON users ( lower(login) ) WHERE is_active = true;

Unfortunately without hacks or custom SQL I cannot do that in migrations.

No special database types

Some databases can have their special datatypes or you can define your own. In PostgreSQL you’ve got e.g. hstore or arrays. There is no way to use those in migrations. Of course you can use just all the standard data types, but if you want to use some special, you’ve got a problem.

Migration up/down is atomic

Is atomic and is run in one transaction. The problem appears when you’ve got a huge (huge using this definition) database and want to update too many records. During such an update all records would be blocked for writing so it is rather stupid to make that on a huge, very used table. Example of such a query:

UPDATE adverts
SET valid_till = valid_from + '2 days'::interval
WHERE created_when < '2009-01-01';

So there is a table with adverts and we set all adverts added before the year 2009 to be valid for two days. What if there are 2 million of such adverts being changed all the time (for example by the importer)? Such a query can totally block all other queries waiting for releasing this query locks. What’s more, there is a huge chance that during this query execution there will be a deadlock so the whole operation could be started again.

Another solution is to lock the whole table, there should be no deadlocks then, but again all other queries would wait and all applications would wait and there would be a phone from the management soon… and we don’t like it.

Usually such a change can be made in small parts. Let’s say using 100 adverts at a time. The simplest query would look like this:

UPDATE adverts
SET valid_till = valid_from + '2 days'::interval
WHERE created_when < '2009-01-01'
  AND id in
    (  SELECT id
       FROM adverts
       WHERE created_when<'2009-01-01'
       ORDER BY id
       LIMIT 100 OFFSET ?
    );

In the ? parameter there should be inserted numbers 0, 100, 200 and so on. Using this approach the whole data change could be made in days or weeks, but that won’t affect all applications and won’t block them during the whole update process.

Unfortunatelly there is no way to do that using migrations.

Engine independency myth

All the Active Record api and migrations are created with the database independency in mind. Great. But that simply doesn’t work. There are to huge differences between database engines. There usually is a moment when you have to write your custom SQL query instead of an ORM generated one just for speeding that up. And such a query usually uses some special unique engine features such as for example partial indices or hstore/ltree types in PostgreSQL. After such a query, the application isn’t engine independent any more, so why do you think that it is important? I haven’t met so far any advanced and more complicated applications that would be engine independent. And if such an independancy is not really achievable, then maybe just don’t bother and use your db engine features from the very beginning. That includes creating stored procedures, triggers, checks, foreign keys and so on. So use migrations with custom sql all the time or write your own plugins for generating custom sql?

Some examples are terrible

In some examples of the migration usage there is shown how to use AR classes in migrations. But why almost all such migrations look like this:

class ChangeStatus < ActiveRecord::Migration
  def self.up
    Product.find(:all) do |p|
      p.update_attribute(:status, 10)
    end
  end

  def self.down

  end
end

The problem here is that during the UP migration, this function gets all records from a table, changes that and updates in database. All that could be done with a simple sql query such as:

UPDATE products SET status=10;

Usually when a programmer writes such a code, he or she tests that on a small, very small, developer’s database. Problems appear when such a code is used on a production server.

Schema dumping

Ruby on Rails migrations can make a schema dump into a file so that later when you want to create a new database, you don’t have to make all the migrations but only load this one file. That is nice… but a small gotcha here. Normally the database structure is dumped into a Ruby file. But there are only all those Ruby function calls without any custom SQL. If you want to have dumped all SQL needed for creating such a database, you have to change the dumping method and use db:structure:dump Rake task. This task will use the special program for the used database engine for creating a simple database dump. In PostgreSQL it would be something like:

pg_dump -s

No Slony Like Solution

Slony is a master-multi slave replication system for PostgreSQL. While using Slony I’ve got to run all DDL queries (DDL – Data Definition Language – so all thoser queries that change data structure) using a special script. That script executes that DDL queries on master database and later all those DDLs are replicated on slaves.

When all the database changes are generated by ruby classes from migrations, there is a problem with getting the SQL out of those Ruby files. David Wheeler has similar point of view.

Final Thoughts

I think of a database as a good data storage where all data is valid. All the time is valid. That’s why I need all those checks, foreign keys, triggers and so on. Ruby on Rails folks just treat database as a simple bag full of data, but when the application grows much bigger it looks like a GIGO = Garbage In, Garbage Out approach. Do whatever you want but you should know that when data in database is not valid, it’s just too late and you can throw such a database away.

Better Migrations?

The whole migration idea is quite nice, simply it allows keeping track of the database structure changes and allows keeping that in a simple version control system like subversion or git. The rake mechanism also is very nice. What I don’t like about migrations is stated earlier.

As I use mainly PostgreSQL, some time ago I started writing my own migrations for PostgreSQL. Some Rake tasks to load up and down migrations on a database, but all changes are stored in simple SQL files, no Ruby code inside. Of course that won’t be database independent, but who cares. I’ll publish that when will be finished.

Related posts:

  1. Playing Rails Migrations Last time I keep on playing Ruby on Rails, quite...
  2. Ruby, Rails… AWESOME I really don’t get it, why almost each post or...
  3. Datamapper and Ruby On Rails Aaaaaargh. Some time ago I wrote about problems with merb....
  4. How to check PostgreSQL privileges. Someone asked me how to list all tables that a...
  5. Why Use ORMs? Why Use ORMs? ORM aka Object Relational Mapping is a...
  6. What’s Wrong With Merb. Merb is going to be a new killer application used...

Posted in programming.

Tagged with , , , , , .


0 Responses

Stay in touch with the conversation, subscribe to the RSS feed for comments on this post.



Some HTML is OK

or, reply to this post via trackback.