Why Use ORMs?

Thu, Feb 25, 2010

by Szymon Lipiński

tags: database orm postgresql programming sql

ORM aka Object Relational Mapping is quite a crappy way of converting a relational database model into an object model. Usually this means that there is a class for each database table.

Many people say that ORMs are the best human idea since the sliced bread. I mention some of their arguments below.

Database Agnosticism

So you can switch databases? Well… No. Just no. I don’t know any advanced software that can easily switch databases. Even if it uses an ORM. The reason is simple: advanced software means advanced procedures for managing data. It means that everything is tuned up. If not, then it becomes slow.

Tuning the database means using its capabilities (beware: those capabilities are not database agnostic).

Database Agnostic Speeding Up?

What can you do with such a slow database? Of course you can create some indices (if your database supports it).

You can also do much more things including:

views (if your database supports it)
materialized views (if your database supports it))
triggers (if your database supports it)
replication (if your database supports it)
partitioning (if your database supports it)
anything else (if your database supports it).

Indices

Let’s talk about the indices. OK, first of all a question: what indices do you want to talk about? Maybe about PostgreSQL partial indices or functional ones? So if you use PostgreSQL, and create a functional index, then you cannot reproduce the same on Oracle. That way you have much faster PostgreSQL database, but not Oracle one.

The opposite example: in Oracle you have hints, in PostgreSQL you don’t. Of course a hint is just a comment so any other database would just ignore that comment. But ignoring doesn’t mean speeding up anything.

What can you do to have the same speed for every database engine? Just create a different schema for each engine. And while changing that schema don’t forget that you need to maintain two (or more) schemas. Adding a column to one schema means adding the column to another. Speeding up one engine… means speeding up the other (but maybe using different tools). And one more thing… after making all those changes don’t forget to test everything… separately for each engine.

Have fun.

Of course you can have a nice database agnostic application, but only if you create separate data layer for each engine. Just more work and the question is… what for? Do you really don’t have any more important work to do? Simply: use one database engine, believe me, 99.999% of software won’t change the engine once it works.

But do it wisely, have a data layer, which could be replaced by an implementation for another database. Having tests is quite obvious.

Speeding Up Development

Maybe… at the very beginning. But suddenly at some point (and that suddenly shows up even before the first release) you will find that there are so many limitations of the ORM libraries, that you have to make some queries using normal SQL. Quite soon you will have a hybrid system, part of the queries is done using simple Hibernate selects, some using HQL (what really is a kind of SQL), and some using SQL.

Is that really so easy to maintain and speeds up anything?

Testing

“As a consequence of being database agnostic, you can replace your heavy SQL Server with a nimble Sqlite instance at test time, allowing you to write blazing fast tests.”

– http://blog.adamjcooper.com/2010/01/are-orms-solving-anything-yes.html

I love this kind of legends. Last time I had to change something in the Oracle session using ‘ALTER SESSION’. Tests failed just because they were using HSQLDB. Everything would work if there would be used Oracle, not any other engine.

If you have an application that is prepared to be run using XXX database engine at version YYY, just use this YYY version of XXX engine for testing. Using anything else won’t work. You’ll just have some other engine tested, not the engine that you’ll use.

Caching

Yea, ORMs can cache… true. But if you don’t use ORM, you can cache too. I hope you don’t think that the whole idea behind any ORM is just caching.

Higher Level of Query Abstraction

Such abstractions can be dangerous, if you don’t know what is going on underneath.

Some time ago I was working on a project where a programmer made a query like SELECT * FROM XXX and loaded the whole table into the program. Then the program sorted the whole dataset copied from the database. Then it took the first 10 rows to show them on a web page. Of course it run smoothly on his machine. On testing too. 100 rows is not much.

On the production it was different. There was much more rows and it wasn’t funny. His explanation was something like:

I don’t know SQL, I’m not gonna learn that.

I wrote the correct code and I’m not gonna change that, it simply works.

He even wrote his own function for sorting a list.

You Don’t Have to Use the Crappy SQL…

Yea… I really don’t get it. Why so many programmers don’t want to know a little bit more? They decline learning new things including the SQL. They don’t have to know SQL, they’ve got ORMs. So they produce crappy ORM generated queries that usually are not maintainable.

… But With Pleasure Will Use The Crappy HQL

The most funny is The Hibernate. Mr. Hibernate refused to use SQL, they invented HQL. The best thing for HQL is that programmers don’t decline learning that… despite the fact that it looks much more like SQL.

How can I write an HQL query in Hibernate with the Oracle hints? There is no way to do it. It’s not that the hints are good, the problem is that instead of using SQL, you just use some crappy not fully working libraries.

ORM is For Everything

ORMs are also used for checking the data validity and all constraints including the uniqueness of a column. That is mainly crappy because this is crappy and simply doesn’t work.

Checking constraints at the application side and not on the database side, has some advantages such as simply decreasing the load on the database server. The problem is not the load but the data correctness.

If you need to have correct data in the database, some constraints are a must. If don’t care about the correctness, you can stop reading here. Good luck. You are gonna need lots of luck.

Checking the constraints at the application side sometimes works. However, what in case when you have another application written without your ORM mapping? That application can use some other ORM library. Then you have to create and maintain the mappings in two places all the time (a terrible idea). What if the another application is a simple perl/bash/python script, which doesn’t use any ORM? Such a script can easily change the data so that there is just a pure garbage but usually you realize that too late.

Additional points for brevity

if you will enforce column uniqueness only in the application.

Uniqueness declared at the application level either doesn’t work or you will need to lock the whole table before any data change (which will make it terribly inefficient).

So what to do about that? Just create constraints in the database. All applications and DBA (which is not an application) will be happy.

Database Constraints.

I wrote that such application constraints checking sometimes doesn’t work. There are mainly 3 solutions, the not working solution aka Rails, the not efficient solution aka MyISAM, and the good solution called The SQL Database.

The Bad

The first solution is used e.g. in Rails' ActiveRecord. Simply define a unique model attribute and let the application check that. During insert or update the model class checks if there is any record in the database that has this value in the unique column.

It simply doesn’t work. Let’s imagine that there are two instances of the same application. Both want to insert a row with the same data. The first application checks and finds out that there isn’t any such row.

The second application checks in the same moment… there isn’t too (yea, what a surprise). Then the first application makes the SQL insert. The second application too. After that there are two rows with exactly the same data in the unique column. What a surprise. How has that happened? Two users with the same logins, huh?

The Inefficient

The second solution is to perform the same operations as in the first example but inside some critical section. Simply create a lock in the database, so the first application is sure that no one will check if the record with the data exists. That will work great… but such a lock on the whole table is not a good idea. It blocks all writes so no one can change anything. That’s really inefficient, please don’t do that if you can do it the other way. I called this the MyISAM way as MyISAM is the widely used non transactional engine for MySQL and sometimes setting such terrible lock is the only way to create so called transaction simulation.

The Good

The third solution is the best. Just define a unique constraint in the database and take care of the error that you can get while inserting. It just works. It is just simple. There are no terrible locks. And there really will be just one record with this value in the unique column.

The Model Should Be Based On Classes

Why? Oh why? Just because you use an object based language? I’d like not to complicate things too much and don’t add a leaky abstraction level called ORM.

Why Not To Use ORMs?

You use some expensive database, let’s say it is Oracle. Oracle has hints, but HQL doesn’t allow to use that, oops. Oracle has spatial queries, but your ORM does’t understand that. Oracle has -whatever- but your ORM doesn’t allow to use that.

So you pay a lot of money for a database like Oracle just to use that like you’d use SQLite?

What a brilliant idea.

You use some not expensive database, let’s say it is PostgreSQL. In PostgreSQL you can define your types, but the ORM doesn’t understand that. You’d like to use the partial indices but your automaticaly generated ORM DDL queries don’t understand that.

So you don’t pay a lot of money for a database like PostgreSQL… just to use that like you’d use SQLite?

What a brilliant idea.

The engine XXX has a feature YYY but your great ORM doesn’t use that because other engines don’t have that feature (although using this feature would speed up your application so much).

Why Define Database Constraints?

You’ve got an application that works for some time. You want to change it, what should you do? Fix the old application or create a new one. What will just be left or migrated?

The Data.

The Data is what describes the real value of the whole application. So when there is a new version of the application (no matter if this is a brand new one, or you’ve improved the old), you just throw away the old. The data persists, you leave the data or migrate with care. So take care of the data by having a good data model with all constraints etc. If you don’t have a well prepared data model, you’re on the road right down to the hell. The data will be wrong some day. Let’s imagine that the wrong data is on you bank account. When you have a well prepared data in the database, there is no sense in repeating the same constraints in the application. You would have to maintain that in two places. So the application slowly becomes a maintanance nightmare.