2004/05/24 07:06: Annoyances with ActiveRecord

I’ve been using ActiveRecord for a while now. I’m writing a financial application with it — which I suppose is the canonical example of a database application that SQL was designed for. Specifically:

All of which makes ActiveRecord break into a cold sweat.

ActiveRecord, at the moment, handles 1:N relationships with N+1 queries — one for the main record, and one for each sub-record, indexed by ID. That gets really slow when you have a thousand records to display.
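
To make the cost concrete, here is a minimal sketch in plain Ruby. It does not use ActiveRecord or run any SQL: FakeDB, build_db, and the data are hypothetical stand-ins that only count how many queries each loading style would issue.

```ruby
# Hypothetical in-memory stand-in for a database. It does not run SQL;
# it only counts how many "queries" the caller issues.
class FakeDB
  attr_reader :query_count

  def initialize(accounts, transactions)
    @accounts = accounts
    @transactions = transactions
    @query_count = 0
  end

  # One query: every account row.
  def all_accounts
    @query_count += 1
    @accounts
  end

  # One query per call: the transactions of a single account.
  def transactions_for(account_id)
    @query_count += 1
    @transactions.select { |t| t[:account_id] == account_id }
  end

  # One query: all accounts paired with their transactions, as a join would return them.
  def accounts_with_transactions
    @query_count += 1
    grouped = @transactions.group_by { |t| t[:account_id] }
    @accounts.map { |a| [a, grouped.fetch(a[:id], [])] }
  end
end

# Build n accounts with one transaction each (made-up data).
def build_db(n)
  accounts = (1..n).map { |i| { id: i, name: "Account #{i}" } }
  transactions = accounts.map { |a| { account_id: a[:id], amount: 10.0 } }
  FakeDB.new(accounts, transactions)
end

slow = build_db(1000)
slow.all_accounts.each { |a| slow.transactions_for(a[:id]) }
puts "one query per record: #{slow.query_count} queries" # 1001

fast = build_db(1000)
fast.accounts_with_transactions
puts "single joined query:  #{fast.query_count} queries" # 1
```

With a real database each of those 1001 queries also pays a round trip, which is where the time goes.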

ActiveRecord, since it reads the column names from the database system itself, can’t automatically infer the presence of a virtual column such as an account balance, which is calculated as the sum of a field from a joined table. In fact, there’s no way to do that at all without dropping back to SQL, and the standard methods have nowhere to put a join. That means you have to jump under the hood, into the greasy parts of the engine.

It’s really tempting to write

    def balance
      transactions.inject(0.0) { |acc, e| acc + e.amount }
    end

but it’s too slow.

My other problem is how it doesn’t integrate with Ruby that well. It doesn’t feel like the standard library.

ActiveRecord takes over inheritance: ActiveRecord classes must inherit from ActiveRecord::Base, and worse than that, it treats indirect subclasses differently: if I have an inheritance of Payment < Transaction < ActiveRecord::Base, then it assumes there’s a column type, which will contain a value NULL or the string Payment, referring to whether to instantiate the parent or child type — instead of using a table payments by default, and letting me change that. This integrates really poorly with PostgreSQL’s way of doing object orientation, and in this case, a Payment is a kind of Transaction, but the difference is semantic, not coded into the database. I’d like to be able to inherit the behaviors from the parent — not have them change on me because I’m inheriting.
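
As a rough illustration of the behavior described above, here is a plain-Ruby sketch of choosing a class from a type column. These Transaction and Payment classes are stand-ins, not ActiveRecord classes, and the instantiate method is a hypothetical name for the idea rather than the library's code.

```ruby
# Plain-Ruby stand-ins; these do not inherit from ActiveRecord::Base.
class Transaction
  attr_reader :attributes

  def initialize(attributes)
    @attributes = attributes
  end

  # Pick the class to build from the row's "type" column:
  # NULL (nil) means the parent class, otherwise the named subclass.
  def self.instantiate(row)
    type_name = row["type"]
    klass = type_name.nil? ? self : Object.const_get(type_name)
    raise ArgumentError, "#{klass} is not a #{self}" unless klass <= self
    klass.new(row)
  end
end

class Payment < Transaction; end

rows = [
  { "id" => 1, "type" => nil,       "amount" => 100 },
  { "id" => 2, "type" => "Payment", "amount" => -40 }
]

rows.each do |row|
  object = Transaction.instantiate(row)
  puts "row #{object.attributes['id']} becomes a #{object.class}"
end
```

Both kinds of row live in one table here, which is exactly the assumption I would like to be able to turn off.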

ActiveRecord is also very hard to debug without intimately knowing its structure, due to the massive amount of eval magic involving strings calculated elsewhere in the code.
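
To show the kind of thing I mean, here is a small sketch of a macro that builds methods from a string. Macros, has_many_sketch, and generated_sources are hypothetical names invented for this example; this is not ActiveRecord's code, only the general technique. Keeping the generated source around is one way to make it inspectable.

```ruby
# Hypothetical macro module, loosely in the style of an association macro.
module Macros
  # Keep every generated source string so it can be read back later.
  def generated_sources
    @generated_sources ||= {}
  end

  def has_many_sketch(name)
    # The method bodies exist only as this string until it is evaluated.
    source = <<~RUBY
      def #{name}
        @#{name} ||= []
      end

      def #{name}_count
        #{name}.size
      end
    RUBY
    generated_sources[name] = source
    # Passing file and line lets backtraces name this file rather than an anonymous eval.
    module_eval(source, __FILE__, __LINE__)
  end
end

class Account
  extend Macros
  has_many_sketch :transactions
end

account = Account.new
account.transactions << :deposit
puts account.transactions_count
puts Account.generated_sources[:transactions]
```

Nothing named transactions_count appears anywhere in the source file as a method definition, which is what makes this style hard to follow from the outside.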

ActiveRecord does a decent job of type guessing, but only for basic types and classes-as-tables, not for anything resembling a derivative of a basic type.

So, what I’d do to ActiveRecord, and may do when I get this app a bit further along:

Take this with a grain of salt. I’m working with a pre-existing application here, porting to ActiveRecord. I’m working with data that has One True Way to be represented, according to six hundred years of accounting theory. I also have a habit of, and a fondness for, taking systems and bending them past the point where they’re still flexible. It’s why I use Ruby.

A response from David Heinemeier Hansson

Ahh, great to get a set of concerns like that. Allow me to elaborate my position on each of the charges.

1:N relationships with N+1 queries

This is exactly why Active Record allows for an intimate relationship with SQL. And why it uses SQL for conditions snippets and more. Active Record really doesn’t want you to forget about SQL because SQL is exactly the right answer to a lot of questions. This is one of them. Active Record takes the drudgery out of doing manual SQL for a ton of queries (in Basecamp it’s around 85-95% of the queries that are automated).

So when you encounter a case such as this, you just drop down to SQL. If AR gives the impression that using SQL is fooling around with the engine, that’s where the mistake is being made. It’s more like choosing manual gears on a car that runs on automatic than popping the hood. So I think this is mostly a case of managing expectations. Active Record shouldn’t give the expectation that you’ll never have to touch SQL again. That’s an illusion, and many ORMs have been built and have crumbled on that illusion.

Taking Active Record to a manual gear on this entails using all the wonderful flexibility put into the ORM through the power that is Ruby. The solution is called “data piggybacks” and works like this:

    class Post < ActiveRecord::Base
      # One query: each post row carries its author's name along with it.
      def self.find_all_with_authors
        find_by_sql(
          "SELECT posts.*, people.name AS author_name " +
          "FROM posts, people " +
          "WHERE posts.author_id = people.id"
        )
      end
    end

Now you can do:

    for post in Post.find_all_with_authors
      puts "#{post.title} was written by #{post.author_name}"
    end

This bends the regular restriction that post objects will only have attributes equal to those in the table definition for that class. Hence the piggyback term.
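
For readers who want to see the idea without a database, here is a minimal plain-Ruby sketch of an object that answers to whatever columns its result row contains. Record is a hypothetical stand-in, not Active Record itself, and the row is made up; it only illustrates how an extra column such as author_name can ride along.

```ruby
# Hypothetical stand-in for a record class; not Active Record itself.
class Record
  def initialize(row)
    @attributes = row
  end

  # Answer any reader whose name matches a column in the result row,
  # whether or not that column belongs to the class's own table.
  def method_missing(name, *args)
    key = name.to_s
    return @attributes[key] if args.empty? && @attributes.key?(key)
    super
  end

  def respond_to_missing?(name, include_private = false)
    @attributes.key?(name.to_s) || super
  end
end

class Post < Record; end

# A row as the joined query might return it (made-up data).
row = { "id" => 1, "title" => "Hello", "author_id" => 7, "author_name" => "David" }
post = Post.new(row)
puts "#{post.title} was written by #{post.author_name}"
```

The posts table has no author_name column, yet the object exposes one because the query result did.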

This is of course a simple example, which you can make more elaborate as your requirements dictate. But it does show how you can turn n*2 (or n*3 or n*5) type queries back into n. This is mostly important for overview screens that need to present this data together.

I could imagine some formal support in Active Record for the piggyback approach, but I’ve found that outside toy examples, like the one above, I really do like to drop to SQL for queries like this. The only reason you’re going to SQL is to tweak performance, so you’re often more interested in control than convenience in these cases. But again, I’m certainly open to formal piggyback schemes.

ActiveRecord takes over inheritance

It was an early decision that ActiveRecord objects wouldn’t be able to live in isolation from the ORM. This follows directly from the Active Record pattern and is one of the main differences between that and the Data Mapper approach. By tying your business objects to Active Record you lose flexibility in swapping in another ORM, but at the same time gain an incredible ease of use and convenience. Active Record objects accept this dependency by using the association/aggregation macros (has_many, etc), overwriting callbacks (before_save, etc), and implementing validation.

So in my opinion there’s little to gain from a mix-in, when you’re writing code that formulates a specific dependency anyway. Once you’ve accepted that, there’s no way to “mix-out”.

What I’d rather mix-in would be the shared functionality between payment and transaction if you don’t want it to use an explicit inheritance hierarchy. That was actually exactly what I did with todo lists and todo items on Basecamp. Both needed functionality to control list movement (move this item/list higher or lower or insert a new item/list). This behavior was kept in a List module that was then mixed into both Active Record classes. Worked very well.
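
A rough sketch of that kind of shared module, in plain Ruby. This is not the Basecamp code: the method names move_higher and move_lower, and the assumption that each including class exposes position and siblings, are made up for the illustration.

```ruby
# Hypothetical List module: shared "move higher / move lower" behavior.
# Assumes the including class has a `position` accessor and a `siblings`
# method returning the array the object lives in.
module List
  # Swap places with the neighbor one position up; false if already first.
  def move_higher
    swap_with(siblings.find { |s| s.position == position - 1 })
  end

  # Swap places with the neighbor one position down; false if already last.
  def move_lower
    swap_with(siblings.find { |s| s.position == position + 1 })
  end

  private

  def swap_with(other)
    return false if other.nil?
    self.position, other.position = other.position, position
    true
  end
end

class TodoList
  include List
  attr_accessor :name, :position, :siblings

  def initialize(name, siblings)
    @name = name
    @siblings = siblings
    siblings << self
    @position = siblings.size # appended at the end, 1-based
  end
end

class TodoItem
  include List
  attr_accessor :name, :position, :siblings

  def initialize(name, siblings)
    @name = name
    @siblings = siblings
    siblings << self
    @position = siblings.size # appended at the end, 1-based
  end
end

items = []
TodoItem.new("Write spec", items)
ship = TodoItem.new("Ship it", items)
ship.move_higher
puts items.sort_by(&:position).map(&:name).join(", ")
```

Neither class inherits from the other; they only share the behavior, which is the point of mixing it in.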

ActiveRecord is also very hard to debug.

I recognize that especially the association/aggregation macros are pretty hard to debug. But you yourself have remedied that with the module_eval debugger, where you get a perfect view into which methods are added by the different macros and what their internals look like.

The other parts of Active Record are more easily debugged by having a look at the logging that ships with the package. All SQL statements generated and executed by Active Record are logged and their run time is reported (poor man’s profiler).
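
The general idea of such a log can be sketched in a few lines of plain Ruby. QueryLog is a hypothetical class made up for this example and is not Active Record's logger; the block stands in for executing the statement.

```ruby
require "stringio"

# Hypothetical helper: log a SQL string together with how long the block took.
class QueryLog
  def initialize(io = $stdout)
    @io = io
  end

  # Runs the block (standing in for the real query), writes one log line,
  # and returns whatever the block returned.
  def log(sql)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = yield
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @io.puts format("%s (%.6fs)", sql, elapsed)
    result
  end
end

buffer = StringIO.new
log = QueryLog.new(buffer)
rows = log.log("SELECT * FROM posts") { [{ "id" => 1 }] }
puts buffer.string
puts "rows returned: #{rows.size}"
```

Reading such a log top to bottom shows both which statements ran and which of them were slow.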

The newer versions of Active Record also implement a considerably improved exception system, with more descriptive and relevant exceptions raised.

ActiveRecord does not [do a decent job of type guessing] for anything resembling a derivative of a basic type.

This just changed with the inclusion of the aggregation module, which brings much needed value object support to Active Record (allowing for a “fine-grained” label). You can now use Active Record objects as aggregations of value objects such as Temperature, Money, Address, and others that don’t need an identity (objects that do need one are often called entity objects, which is exactly what Active Record-based objects are).

Example (as it will be in the final version):

    class Account < ActiveRecord::Base
      composed_of :amount, :class_name => "Money"
    end

The amount attribute, which is just represented by an integer in the database, will now be presented as a Money object and can be assigned as such:

    account.amount = Money.new(20, "USD")
    account.amount.exchange_to("DKK") # returns Money.new(150, "DKK")
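
For illustration, a value object like the Money used above could look roughly like this in plain Ruby. This is only a sketch: the RATES table and its 7.5 rate are assumptions chosen to match the 20 USD to 150 DKK example, and nothing here comes from Active Record.

```ruby
# Hypothetical Money value object: no identity, immutable, compared by value.
class Money
  # Assumed exchange rate, picked only to match the example above.
  RATES = { %w[USD DKK] => 7.5 }.freeze

  attr_reader :amount, :currency

  def initialize(amount, currency)
    @amount = amount
    @currency = currency
    freeze # a value object does not change after creation
  end

  # Returns a new Money in the other currency instead of modifying this one.
  def exchange_to(other_currency)
    return self if other_currency == currency
    rate = RATES.fetch([currency, other_currency])
    Money.new((amount * rate).round, other_currency)
  end

  # Two Money objects are equal when their amount and currency are equal.
  def ==(other)
    other.is_a?(Money) && amount == other.amount && currency == other.currency
  end
  alias_method :eql?, :==

  def hash
    [amount, currency].hash
  end
end

dollars = Money.new(20, "USD")
kroner = dollars.exchange_to("DKK")
puts "#{kroner.amount} #{kroner.currency}"
puts kroner == Money.new(150, "DKK")
```

Because equality depends only on amount and currency, two separately built Money objects with the same values are interchangeable, which is what separates a value object from an entity.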

Take this with a grain of salt. I’m working with a pre-existing application here

Active Record is certainly better suited for new applications than for wrapping around existing structures. The naming, id, and class schemes work much better when you’re following the form that Active Record is most comfortable with. So again, trade a bit of flexibility, get ease of use.

It would, however, not be unreasonable to suggest improvements to Active Record that would allow it to be more easily used for legacy applications. It’s just that I’ve certainly picked sides and tweaked ease of use towards new applications, so suggestions that make that awkward probably won’t be accepted.

Thank you, Ari, for bringing these concerns out. This allows me to better formulate what Active Record is and isn’t and why.

(I’ve posted this response to Loud Thinking as well)

David Heinemeier Hansson,
http://www.instiki.org/ — A No-Step-Three Wiki in Ruby
http://www.basecamphq.com/ — Web-based Project Management
http://www.loudthinking.com/ — Broadcasting Brain
http://www.nextangle.com/ — Development & Consulting Services

A response

All very good points. When it comes down to it, it’s a difference in expectations, mostly, and in coding style.

Debugging is still something that keeps biting me, and since ActiveRecord isn’t a perfect fit for what I’m doing, I do have to get into the guts. I think of modifiability for other purposes as an important design feature, so that’s why it bothers me.

Having the name ActiveRecord makes me think it’s the One True Implementation of the pattern: the one-size-fits-all, ultra-flexible version. Just as I’d expect Enumerable to be the one implementation of Enumerable, I expect the same of ActiveRecord. You chose an ambitious title.

All told, ActiveRecord is solidly implemented. What it was designed to do, it does well.

I look forward to the _with_parts accessors and the composed_of macros. They’ll solve some of the problems neatly.

More coming.

Responses? Comments? Email me. I’ll post them here if you say I can and they’re reasonable.