20,000 Leagues Under ActiveRecord
First published in 1870, 20,000 Leagues Under the Sea
describes an underwater adventure that takes place
onboard a submarine called the “Nautilus.”
This is the first of a series of four posts based on a presentation I did at
the Barcelona Ruby Conference called “20,000 Leagues Under ActiveRecord.”
Let me start with a question: How many of you have actually read 20,000 Leagues
Under the Sea, either in the original French or in some translation? [ A few
people raised their hands, but most people in the audience did not. ] Yes, I’m
not surprised. 20,000 Leagues is one of those classic novels we have all heard
of, but few of us take the time to read.
While thinking about this presentation over the Summer, I decided to actually
read the novel - and I’m glad that I did! It blew my mind on a number of
different levels. Today I’m going to take the time to tell you a bit about the
novel itself as we go.
However, actually I’m here today to talk about ActiveRecord. I’d like to explore
how ActiveRecord works internally; how it finds the information we ask for and
returns it to us as a Ruby object.
But why talk about ActiveRecord? We all know how to use ActiveRecord; most of
you understand exactly what this line of code does. You didn’t need to come to
Barcelona to learn how to use ActiveRecord.
The reason why I want to discuss this is that ActiveRecord hooked me; it
first got me excited about using Rails. I came across Rails back in 2008, about
six years ago now. And I can still remember the moment when I first typed a
line of code similar to this one into the Rails console or into a Rails app
somewhere. It was amazing! Before that I was using PHP or Java - I’ve since
blocked out all memory of that so I’m not quite sure which it was :) - and when
I saw how easy it was to use Rails to query a database I was very impressed.
Captain Nemo takes Professor Aronnax, Conseil and Ned Land
on a hike through an underwater “forest.” Verne’s novel has
many beautiful, detailed descriptions of underwater worlds
he had never actually seen, and could only imagine.
Somehow Rails could take a simple, beautiful and readable line of Ruby code and
convert it into a SQL statement like this. How could this possibly work? It
seemed like magic!
But it’s not magic. There are thousands of lines of code and years of computer
science research behind this simple line of code, the hard work behind the
scenes that makes ActiveRecord queries possible.
Just as Professor Aronnax went on an underwater adventure with Captain Nemo,
I’d like to take you on an adventure inside of ActiveRecord to find out how it
works, how ActiveRecord generates and executes SQL statements.
And why stop there? Later we’ll dive underneath ActiveRecord and inside an
actual database server to find out how it works too. How does it understand the SQL
statements we give it? How does it find the data we ask for and return it to
us? Let’s find out!
Here’s our plan: First we’ll start with a look at ActiveRecord::Relation, the top, public API for
ActiveRecord many of you use everyday. Then, we’ll look deeper inside of
ActiveRecord to find out how it converts our Ruby queries into SQL statements.
Later, in the second half of the presentation we’ll dive even deeper and
directly into an actual relational database server (RDBMS); today I’ll use
PostgreSQL as an example. How does Postgres understand the SQL statement that
ActiveRecord sent it? How does it find the data we want? Finally, if we have
time we’ll look at the B-Tree Algorithm, which is part of the real magic that
allows database servers to work.
Let’s use my line of code from earlier as an example - today we’ll be searching
for Captain Nemo together. Therefore, we’ll start with a User class, a subclass of ActiveRecord::Base.
When I call where what happens? It turns out the
where method is defined in the ActiveRecord::Querying module:
But as you can see, ActiveRecord delegates the where
method over to another method called all, which
returns a new instance of a different class called ActiveRecord::Relation. In fact, my call to User.where is entirely equivalent to calling User.all.where:
ActiveRecord actually implements the where method
using the new instance of ActiveRecord::Relation.
Internally, where is implemented by the ActiveRecord::QueryMethods module, included into ActiveRecord::Relation. Next, ActiveRecord::QueryMethods#where returns, in turn, a second
new instance of ActiveRecord::Relation:
If you look at the right, you can see the second ActiveRecord::Relation object contains information about
what record we’re looking for, that we want the records where the name is
Of course, we don’t want all of the Captain Nemo users; we just
want the first one. Next, we call the first method on
the new ActiveRecord::Relation:
The Internal Implementation of ActiveRecord::FinderMethods#first
Now you can see ActiveRecord creates a third instance of ActiveRecord::Relation - this time with even more information
about the query we’d like to execute.
But what are all those other values: order, limit and offset? Where do they
come from? We can find out by taking a look at the implementation of first. The ActiveRecord::FinderMethods module implements this, which
ActiveRecord includes into the ActiveRecord::Relation class.
Here you can see because we didn’t pass in a value for limit, ActiveRecord
calls find_nth and passes in a value of 0,
indicating we want the first record from the query result set. The second
argument, offset_index, turns out to have a value of
zero, meaning we want to count into a window of records at the beginning of the
result set, not a window located somewhere farther along the result set.
ActiveRecord implements a series of similar methods that will return the
second, third, fourth or even fifth record:
You can see the pattern here; the first argument is a zero-based index
indicating which record we want. And just in case we want the forty second
record from the result set, ActiveRecord implements this useful method... :)
It’s no joke! forty_two is actually in the
ActiveRecord source code; you can try it for yourself. Replacing first with the equivalent call to find_nth, here’s our example again:
Following the code path through the ActiveRecord::FinderMethods module, we can see find_nth calls, in turn, find_nth_with_limit:
Now the arguments have been reversed; the first argument, 0, is now the offset
and the second, 1, is the number of records we want, or the limit value.
Substituting one more time, let’s replace find_nth_with_limit with more detailed calls it makes to
order and limit:
Now you can see where all of the values in the final ActiveRecord::Relation object come from. Each call to a
scoping method saves a new piece of information about our query, and returns a
new instance of the ActiveRecord::Relation class.
(We’ll see what arel_table means in a minute.)
The Beauty of ActiveRecord::Relation
Taking a step back, we can see that our simple line of code, User.where(name: "Captain Nemo").first, is creating a
series of ActiveRecord::Relation objects like this:
Captain Nemo allowed Canadian harpoonist Ned Land
to leave the submarine for a short time and explore a
tropical island off the coast of Papua New Guinea.
There are two interesting and beautiful things about this, I think. First,
notice the pattern that ActiveRecord uses: each call to a method returns a new
instance of the class that implemented that method. This is what allows us to
easily chain together different method calls. We can add on as many different
scopes as we wish; because each new object is also an ActiveRecord::Relation, it implements all of the same
methods. You can use the same pattern in your own code. All you need to do is
create a new instance of the class that implements each method, and return
that. One reason to study internal code like this is to learn about and find
new ideas that you can use in your own code.
The second beautiful thing about ActiveRecord::Relation is that it’s lazy. Using this chain
of method calls we are building up metadata or information about our query,
without actually executing the query itself. It’s almost as if we were using a
functional programming language like Haskell or Lisp. Using this trick,
ActiveRecord allows us to specify exactly the query we want, without having to
worry about executing it until we’re ready.
It’s not until we call the to_a method, in other to convert the
ActiveRecord::Relation object into an array and access the result set, that
ActiveRecord executes the query:
You can see here that to_a calls load internally,
which later calls the DatabaseStatements#select_all
method. Note the find_nth_with_limit method calls
to_a, so first,
second and forty_two are
not lazy and will all execute the query immediately. Because of this, these are
known as “terminating methods.” To prevent the query from executing immediately -
to keep it lazy - just use order and limit instead.
In the next few days I’ll post the second part of my presentation from
We’ll look at what “Relational Algebra” means and how the Arel gem converts our
ActiveRecord::Relation object into a string
containing a SQL statement.