Using LEFT OUTER JOIN with SearchLogic

LEFT OUTER JOIN queries are a great way to handle associations that may contain missing or null values. For example, suppose I have an application with a has_one/belongs_to association between books and authors.

class Book < ActiveRecord::Base
  has_one :author
end
class Author < ActiveRecord::Base
  belongs_to :book
end

If I want to identify the books that don’t have any authors, I can use this SQL statement:

SELECT * FROM books LEFT OUTER JOIN authors ON authors.book_id = books.id
WHERE authors.book_id IS NULL

I need to use LEFT OUTER JOIN here since a normal INNER JOIN query would only return books that have authors. Or if I want to sort the books by their author, I could use:

SELECT * FROM books LEFT OUTER JOIN  authors ON authors.book_id = books.id
ORDER BY authors.name

This sort would work even if some of the books didn’t have an author assigned to them. If we had used a normal INNER JOIN query the books with no author would have been dropped from the sorted list.

Today I’m going to discuss how to use LEFT OUTER JOIN with ActiveRecord and SearchLogic, allowing you to handle associations that might have missing records easily and cleanly.

ActiveRecord :joins

Before we go any farther, let’s setup the book/author example so we can explore how ActiveRecord handles joins:

$ rails outer_join_example
$ cd outer_join_example
$ script/generate model book title:string
$ script/generate model author name:string book_id:integer
$ rake db:migrate

And don’t forget to add the has_one/belongs_to lines to the two new models:

class Book < ActiveRecord::Base
  has_one :author
end
class Author < ActiveRecord::Base
  belongs_to :book
end

And now let’s create some books and authors we can use to test with:

$ script/console 
Loading development environment (Rails 2.3.5)
>> [ 'One', 'Two', 'Three' ].each do |title|
?> book = Book.create :title => title
>> book.author = Author.create :name => "Author of Book #{title}"
>> book.save
>> end

Let’s get started by looking at how you would sort the books by their author’s name, using a normal INNER JOIN query:

>> ActiveRecord::Base.logger = Logger.new(STDOUT)
>> Book.find(:all, :joins => :author, :order => 'authors.name')
       .collect { |book| book.title }
   Book Load (1.4ms)   SELECT "books".* FROM "books" INNER JOIN "authors"
                       ON authors.book_id = books.id ORDER BY name
=> ["One", "Three", "Two"]

Great – here using find with the :joins option we’ve told ActiveRecord to load all the books, and to join with the authors table so we can sort on the authors.name column. But watch what happens if I add a couple of new books that have no author yet, and then repeat the same INNER JOIN sort query:

>> Book.create :title => 'Four'
>> Book.create :title => 'Five'
>> Book.find(:all, :joins => :author, :order => 'name')
       .collect { |book| book.title }
   Book Load (1.8ms)   SELECT "books".* FROM "books" INNER JOIN "authors"
                       ON authors.book_id = books.id ORDER BY name
=> ["One", "Three", "Two"]

Books “Four” and “Five” are dropped entirely. This is because INNER JOIN only includes records in the result set that have values from both the books and authors tables. Since there is no author record for these books, they don’t appear at all.

ActiveRecord :include

The simplest way to get around this problem is to use ActiveRecord’s :include option, instead of the :joins option. Let’s rewrite that call to Book.find and use :include instead:

>> Book.find(:all, :include => :author, :order => 'authors.name')
       .collect { |book| book.title }
   Book Load Including Associations (2.8ms)   SELECT "books"."id" AS t0_r0,
"books"."title" AS t0_r1, "books"."created_at" AS t0_r2,
"books"."updated_at" AS t0_r3, "authors"."id" AS t1_r0,
"authors"."name" AS t1_r1, "authors"."book_id" AS t1_r2,
"authors"."created_at" AS t1_r3, "authors"."updated_at" AS t1_r4 FROM "books"
LEFT OUTER JOIN "authors" ON authors.book_id = books.id ORDER BY authors.name
=> ["Four", "Five", "One", "Three", "Two"]

Now we get books “Four” and “Five” in the sorted list; this is because ActiveRecord uses a LEFT OUTER JOIN query when you specify the :include option. Note they appear first since their author name values are both null, which is sorted before any of the other authors. If you read the ActiveRecord log output, you’ll see it generated a very complex SQL statement that explicitly mentions each column of both the books and authors tables. It used a column naming pattern, “t0_r1” etc., to identify each column. The reason for all of this is that ActiveRecord is executing the “Load Including Associations” operation, which is loading each and every attribute for all of the associated objects. This guarantees that we get every possible value for all the books and their authors.

So this is perfect! Now I can write a named_scope to sort books by their author name, including null authors, like this:

class Book < ActiveRecord::Base
  has_one :author
  named_scope :sorted_by_author_with_nulls,
    { :include => :author, :order => 'authors.name' }
end

And trying it in the console:

>> Book.sorted_by_author_with_nulls.collect { |book| book.title }
  Book Load Including Associations (3.9ms)   SELECT "books"."id" AS t0_r0,
"books"."title" AS t0_r1, "books"."created_at" AS t0_r2,
"books"."updated_at" AS t0_r3, "authors"."id" AS t1_r0,
"authors"."name" AS t1_r1, "authors"."book_id" AS t1_r2,
"authors"."created_at" AS t1_r3, "authors"."updated_at" AS t1_r4 FROM "books"
LEFT OUTER JOIN "authors" ON authors.book_id = books.id ORDER BY authors.name
=> ["Four", "Five", "One", "Three", "Two"]

I can now also write this named_scope called “missing_author,” which returns just the books that have a missing author:

class Book < ActiveRecord::Base
  has_one :author
  named_scope :sorted_by_author_with_nulls,
    { :include => :author, :order => 'authors.name' }
  named_scope :missing_author,
    { :include => :author, :conditions => 'authors.name IS NULL' }
end

And in the console:

>> Book.missing_author.collect { |book| book.title }
  Book Load Including Associations (1.9ms)   SELECT "books"."id" AS t0_r0,
"books"."title" AS t0_r1, "books"."created_at" AS t0_r2,
"books"."updated_at" AS t0_r3, "authors"."id" AS t1_r0,
"authors"."name" AS t1_r1, "authors"."book_id" AS t1_r2,
"authors"."created_at" AS t1_r3, "authors"."updated_at" AS t1_r4 FROM "books"
LEFT OUTER JOIN "authors" ON authors.book_id = books.id
WHERE (authors.name IS NULL)
=> ["Four", "Five"]

What’s wrong with :include?

It sounds like I’m done and that the :include option is the perfect way to handle associations that might contain null or missing values. But not so fast… it turns out that using :include is often a bad idea. A great resource on :joins and :include options is Ryan Bates’s screen cast from 2009. If you think you need to use one of these find options, invest a few minutes and listen to Ryan’s explanation. Here I’ll just mention a couple of potential problems with using :include with named scopes:

1. It‘s potentially slow and wasteful. :include causes ActiveRecord to load every attribute for every included associated object, which is often much more information than you really need. If your named scope only requires one or two columns from the associated table, then using :include might be overkill.

2. As Ryan mentions, you lose control over the “SELECT” portion of the query. This means that if you write a named scope that uses :include, like the missing_author example above, then you can’t chain it together with other named scopes that contain a select option. For example, you couldn’t write code like this:

Book.missing_author.scoped(:select => 'DISTINCT title')

This might appear to work, but if you look at the query ActiveRecord generates you’ll notice that it doesn’t contain the DISTINCT keyword. This is because the Load Including Associations query ignores the select scope.

A better way to write a named_scope that uses LEFT OUTER JOIN is actually to go back to the :joins option, like this:

class Book < ActiveRecord::Base
  has_one :author
  named_scope :sorted_by_author_with_nulls, {
    :joins => 'LEFT OUTER JOIN authors ON authors.book_id = books.id',
    :order => 'authors.name'
  }
  named_scope :missing_author, {
    :joins => 'LEFT OUTER JOIN authors ON authors.book_id = books.id',
    :conditions => 'authors.id IS NULL'
  }
end

Here I’ve written the join clause of the query for each scope, typing in the LEFT OUTER JOIN syntax explicitly. Here’s the example from above using DISTINCT:

>> Book.missing_author.scoped(:select => 'DISTINCT title')
   Book Load (1.1ms)   SELECT DISTINCT title FROM "books"
                       LEFT OUTER JOIN authors ON authors.book_id = books.id
                       WHERE (authors.id IS NULL) 
=> [#<Book title: "Four">, #<Book title: "Five">]

It works now since the :joins option simply adds LEFT OUTER JOIN to the query without rewriting the entire SELECT statement. Then ActiveRecord combines the join with the select scope, inserting the DISTINCT keyword into the query.

Customizing SearchLogic to use LEFT OUTER JOIN

Ok – now using the same techniques from my last post, let’s see if we can customize SearchLogic to support this sort of named scope. First let’s install SearchLogic into our sample app:

$ script/plugin install http://github.com/binarylogic/searchlogic.git

And next, let’s implement a new named scoped called “without_author” that will work the same way as the “missing_author” scope from above. We’ll use method_missing the same way that SearchLogic does; see my last article for a more complete explanation. Here we go:

class Book < ActiveRecord::Base
  has_one :author
  class << self
    def method_missing(name, *args, &block)
      if name == :without_author
        named_scope :without_author,
          {
            :joins => 'LEFT OUTER JOIN authors
                       ON authors.book_id = books.id',
            :conditions => 'authors.id IS NULL'
          }
          without_author
      else
        super
      end
    end
  end
end

And let’s try it in the console:

>> Book.without_author.collect { |book| book.title }
   Book Load (1.3ms)   SELECT "books".* FROM "books" LEFT OUTER JOIN authors
   ON authors.book_id = books.id WHERE (authors.id IS NULL) 
=> ["Four", "Five"]

Works perfectly! All we need to do now is generalize this for any model and association. Following the pattern from the SearchLogic source code, to get a list of associated models I can call “reflect_on_all_associations.” This will return a list of AssociationReflection objects, each of which represents an association between this model (Book) and some other model (Author). See the ActiveRecord source code for more details.

Here’s what the code looks like with the call to reflect_on_all_associations:

class Book < ActiveRecord::Base
  has_one :author
  class << self
    def method_missing(name, *args, &block)
      association_names =
        reflect_on_all_associations.collect { |assoc| assoc.name }
      if name.to_s =~ /^without_(#{association_names.join("|")})$/
        named_scope :without_author,
          {
            :joins => 'LEFT OUTER JOIN authors
                       ON authors.book_id = books.id',
            :conditions => 'authors.id IS NULL'
          }
          without_author
      else
        super
      end
    end
  end
end

You can see that we construct a regex pattern from a list of associated model names, joined with the | character. For example, if there were 3 associated models, then we would use /without_(assoc1|assoc2|assoc3)/… in this example since Book has only one associated model, we have a simple regex pattern: /without_(author)/.

Let’s make sure the code still works in the console:

>> Book.without_author.collect { |book| book.title }
   Book Load (1.0ms)   SELECT "books".* FROM "books" LEFT OUTER JOIN authors
   ON authors.book_id = books.id WHERE (authors.id IS NULL) 
=> ["Four", "Five"]

So far so good. The next thing we need to do is replace the hard coded values in the call to named_scope. To do this, we’ll need the AssociationReflection object that corresponds to the matching association name. Here’s some code that does that:

def matching_association(match)
  @matching_association ||= reflect_on_all_associations.detect do |assoc|
    assoc.name.to_s == match
  end
end

If we pass in the matching name from the regex pattern above, e.g. matching_association($1), then we’ll get the corresponding association object. Once we have that, we can get the name of that association’s database table and primary key:

associated_table = matching_association($1).table_name
associated_key = matching_association($1).primary_key_name

One subtle point to note here: ActiveRecord's AssociationReflection.primary_key_name method actually returns the foreign key column for this association, the book_id column in the authors table, and not the primary key of the authors table, which would have just been id. It works properly, but possibly should have been named foreign_key_name. Anyway, now we’ll need these values to construct our LEFT OUTER JOIN and condition strings. These methods do that:

def join_clause(associated_table, associated_key)
  outer_join = "LEFT OUTER JOIN #{associated_table}"
  outer_join += " ON #{associated_table}.#{associated_key}"
  outer_join += " = #{quoted_table_name}.#{primary_key}"
end

def where_clause(associated_table, associated_key) "#{associated_table}.#{associated_key} IS NULL" end


Finally, we can put it all together like this:

class Book < ActiveRecord::Base
  has_one :author

  class << self
    def method_missing(name, *args, &block)
      association_names =
        reflect_on_all_associations.collect { |assoc| assoc.name }
      if name.to_s =~ /^without_(#{association_names.join("|")})$/
        associated_table = matching_association($1).table_name
        associated_key = matching_association($1).primary_key_name
        named_scope name,
          {
            :joins => join_clause(associated_table, associated_key),
            :conditions => where_clause(associated_table, associated_key)
          }
        send(name)
      else
        super
      end
    end

    def matching_association(match)
      @matching_association ||= reflect_on_all_associations.detect do |assoc|
        assoc.name.to_s == match
      end
    end

    def join_clause(associated_table, associated_key)
      outer_join = "LEFT OUTER JOIN #{associated_table}"
      outer_join += " ON #{associated_table}.#{associated_key}"
      outer_join += " = #{quoted_table_name}.#{primary_key}"
    end

    def where_clause(associated_table, associated_key)
      "#{associated_table}.#{associated_key} IS NULL"
    end
  end
end

Note that I used “send(name)” to call the named scope we just created with named_scope. Let’s make sure it still all works properly in the console:

>> Book.without_author.collect { |book| book.title }
   Book Load (1.0ms)   SELECT "books".* FROM "books"
   LEFT OUTER JOIN authors ON authors.book_id = "books".id
   WHERE (authors.book_id IS NULL) 
=> ["Four", "Five"]

Phew – it does! Ideally I would have some RSpec examples setup to test this, instead of using the console.

Just like in my last article, the last thing I’ll do today is move these class methods out of the Book model and into a new module called SearchLogicExtensions, which in my application I saved into a file called config/initializers/search_logic_extensions.rb. This causes the method missing code to be loaded when my app starts up. At the bottom I extend ActiveRecord::Base with the new module, so the named scope can be used by every model in my application:

module SearchLogicExtensions
  def method_missing(name, *args, &block)
    association_names =
      reflect_on_all_associations.collect { |assoc| assoc.name }
    if name.to_s =~ /^without_(#{association_names.join("|")})$/
      associated_table = matching_association($1).table_name
      associated_key = matching_association($1).primary_key_name
      named_scope name,
        {
          :joins => join_clause(associated_table, associated_key),
          :conditions => where_clause(associated_table, associated_key)
        }
      send(name)
    else
      super
    end
  end

  def matching_association(match)
    @matching_association ||= reflect_on_all_associations.detect do |assoc|
        assoc.name.to_s == match
    end
  end

  def join_clause(associated_table, associated_key)
    outer_join = "LEFT OUTER JOIN #{associated_table}"
    outer_join += " ON #{associated_table}.#{associated_key}"
    outer_join += " = #{quoted_table_name}.#{primary_key}"
  end

  def where_clause(associated_table, associated_key)
    "#{associated_table}.#{associated_key} IS NULL"
  end
end

ActiveRecord::Base.extend(SearchLogicExtensions)