ActiveRecord with large result sets - part 1: select_all vs. find

It’s well documented that using ActiveRecord::Base.connection.select_all can speed up ActiveRecord database queries when you are expecting a large result set. This is because select_all just returns an array of hashes containing the results, and ActiveRecord doesn’t have to do the work of instantiating a new model object for each row in the result.

Here’s an example:

ActiveRecord::Base.connection.select_all('SELECT * FROM users')

This returns an array of hashes, instead of user model objects:

=> [{"name"=>"pat",
     "created_at"=>"2010-09-03 16:59:24.097209",
     "updated_at"=>"2010-09-03 16:59:24.097209",
     "id"=>1,
     "email"=>"pat@patshaughnessy.net"}, etc...]

You can also use “select_values” if you need just one value from a single database column:

ActiveRecord::Base.connection.select_values('SELECT name FROM users')

=> ["pat", etc... ]


If select_all is faster than a normal call to ActiveRecord::Base.find, why not use select_all for everything? The reason is that the extra speed select_all offers comes at a price:

  • You have to write the SQL statement yourself. Normally ActiveRecord generates the SQL statement for you automatically based on your associations, named scopes, etc.
  • You have to use a hash to access the result, instead of a model object. In this example, you would have to write user["name"] instead of user.name. Aside from being harder to read and type, with a hash you also lose the opportunity to call model methods to process the value (see the sketch below).
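
To make the second point concrete, here is a hypothetical sketch - the display_name method below is not part of the example app, it's just for illustration. With the model objects returned by find you can call methods on each result; with select_all you only get raw hash values, so any formatting or processing logic has to be repeated wherever the hashes are used:

class User < ActiveRecord::Base
  # hypothetical formatting method, for illustration only
  def display_name
    "#{name} <#{email}>"
  end
end

# find returns model objects, so the method is available on each result:
User.find(:all).each { |user| puts user.display_name }

# select_all returns plain hashes, so the formatting logic has to be duplicated:
ActiveRecord::Base.connection.select_all('SELECT * FROM users').each do |user|
  puts "#{user['name']} <#{user['email']}>"
end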

So using select_all is a tradeoff: more speed vs. less coding convenience. Is it worth it?

Well, it might be if:

  • You expect a large number of results from a database query - this will make the speed benefit more noticeable.
  • It’s easy to write the SQL select statement, because you don’t have many different SQL statements to write based on different combinations of associations and scopes.
  • You need an unusual, hand-crafted SQL statement that ActiveRecord wouldn’t normally generate - in that case you might have to use select_all regardless of the speed benefit (see the example below).
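
For instance, here is a hypothetical hand-written reporting query - a GROUP BY aggregate that doesn't correspond to individual user model objects - where select_all is a natural fit whether or not it's faster:

# Hypothetical report: how many users share each name. The result rows don't
# map onto the User model, so an array of hashes is exactly what we want.
rows = ActiveRecord::Base.connection.select_all(
  'SELECT name, COUNT(*) AS user_count FROM users GROUP BY name')
rows.each do |row|
  puts "#{row['name']}: #{row['user_count']}"
end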

This week I ran some Rails performance tests to measure the actual speed difference for a very simple find(:all) query with no associations. What I found was:

  • For Rails 3, select_all was significantly faster: it took ActiveRecord about 40% less time to return an array of hashes with select_all than it did to return an array of model objects with find.
  • For Rails 2.3.8, there was a less noticeable difference: only a 22% improvement.
  • Ruby 1.8.7 was much faster overall than Ruby 1.8.6: about 2x faster for both select_all and find.

The rest of this article will show you how I set up and ran the performance tests, and what my detailed results were. Most of what I’m going to do below is taken from an excellent Rails guide article by Pratik Naik: Performance Testing Rails Applications.

Setting up for a Rails performance test

I’ll get started by creating a new Rails 3 app - using the 3.0.0 version which was released just last week!

$ rails new perf_test
      create
      create  README
      create  Rakefile
      create  config.ru
      create  .gitignore
      create  Gemfile
etc....

And now I’ll create a user model from the example above:

$ cd perf_test
$ rails generate model user name:string email:string
      invoke  active_record
      create    db/migrate/20100903165238_create_users.rb
      create    app/models/user.rb
      invoke    test_unit
      create      test/unit/user_test.rb
      create      test/fixtures/users.yml

$ rake db:migrate
(in /Users/pat/rails-apps/perf_test)
==  CreateUsers: migrating ==============================
-- create_table(:users)
   -> 0.0013s
==  CreateUsers: migrated (0.0014s) =====================


Next I’ll write a simple rake task in a file called lib/tasks/users.rake to create a specified number of users, so we can time queries against different numbers of records. Having a separate rake task for this also avoids including the record creation time in the performance tests.

task :populate_users => :environment do
  count = ENV['count'].to_i
  User.delete_all
  count.times do
    User.create :name  => 'pat',
                :email => 'pat@patshaughnessy.net'
  end
  puts "User count: #{User.count}"
end
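
You run the task with something like rake populate_users count=1000. One optional tweak that is not part of the task above: for large counts, wrapping the loop in a single transaction avoids one database commit per INSERT, which can make populating the table noticeably quicker, especially with SQLite:

# Optional variation: one commit for all of the INSERTs instead of one per row
User.transaction do
  count.times do
    User.create :name  => 'pat',
                :email => 'pat@patshaughnessy.net'
  end
end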

Now I’ll create a new performance test using the Rails generator, and delete the test that came with the new app:

$ rails generate performance_test load_users
      invoke  test_unit
      create    test/performance/load_users_test.rb
$ rm test/performance/browsing_test.rb

Editing the load_users_test.rb file, I’ll add a couple of simple tests that load all of the user records:

require 'test_helper'
require 'rails/performance_test_help'

class LoadUsersTest < ActionDispatch::PerformanceTest
  def test_find
    user_models = User.find :all
    puts "Loaded #{user_models.size} users"
  end

  def test_select_all
    user_hashes =
      ActiveRecord::Base.connection.select_all('SELECT users.* FROM users')
    puts "Loaded #{user_hashes.size} users"
  end
end

Finally, I’ll install “ruby-prof,” a helpful profiling tool that will allow us to use the rake test:profile command:

$ gem install ruby-prof
Building native extensions.  This could take a while...
Successfully installed ruby-prof-0.9.2
1 gem installed
Installing ri documentation for ruby-prof-0.9.2...
Installing RDoc documentation for ruby-prof-0.9.2...

And I need to add this gem in my Gemfile:

source 'http://rubygems.org'

gem 'rails', '3.0.0'

gem 'ruby-prof'

# Bundle edge Rails instead:
# gem 'rails', :git => 'git://github.com/rails/rails.git'


Ok - great! Now I’ll create 10,000 users and then run the new performance tests:

$ rake populate_users count=10000
(in /Users/pat/rails-apps/perf_test)
User count: 10000

$ rake test:profile
(in /Users/pat/rails-apps/perf_test)
Loaded suite /Users/pat/.rvm/gems/...
Started
Loaded 2 users
LoadUsersTest#test_find (10 ms warmup)
Loaded 2 users
        process_time: 4 ms
              memory: unsupported
             objects: unsupported
.Loaded 2 users
LoadUsersTest#test_select_all (0 ms warmup)
Loaded 2 users
        process_time: 2 ms
              memory: unsupported
             objects: unsupported
.
Finished in 0.319634 seconds.

6 tests, 0 assertions, 0 failures, 0 errors


Huh? What happened? Why were there only two users loaded? Well, it turns out that the 2 users came from the test/fixtures/users.yml fixture file created by the model generator I ran earlier:

one:
  name: MyString
  email: MyString

two:
  name: MyString
  email: MyString


And the ten thousand users I created with my rake task were put into my development database... not my test database.

Retaining the contents of the test database between test suite runs

I can easily run my rake task with RAILS_ENV=test to fill the test database with users instead of the development database, but they will still be cleared out when the test database is purged and reloaded each time I run my profile test suite. What I really want is to retain the contents of my test database each time I run the tests. Using code from a helpful Stack Overflow discussion on how to do this, I put this code into my Rakefile:

require File.expand_path('../config/application', __FILE__)
require 'rake'
Rake::TaskManager.class_eval do
  def remove_task(task_name)
    @tasks.delete(task_name.to_s)
  end
end

PerfTest::Application.load_tasks


...and I created a no-op task in my users.rake file, after calling remove_task:

Rake.application.remove_task 'db:test:prepare'

namespace :db do
  namespace :test do
    task :prepare do |t|
    end
  end
end


Finally I need to be sure to delete the users.yml fixture file, or Rails will still delete and reload all of the users between each individual test:

$ rm test/fixtures/users.yml

Just how much faster is select_all vs. find?

Ok - now I’m all set to run some tests; let’s start with 1,000 user records:

$ rake populate_users count=1000 RAILS_ENV=test
(in /Users/pat/rails-apps/perf_test)
User count: 1000

$ rake test:profile
(in /Users/pat/rails-apps/perf_test)
Loaded suite /Users/pat/.rvm/gems/ruby-1.8.7...
Started
Loaded 1000 users
LoadUsersTest#test_find (54 ms warmup)
Loaded 1000 users
        process_time: 398 ms
              memory: unsupported
             objects: unsupported
.Loaded 1000 users
LoadUsersTest#test_select_all (33 ms warmup)
Loaded 1000 users
        process_time: 241 ms
              memory: unsupported
             objects: unsupported
.
Finished in 0.99688 seconds.

6 tests, 0 assertions, 0 failures, 0 errors


Each performance profile test gives us three results: process time, memory usage, and the number of Ruby objects created. However, since I’m not using the “GC Patched” (garbage collection patched) version of Ruby, I only get the process time value. In a future blog post I’ll show how to update your Ruby interpreter with the patch that counts the number of objects created and measures the amount of memory used. “Process time” refers to the amount of CPU time used by the Ruby process, not the actual time elapsed (the “wall time”).
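
As a rough illustration of the difference - this uses Ruby's standard Benchmark library rather than the Rails test harness - utime and stime measure CPU time consumed by the Ruby process, while real is elapsed wall time; a sleeping process burns wall time but almost no CPU:

require 'benchmark'

timing = Benchmark.measure { sleep 1 }
# sleep consumes almost no CPU, so process time stays near zero
# while wall time is about one second
printf("process time: %.3f sec, wall time: %.3f sec\n",
       timing.utime + timing.stime, timing.real)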

But for now, we can see that the first test, which uses ActiveRecord::Base.find, took 398ms to load 1000 rows from the SQLite database, and to return an array of 1000 user model objects. The second test, using ActiveRecord::Base.connection.select_all, took 241ms to load the same data but return it in the form of an array of hashes.

Let’s increase the number of users to 10,000 and repeat the test:

$ rake populate_users count=10000 RAILS_ENV=test
(in /Users/pat/rails-apps/perf_test)
User count: 10000

$ rake test:profile
(in /Users/pat/rails-apps/perf_test)
Loaded suite /Users/pat/.rvm/gems/ruby-1.8.7...
Started
Loaded 10000 users
LoadUsersTest#test_find (476 ms warmup)
Loaded 10000 users
        process_time: 3.99 sec
              memory: unsupported
             objects: unsupported
.Loaded 10000 users
LoadUsersTest#test_select_all (378 ms warmup)
Loaded 10000 users
        process_time: 2.42 sec
              memory: unsupported
             objects: unsupported
.
Finished in 7.730976 seconds.

6 tests, 0 assertions, 0 failures, 0 errors


It looks like the time taken for 10,000 users is simply 10x the amount taken for 1,000. In other words, this is a fairly linear process: it takes my laptop a roughly fixed number of milliseconds to process each user row - about 0.4 ms per row for find and 0.24 ms per row for select_all.
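
The per-row figures are just the totals above divided by the row counts:

398.0 / 1000     # => ~0.40 ms per row for find (1,000 rows)
241.0 / 1000     # => ~0.24 ms per row for select_all (1,000 rows)
3990.0 / 10000   # => ~0.40 ms per row for find (10,000 rows)
2420.0 / 10000   # => ~0.24 ms per row for select_all (10,000 rows)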

Results

Here are my timings - I’m using a MacBook Pro with a 2.6 GHz Intel Core 2 Duo processor. My database is SQLite 3.7.0, accessed via the sqlite3-ruby 1.3.1 gem. Note that the SQLite gem version seems to be important: using an older version of this gem slowed down the results dramatically.

Rails 3.0.0/Ruby 1.8.7:

records     select_all     find           delta (time saved)
1,000       241 ms         398 ms         39%
10,000      2,420 ms       3,990 ms       39%
100,000     25,480 ms      42,580 ms      40%

Rails 2.3.8/Ruby 1.8.7:

records     select_all     find           delta (time saved)
1,000       262 ms         336 ms         22%
10,000      2,660 ms       3,390 ms       22%
100,000     27,770 ms      35,490 ms      22%

Rails 2.3.5/Ruby 1.8.6:

records     select_all     find           delta (time saved)
1,000       530 ms         651 ms         19%
10,000      5,350 ms       6,540 ms       18%
100,000     53,820 ms      67,030 ms      20%