Paperclip sample app part 3: saving file attachments in a database BLOB column

In part 1 of this series, I showed how to create a simple Rails web site that uses the Paperclip plugin from Thoughtbot to upload and display image files. Then in part 2, I went on to change the sample app to download the image files through a Rails controller and not just through a direct call to Apache. This would be useful if you wanted to implement security for file attachments or for a variety of other reasons.

This time I’d like to show how to modify the same sample application to save the file attachments in a database BLOB column, instead of on your web server’s file system. To jump ahead and just get the working code, look at the “part3” folder in the github repo: http://github.com/patshaughnessy/paperclip-sample-app.

But before we actually work on the sample app, a disclaimer: Don’t try this at home! Serving file content directly from the file system via Apache or some other web server will always be faster and simpler than loading the file attachments from a database table… Apache and other web servers were designed to load and serve files quickly, and there’s normally no need to issue an expensive SQL query or to make another network connection to a database server just to send files to a web browser.

So why in the world would you ever want to pay the extra performance penalty and move the files into a database table? Here are a couple of reasons:

  1. Your client or employer wants you to. Some companies insist on using a commercial, “enterprise” RDMS system to save file contents for one reason or another. My employer, for example, has many years of experience using Oracle and is comfortable managing large numbers of files that are stored in an Oracle table, while the thought of managing, replicating, backing up, etc., files that are simply saved on a Linux file system seems much more complex and unfamiliar.
  2. Security. If you need to encrypt the contents of super-secret file attachments, storing them in a database might be an easier solution if you’re willing to spend money on a database server like Oracle. And database encryption aside, generally information in a database server can be more easily protected and audited than files on your web server's file system can be.

Anyway, let’s move on and actually change the sample app to save the files in a BLOB column. The first thing we will need to do is to use the version of Paperclip that I modified; the actual Paperclip plugin from Thoughtbot does not support storing files in a BLOB column. I added some code to Paperclip – a new “storage module” – to make this possible. See my code changes to learn more.

So let’s delete the original plugin and install my version:

$ cd /path/to/paperclip-sample-app
$ rm -rf vendor/plugins/paperclip
$ ./script/plugin install git://github.com/patshaughnessy/paperclip.git

Now that we have the modified plugin installed, let’s go ahead and create the BLOB columns that we will use to save the files. I tried to design the database storage module to be easy to use; one of the decisions I made was around what these BLOB columns should be called. I decided by default to use “[attachment]_file” as the name for the primary file attachment, and “[attachment]_[style]_file” for the other styles. If you want to use other column names, you just need to specify the names in the call to “has_attached_file” in the model. See my usage post for more info.

For this sample app I’ll go ahead and use the default column names: “avatar_file,” “avatar_small_file” and “avatar_thumb_file.” Here’s how to create those columns for a MySQL database. First create a new migration as usual:

$ ./script/generate migration add_attachments_blob_avatar_to_user
exists  db/migrate
create  db/migrate/20090528173400_add_attachments_blob_avatar_to_user.rb

… and then edit the new migration file and add the bolded code to it:

class AddAttachmentsBlobAvatarToUser < ActiveRecord::Migration
  def self.up
    execute 'ALTER TABLE users ADD COLUMN avatar_file LONGBLOB'
    execute 'ALTER TABLE users ADD COLUMN avatar_small_file LONGBLOB'
    execute 'ALTER TABLE users ADD COLUMN avatar_thumb_file LONGBLOB'
  end

def self.down remove_column :users, :avatar_file remove_column :users, :avatar_small_file remove_column :users, :avatar_thumb_file end end

Normally to create a BLOB column you would use “add_column :users, :avatar_file, :binary.” This would work fine for Oracle and other database servers. MySQL, however, supports four different types of BLOBs: TINYBLOB (256 bytes), BLOB (64k bytes), MEDIUMBLOB (16MB) and LONGBLOB (4GB). (See http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html for more information.) If we used the Rails migrations :binary column type, then we would get normal BLOBs and have an upper limit for the file attachment size of 64k, which is not normally enough. Unfortunately, there’s no way to specify LONGBLOB for MySQL using Rails migrations, so you have to use the “execute” migration and write an actual SQL statement to add the column.

Next go ahead and run your migrations and create the new columns:

$ rake db:migrate
(in /path/to/paperclip-sample-app)
==  AddAttachmentsBlobAvatarToUser: migrating =================================
-- execute("ALTER TABLE users ADD COLUMN avatar_file LONGBLOB")
   -> 0.0585s
-- execute("ALTER TABLE users ADD COLUMN avatar_small_file LONGBLOB")
   -> 0.0266s
-- execute("ALTER TABLE users ADD COLUMN avatar_thumb_file LONGBLOB")
   -> 0.0116s
==  AddAttachmentsBlobAvatarToUser: migrated (0.0973s) ========================

Now we just need to tell Paperclip to use database storage instead of the default file system storage and we’re ready to try our app and see if it saves the files into the new BLOB columns. Add the bolded parameter to has_attached_file in the User model:

class User < ActiveRecord::Base
  has_attached_file :avatar,
                    :storage => :database,
                    :styles => { :thumb => "75x75>", :small => "150x150>" },
                    :url => '/:class/:id/:attachment?style=:style'
end

Note that I also removed the “:path” parameter; this value would be ignored by the database storage module anyway since the files will be stored in the DB. Let’s try it out! Start up the sample app, and re-edit a user record to upload a new image file:

Select a file, click “Update” to submit the form and the file will be processed by ImageMagick, and saved into our new BLOB columns by the database storage module in Paperclip…

What? Where’s the image? It turns out that we still need to make a code change to the UsersController to download the image file from the BLOB column instead of from the file system. I’ll get to this in a moment. But first, let’s look at the console and see if the new files were saved into the database properly:

$ ./script/console 
Loading development environment (Rails 2.3.2)
>> User.first
=> #<User id: 1, name: "Mickey Mouse", email: "mickey@disney.com",
created_at: "2009-05-28 17:27:00", updated_at: "2009-05-28 17:37:42",
avatar_file_name: "mickey-mouse.jpg", avatar_content_type: "image/jpeg",
avatar_file_size: 137233, avatar_updated_at: "2009-05-28 17:37:42",
avatar_file: "\377???JFIF\000\001\002\001\001h\001h\000\000\377?\021\b\002\210\001?\001\021\000\002\021\001\003\021\001\377?\204\000\001\001\001\001\001\001\001...",
avatar_small_file: "\377???JFIF\000\001\001\001\001h\001h\000\000\377?C\000\003\002\002\002\002\002\003\002\002\002\003\003\003\003\004\006\004\004\004\004\004\b\006\006\005\006...",
avatar_thumb_file: "\377???JFIF\000\001\001\001\001h\001h\000\000\377?C\000\003\002\002\002\002\002\003\002\002\002\003\003\003\003\004\006\004\004\004\004\004\b\006\006\005\006...">

Here you can see the three new BLOB columns, avatar_file, avatar_small_file and avatar_medium_file, and the first few bytes of each column’s value. In fact, if you’re a real geek you’ll notice that the first few bytes contain “JFIF,” which is probably the image file type specification (not sure)… so we know we are seeing the binary contents of the three versions of the Mickey image here. Great!

Actually, not so great: there’s a subtle problem here we need to worry about. I actually typed in a very simple ActiveRecord command, “User.first,” and it loaded all three of the image files’ contents into memory just so that I could inspect the value of the user record in the console. This was convenient now, since I was actually interested in knowing whether or not each of the files was saved properly in the DB. However, this is potentially a big performance problem in general. Imagine if the files were very large… it could take a long time for all 3 of the files to be fetched by a SQL query and returned to the Rails ActiveRecord object. Do I really want or need this to happen every time I access a User record? Usually when I load a User record it’s just because I need to check the value of one of the metadata columns, like the user’s name, email address or something else. And if I do need one of the files, why should I have to load all three files? Spending the time required to load the file contents for each image style is a big performance penalty that isn’t usually necessary.

The solution I came up for this problem was to enable Paperclip to add a method to your model class called “select_without_file_columns_for” that you can use as a named scope or, even better, a default scope. It returns a :select scope hash that will exclude the BLOB columns from the SQL query that ActiveRecord issues to load each record. If you’re using Rails 2.3 or higher, you can use select_without_file_columns_for as a default scope in your model like this:

class User < ActiveRecord::Base
  has_attached_file :avatar, :storage => :database,
                    :styles => { :thumb => "75x75>", :small => "150x150>" },
                    :url => '/:class/:id/:attachment?style=:style'
  default_scope select_without_file_columns_for(:avatar)
end

To learn more about default scopes and to see a couple of other examples, read this: http://ryandaigle.com/articles/2008/11/18/what-s-new-in-edge-rails-default-scoping … or http://m.onkey.org/2009/3/24/default-scopes-and-inheritance-to-the-rescue. Let’s see how this works in the console again after adding the default scope:

$ ./script/console 
Loading development environment (Rails 2.3.2)
>> User.first
=> #<User id: 1, name: "Mickey Mouse", email: "mickey@disney.com",
created_at: "2009-05-28 17:27:00", updated_at: "2009-05-28 17:37:42",
avatar_file_name: "mickey-mouse.jpg", avatar_content_type: "image/jpeg",
avatar_file_size: 137233, avatar_updated_at: "2009-05-28 17:37:42">

Cool. Now the BLOB columns are not displayed. “Default scope” refers to the fact that the SQL used by ActiveRecord by default to load records is automatically changed. I didn’t have to pay the price of loading each of the files, and I was still able to load all of the other User columns using ActiveRecord the way I normally would. To see what happened, execute the “select_without_file_columns_for” method directly in the console:

>> User.select_without_file_columns_for :avatar
=> {:select=>"id,name,email,created_at,updated_at,avatar_file_name,
  avatar_content_type,avatar_file_size,avatar_updated_at"}

It‘s a lot more common to use a default or named scope with :conditions (to modify the WHERE clause) or :order (to modify the ORDER BY clause) but in this case I’ve used :select to specify which columns should be loaded by ActiveRecord (the SELECT portion of the SQL). If you look at the hash, you’ll see all of the User columns listed, except for avatar_file, avatar_small_file and avatar_thumb_file.

If you’re using Rails 2.2 or earlier default scope is not available yet, and you will need to use a named scope, like this:

class User < ActiveRecord::Base
  has_attached_file :avatar,
                    :storage => :database,
                    :styles => { :thumb => "75x75>", :small => "150x150>" },
                    :url => '/:class/:id/:attachment?style=:style'
  named_scope :without_file_data, select_without_file_columns_for(:avatar)
end

And you will need to use the named scope explicitly to avoid loading the files, like this:

$ ./script/console 
Loading development environment (Rails 2.3.2)
>> User.without_file_data.first
=> #<User id: 1, name: "Mickey Mouse", email: "mickey@disney.com",
created_at: "2009-05-30 12:14:03", updated_at: "2009-05-30 12:14:03",
avatar_file_name: "mickey-mouse.jpg", avatar_content_type: "image/jpeg",
avatar_file_size: 137233, avatar_updated_at: "2009-05-30 12:14:03">

A good way to understand exactly what ActiveRecord is doing is to use a Ruby trick and open up the User class and add some debug code to it. Try entering this code into your console:

>> class User < ActiveRecord::Base
>>   def self.find_by_sql(sql)
>>     puts "DEBUG: #{sql}"
>>     super
>>   end
>> end
=> nil

What this code does is open up our User model class, and override the “find_by_sql” ActiveRecord method to display the SQL statement before calling the original base class method to execute it. Find_by_sql is the method that the various different ActiveRecord find methods all call once they have constructed a SQL statement. For example, find :first, find :all, first, last… all of these eventually call find_by_sql. Assuming we have the version of User with the named_scope, we can see what SQL statements are issued with or without the select_without_file_columns_for scope:

>> User.first
DEBUG: SELECT * FROM `users`  LIMIT 1
=> #<User id: 1, name: "Mickey Mouse", email: "mickey@disney.com",
created_at: "2009-05-28 17:27:00", updated_at: "2009-05-28 17:37:42",
avatar_file_name: "mickey-mouse.jpg", avatar_content_type: "image/jpeg",
avatar_file_size: 137233, avatar_updated_at: "2009-05-28 17:37:42",
avatar_file: "\377???JFIF\000\001\002\001\001h\001h\000\000\377?\021\b\002\210\001?\001\021\000\002\021\001\003\021\001\377?\204\000\001\001\001\001\001\001\001...",
avatar_small_file: "\377???JFIF\000\001\001\001\001h\001h\000\000\377?C\000\003\002\002\002\002\002\003\002\002\002\003\003\003\003\004\006\004\004\004\004\004\b\006\006\005\006...",
avatar_thumb_file: "\377???JFIF\000\001\001\001\001h\001h\000\000\377?C\000\003\002\002\002\002\002\003\002\002\002\003\003\003\003\004\006\004\004\004\004\004\b\006\006\005\006...">
>> User.without_file_data.first
DEBUG: SELECT id,name,email,created_at,updated_at,avatar_file_name,
  avatar_content_type, avatar_file_size,avatar_updated_at FROM `users` LIMIT 1
=> #<User id: 1, name: "Mickey Mouse", email: "mickey@disney.com",
created_at: "2009-05-28 17:27:00", updated_at: "2009-05-28 17:37:42",
avatar_file_name: "mickey-mouse.jpg", avatar_content_type: "image/jpeg",
avatar_file_size: 137233, avatar_updated_at: "2009-05-28 17:37:42">

Here we can see that ActiveRecord will load all of the User columns that are listed above in the hash we pass to named_scope… all of the columns except for the BLOB columns. This is different from what ActiveRecord does by default, which is a simple SELECT * FROM …. statement.

Enough about ActiveRecord internals… let’s get back to the sample app and finish it up:

So why didn’t the image appear here properly? The reason is that in UsersController I’m still using the code that accesses the file on the file system and streams it to the client using send_file (see my last post for more info):

def avatars
    user = User.find(params[:id])
    style = params[:style] ? params[:style] : 'original'
    send_file user.avatar.path(style),
              :type => user.avatar_content_type
  end

Obviously send_file is no longer going to work for us. Instead we need to use a similar function in ActionController::Streaming called send_data, which takes the binary data directly as a parameter instead of a file. And to access the file’s contents, I’ve added a method called “file_contents” to Paperclip that returns the actual file contents for the given style, or for the original style by default. Here’s how to put it all together; replace the “avatars” method in UsersController with this new version instead:

def avatars
    user = User.find(params[:id])
    style = params[:style] ? params[:style] : 'original'
    send_data user.avatar.file_contents(style),
              :type => user.avatar_content_type
  end

Only the line in bold has changed. We just call user.avatar.file_contents, pass in the specified style and then pass along the data to send_data. If you restart the app and refresh your browser, now you should see the image again:

Now we are seeing the binary image file data loaded from the BLOB column by ActiveRecord and streamed down to the browser by send_data.

One last detail about this: since usually everyone will use the same controller code to load the file contents for a given style from the BLOB, and then pass it along to send_data, I enabled Paperclip to add another utility method, this time to your controller, to make this even easier:

class UsersController < ApplicationController
  downloads_files_for :user, :avatar
etc…

If you call “downloads_files_for” from your controller like this and specify the model and file attachment, it will automatically generate the correct controller method for you, and call it “avatars” or whatever the plural version of your attachment name is. I chose the name to make it easy and natural to create a route to it in routes.rb. No need to even think about send_data or how to get the file contents from Paperclip at all! Nothing could be simpler. However, if you need to implement security or some other business rules around downloading files, then you might need to add that business logic to the “avatars” method above. Either way, it’s very simple.