Updating Ruby Under a Microscope
Ruby stores much of its own internal data in hash tables.
I've started working on a new edition of Ruby Under a Microscope that covers Ruby 3.x. I'm working on this in my spare time, so it will take a while to finish. Leave a comment or drop me a line and I'll email you when it's finished.
In the meantime, here’s an excerpt from a rewrite of Chapter 7 about hash tables. Vladimir Makarov and the Ruby team redesigned Ruby’s hash table implementation back in 2015 to use open addressing, shortly after I published the first edition of RUM. This seemed like a good place to start.
Chapter 7: The Hash Table: The Workhorse Of Ruby Internals
Hash Tables in Ruby | 3 |
Saving a Value in a Hash Table | 3 |
Retrieving a Value from a Hash Table | 5 |
Experiment 7-1: Retrieving a Value from Hashes of Varying Sizes | 7 |
How Hash Tables Expand to Accommodate More Values | 9 |
Hash Collisions and Open Addressing | 9 |
Searching For an Element Using Open Addressing | 11 |
Expanding a Hash Table | 14 |
How Does Ruby Decide How Many Entries And Bins To Use? | 15 |
Experiment 7-2: Inserting One New Element into Hashes of Varying Sizes | 17 |
Optimization for Small Hashes | 20 |
How Does Ruby Convert A Packed Hash Into A Hash Table? | 22 |
How Ruby Implements Hash Functions | 23 |
Experiment 7-3: Using Objects as Keys in a Hash | 25 |
Summary | 30 |
Hash Tables in Ruby
Hash tables are a commonly used, well-known, age-old concept in computer science. They organize values into groups, or bins, based on an integer value calculated from each value — a hash. When you need to find a value, you can figure out which bin it’s in by recalculating its hash value, thus speeding up the search.
Every time you write a method, Ruby creates an entry in a hash table.
Saving a Value in a Hash Table
Figure 7-1 shows a single hash object and its hash table.
Figure 7-1: A Ruby hash object with an empty hash table
On the left is the RHash
(short for Ruby hash) structure. On the right, you see
the hash table used by this hash, represented by the st_table
structure. This C
structure contains the basic information about the hash table, including the
number of entries saved in the table and pointers to the entries and bins. Each
RHash
structure contains a pointer to a corresponding st_table
structure. For
many hashes, Ruby initially creates 32 entries and 64 bins. (Hashes with 8 or
fewer entries work somewhat differently; see “Optimization for Small Hashes” on
page 20.) The best way to understand how a hash table works is by stepping
through an example. Suppose I add a new key/value to a hash called my_hash
:
my_hash[:key] = "value"
While executing this line of code, Ruby saves the key and value into the first entry, as shown in Figure 7-2.
Figure 7-2: A Ruby hash object containing a single value
Here you can see the new key/value pair inside the first entry slot, called an
st_table_entry
in Ruby’s C source code. Ruby saves the keys and values in the
entries array in the order you add them. This makes it easy for Ruby to return
them back to you in the same order. Also see that Ruby saved an entry index of 0
in the third bin, number 2. Ruby did this by taking the given key — in this
example, the symbol :key
— and passing it to an internal hash function that
returns a pseudorandom integer:
some_value = internal_hash_function(:key)
Next, Ruby takes the hash value — in this example, some_value
— and calculates the modulus by the number of bins, which is the remainder after dividing by the number of bins.
some_value % 64 = 2
Now let’s add a second element to the hash:
my_hash[:key2] = "value2"
This time let’s imagine that the hash value of :key2
divided by 64 yields a
remainder of 5.
internal_hash_function(:key2) % 64 = 5
Figure 7-3 shows that Ruby fills in a second st_table_entry
structure in the
entries array, and the entry index 1 in bin number 5, the sixth bin.
Figure 7-3: A Ruby hash object containing two values
Retrieving a Value from a Hash Table
The benefit of using a hash table becomes clear when you ask Ruby to retrieve the value for a given key. For example:
p my_hash[:key] => "value"
If Ruby had saved all of the keys and values in an array or linked list, it
would have to iterate over all the elements in that array or list, looking for
:key. This might take a very long time, depending on the number of elements. But
using a hash table, Ruby can jump straight to the key it needs to find by
recalculating the hash value for that key. To recalculate the hash value for a
particular key, Ruby simply calls the hash function again: some_value = internal_hash_function(:key)
.
Then, it redivides the hash value by the number of bins to get the remainder, or
the modulus. some_value % 64 = 2
At this point, Ruby knows to look in bin
number 2 and finds the entry index 0, and in turn finds the entry with the key
of :key
in entry number 0 or the first entry slot. Ruby can later find the
value for :key2
by repeating the same hash calculation
internal_hash_function(:key2) % 64 = 5
.
Figure 7-4: Finding Values in a Hash Table
Figure 7-4 explains how Ruby would find these two values in the hash table: On
the left side, the first arrow starts from the third bin (bin index 2 =
internal_hash_function(:key) % 64
) and points to the first key/value pair, at
index 0. To the right, the second arrow starts from the sixth bin (bin index 5 =
internal_hash_function(:key2) % 64
) and points to the second key/value pair, at
index 1.