Inserting One New Element into Hashes of Varying Sizes

I've started working on a new edition of Ruby Under a Microscope that covers Ruby 3.x. I'm working on this in my spare time, so it will take a while. Leave a comment or drop me a line and I'll email you when it's finished.

RUM includes a series of “experiments:” simple code snippets that show evidence the book’s explanations are accurate. One of the first experiments I wrote back in 2013, Experiment 7-2 is a fun way to see exactly when Ruby increases the number of bins in a hash table. The experiments in RUM are a great way to see for yourself how Ruby works. They also keep me honest; in fact, I ran this code again recently using Ruby 3.4.1 and saw different results than what I expected!

Chapter 7: The Hash Table: The Workhorse Of Ruby Internals

Hash Tables in Ruby3
Saving a Value in a Hash Table3
Retrieving a Value from a Hash Table5
Experiment 7-1: Retrieving a Value from Hashes of Varying Sizes7
How Hash Tables Expand to Accommodate More Values9
Hash Collisions and Open Addressing9
Searching For an Element Using Open Addressing11
Expanding a Hash Table14
How Does Ruby Decide How Many Entries And Bins To Use?15
Experiment 7-2: Inserting One New Element into Hashes of Varying Sizes17
Optimization for Small Hashes20
How Does Ruby Convert A Packed Hash Into A Hash Table?22
How Ruby Implements Hash Functions23
Experiment 7-3: Using Objects as Keys in a Hash25
Summary30

Experiment 7-2: Inserting One New Element into Hashes of Varying Sizes

One way to test whether this rehashing, or redistribution, of entries really occurs when Ruby expands a hash is to measure the amount of time Ruby takes to save one new element into existing hashes of different sizes. As we add more elements to the same hash, we should eventually see evidence that Ruby is taking extra time to rehash the elements.

The code for this experiment is shown in Listing 7-3.

require 'benchmark'

(1) 100.times do |size|

    hashes = []
(2) 10000.times do
      hash = {}
      (1..size).each do
        hash[rand] = rand
      end 
      hashes << hash
    end 

    GC.disable

    Benchmark.bm do |bench|
      bench.report("adding element number #{size+1}") do
        10000.times do |n| 
(3)       hashes[n][size] = rand
        end 
      end 
    end 

    GC.enable

  end
Listing 7-3: Adding one more element to hashes of different sizes

At (1) the outer loop iterates over hash sizes from 0 to 100, and at (2) the inner loop creates 10,000 hashes of the given size. After disabling garbage collection, this experiment uses the benchmark library to measure how long it takes Ruby to insert a single new value at (3) into all 10,000 hashes of the given size. The results are surprising! Figure 7-13 shows the results for Ruby 3.4.1.


Figure 7-13: Time to add 10,000 key/value pairs vs. hash size (Ruby 3.4.1)

Interpreting these data values from left to right, we see the following:

  • It takes about 0.6 ms to insert the first element into an empty hash (10,000 times).

  • As the hash size increases from 2 to 8, the amount of time required to insert a new element is about the same: 0.6ms more or less.

  • Inserting the 9th key/value pair takes much longer, about 2ms for 10,000 hashes!

  • Next, as the hash size increases from 10 up to 16, once again Ruby can insert new elements quickly, between 0.6ms and 0.7ms (10,000 times).

  • A huge spike! It takes almost 3.1ms to insert the 17th element.

  • And then once again starting with the 18th element, the time to insert each element reduced to around 0.7ms-0.8.ms.

  • A 3rd spike appears when Ruby inserts the 33rd element: almost 5ms.

  • The graph once again flattens and returns to around 0.7-0.8ms per element (10,000 times).

  • And a 4th spike appears when Ruby inserts the 65th element: almost 6ms.

What’s going on here? Well, Ruby spends the extra time required to insert that 17th key/value pair expanding the hash table: reallocating the entries array from 16 to 32 entries, and the bin array from 32 to 64 bins, and then reassigning the st_table_entry structures to the new bin array. Ruby expands the entries and bins arrays a second time again after inserting the 33rd entry, this from from 32 to 64 entries and 64 to 128 bins. (Recall the st_features table, shown on page 15, used powers of 2 to determine these array sizes.)

The smaller spike on the 9th insert in this figure is curious. While not as pronounced as the spike at the 17th element, this smaller spike is nonetheless noticeable. As it turns out, Ruby contains another optimization that speeds up hash access even more for small hashes that contain less than 9 elements.