Compiling a Call to a Block

I've started working on a new edition of Ruby Under a Microscope that covers Ruby 3.x. I'm working on this in my spare time, so it will take a while. Leave a comment or drop me a line and I'll email you when it's finished.

This week's excerpt is from Chapter 2, about Ruby's compiler. Whenever I think about it, I'm always surprised that Ruby has a compiler, just like C, Java or any other programming language. The only difference is that we don't normally interact with Ruby's compiler directly.

The developers who contributed Ruby's new parser, Prism, also had to rewrite the Ruby compiler because Prism now produces a completely different, redesigned abstract syntax tree (AST). Chapter 2's outline is more or less the same as it was in 2014, but I redrew all of the diagrams and updated much of the text to match the new AST nodes and other changes for Prism.

Chapter 2: Compilation

The Ruby Compiler
Ruby 3.2 Introduces a Just-In-Time (JIT) Compiler
How Ruby Compiles a Simple Script
Scope AST Nodes
Compiling a Simple AST
Compiling a Call to a Block
How Ruby Iterates Through the AST
Experiment 2-1: Displaying YARV Instructions
The Local Table
Compiling Optional Arguments
Compiling Keyword Arguments
Unnamed Local Variables
Experiment 2-2: Displaying the Local Table
Summary

Compiling a Call to a Block

Next, let’s compile my 10.times do example from Listing 1-1 in Chapter 1 (see Listing 2-2).

10.times do |n|
  puts n
end
Listing 2-2: A simple script that calls a block (repeated from Listing 1-1)

Notice that this example passes a block to the times method. This is interesting because it will give us a chance to see how the Ruby compiler handles blocks. Figure 2-13 shows the AST for the 10.times do example again.


Figure 2-13: A simplified view of the AST for the call to 10.times, passing a block
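For a text version of the tree in Figure 2-13, here is a minimal sketch using the Prism gem's Ruby API. It assumes Ruby 3.3 or later, or that the prism gem is installed. The variable names are just for illustration. Note that this only shows what the parser produces: as far as I know, the scope nodes discussed in this chapter are added later, inside Ruby's compiler, so they do not appear here.

```ruby
require "prism" # assumption: Ruby 3.3+ or the prism gem is installed

source = <<~RUBY
  10.times do |n|
    puts n
  end
RUBY

# Prism.parse returns a result object; its value is the root of the AST.
program  = Prism.parse(source).value
call     = program.statements.body.first # the call node for 10.times
receiver = call.receiver                 # the integer node for 10
block    = call.block                    # the block node for do |n| ... end

puts "call:     #{call.class} (#{call.name})"
puts "receiver: #{receiver.class} (#{receiver.slice})"
puts "block:    #{block.class}, locals: #{block.locals.inspect}"
```

The two references held by the call node, receiver and block, correspond to the two arrows leading away from the call node in the figure.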

The left side of Figure 2-13 shows the AST for the 10.times method call: the call node and the receiver 10, represented by an integer node. On the right, Figure 2-13 shows the beginning of the AST for the block: do |n| puts n end, represented by the block node. You can see Ruby has added a scope node on both sides, since there are two lexical scopes in Listing 2-2: the top level and the block.

Let’s break down how Ruby compiles the main portion of the script shown on the left of Figure 2-13. As before, Ruby starts with the first PM_SCOPE_NODE and creates a new snippet of YARV instructions, as shown in Figure 2-14.


Figure 2-14: Each PM_SCOPE_NODE is compiled into a new snippet of YARV instructions.

Next, Ruby steps down the AST nodes to PM_CALL_NODE, as shown in Figure 2-15.


Figure 2-15: Ruby stepping through an AST

At this point, there is still no code generated, but notice in Figure 2-13 that two arrows lead from PM_CALL_NODE: one to PM_INTEGER_NODE, which represents the 10 in the 10.times call, and another to the inner block. Ruby will first continue down the AST to the integer node and compile the 10.times method call. The resulting YARV code, following the same receiver-arguments-message pattern we saw in Figures 2-7 through 2-11, is shown in Figure 2-16.


Figure 2-16: Ruby compiles the 10.times method call.
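If you want to see these instructions as text, the following is a minimal sketch using RubyVM::InstructionSequence, which is specific to CRuby. The exact listing varies between Ruby versions, so treat the output as illustrative. Also, because the code is compiled from a string here, Ruby labels the top-level snippet <compiled> rather than <main>.

```ruby
# RubyVM is specific to CRuby (MRI); other implementations may not provide it.
source = <<~RUBY
  10.times do |n|
    puts n
  end
RUBY

# compile only translates the source into YARV instructions; it does not run it.
iseq = RubyVM::InstructionSequence.compile(source)
listing = iseq.disasm
puts listing
```

In the listing, look for a putobject instruction that pushes 10, followed by a send instruction that names times and refers to the block.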

Notice that the new YARV instructions shown in Figure 2-16 push the receiver (the integer object 10) onto the stack first, after which Ruby generates an instruction to execute the times method call. But notice, too, the block in <main> argument in the send instruction. This indicates that the method call also contains a block argument: do |n| puts n end. In this example, the arrow from PM_CALL_NODE to the second PM_SCOPE_NODE has caused the Ruby compiler to include this block argument. Ruby continues by compiling the inner block, beginning with the second PM_CALL_NODE shown at right in Figure 2-13. Figure 2-17 shows what the AST for that inner block looks like.


Figure 2-17: The branch of the AST for the contents of the block

Notice that Ruby also inserted a scope node at the top of this branch of the AST. Figure 2-17 shows the scope node contains two values: argc=1 and locals: [n]. These values were empty in the parent scope node, but Ruby set them here to indicate the presence of the block parameter n. From a relatively high level, Figure 2-18 shows how Ruby compiles the inner block.
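The scope node itself is internal to the compiler, but the same two values can be observed in the compiled result. Here is a rough sketch, assuming CRuby 2.6 or later. It relies on the array returned by InstructionSequence#to_a, where positions 10 and 11 hold the local table and the parameter information. That layout is an implementation detail and could change.

```ruby
source = <<~RUBY
  10.times do |n|
    puts n
  end
RUBY

top = RubyVM::InstructionSequence.compile(source)

# The block is the only child snippet of the top-level snippet in this script.
block = nil
top.each_child { |child| block = child }

# Assumption: to_a[10] is the local table and to_a[11] the parameter info.
top_locals   = top.to_a[10]
block_locals = block.to_a[10]
block_params = block.to_a[11]

puts "top level locals: #{top_locals.inspect}"
puts "block locals:     #{block_locals.inspect}"
puts "block params:     #{block_params.inspect}"
```

The lead_num entry in the block's parameter hash plays the role of argc=1 in Figure 2-17, and the block's local table contains n while the top level's is empty.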


Figure 2-18: How Ruby compiles a call to a block

You can see the parent PM_SCOPE_NODE at the top, along with the YARV code from Figure 2-16. Below that, Figure 2-18 shows the inner scope node for the block, along with the YARV instructions for the block’s call to puts n. Later in this chapter we’ll learn how Ruby handles parameters and local variables, like n in this example, and why Ruby generates these instructions for puts n. The key point for now is that Ruby compiles each distinct scope in your Ruby program (a method, block, class, or module, for example) into a separate snippet of YARV instructions.
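To illustrate the one-snippet-per-scope idea, here is a small sketch that compiles a script containing a class, a method, and a block, then walks the resulting instruction sequences. It assumes CRuby 2.6 or later for each_child. The Greeter class and the collect_scopes helper are hypothetical names made up for this example, and reading the scope type from to_a[9] depends on an internal layout.

```ruby
source = <<~RUBY
  class Greeter
    def greet
      3.times { |n| puts n }
    end
  end
RUBY

# Hypothetical helper: records [depth, type, label] for a snippet and its children.
def collect_scopes(iseq, depth = 0, out = [])
  out << [depth, iseq.to_a[9], iseq.label] # assumption: to_a[9] is the scope type
  iseq.each_child { |child| collect_scopes(child, depth + 1, out) }
  out
end

scopes = collect_scopes(RubyVM::InstructionSequence.compile(source))
scopes.each { |depth, type, label| puts "#{'  ' * depth}#{type}: #{label}" }
```

Each line printed corresponds to one separate snippet of YARV instructions: the top level, the class body, the method, and the block inside the method.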