Find Your Language’s Primitives

November 29th 2021 — Comments and Reactions

If you dig into your programming language's syntax, you might
discover that it is capable of much more than you thought it was.

Wikipedia defines “Language Primitive” this way:

In computing, language primitives are the simplest elements available in a programming language. A primitive is the smallest 'unit of processing' available to a programmer of a given machine, or can be an atomic element of an expression in a language.

By looking at a language’s primitives, we can learn what kind of code will be easy to write or impossible to express, and what types of problems the language was intended to solve. Whether you’ve been using a language for years, or just now learning a new language for fun, take the time to find and learn about your language’s primitives. You might discover something you never knew, and will come away with a deeper understanding of how your programs work.

As an example today, I’m going to look at how arrays work in three languages: Ruby, Crystal and x86 Assembly Language.

Retrieving an Array Element In Ruby

In Ruby I can create an array and later access an element like this:

arr = [12345, 67890]
puts arr[1]

This code would be the same or almost the same in many other programming languages. It just means: “find the second element of the array and print it to stdout.”

But how does this actually work? In Ruby, the Array class and all of its methods are language primitives. This means array methods like [] or []= cannot be broken down into smaller pieces of Ruby code. As Wikipedia says, these methods are the smallest unit of processing available to Ruby programmers working with arrays.

Ruby hides the details of how arrays actually work from us. To learn how Ruby actually saves and retrieves values from an array, we would need to switch languages and drop down a level of abstraction, and read the C implementation in the Ruby source code: array.c. There’s nothing wrong with this, of course. Ruby developers use arrays every day without any trouble. But switching from Ruby to C makes understanding internal details much more difficult.

Retrieving an Array Element In Crystal

This Fall I decided to learn more about Crystal, a statically typed language with syntax that resembles Ruby. I expected to find a similar Array#[] primitive. But surprisingly, I was wrong!

The same code from above also works in Crystal:

arr = [12345, 67890]
puts arr[1]

In Crystal, arrays are not language primitives because the Crystal standard library implements arrays using Crystal itself. The Array#[] method is not the smallest unit of processing available to Crystal programmers. Let’s dig into the details and divide up the [] method into smaller and smaller pieces to see how the Crystal team implemented it.

Reading src/indexable.cr in the Crystal standard library, here’s the implementation of Indexable#[] which the array class uses when I call arr[1] above:

# Returns the element at the given *index*.
#
# Negative indices can be used to start counting from the end of the array.
# Raises `IndexError` if trying to access an element outside the array's range.
#
# ```
# ary = ['a', 'b', 'c']
# ary[0]  # => 'a'
# ary[2]  # => 'c'
# ary[-1] # => 'c'
# ary[-2] # => 'b'
#
# ary[3]  # raises IndexError
# ary[-4] # raises IndexError
# ```
@[AlwaysInline]
def [](index : Int)
  fetch(index) { raise IndexError.new }
end

The Crystal team implemented [] using another method called fetch:

# Returns the element at the given *index*, if in bounds,
# otherwise executes the given block with the index and returns its value.
#
# ```
# a = [:foo, :bar]
# a.fetch(0) { :default_value }    # => :foo
# a.fetch(2) { :default_value }    # => :default_value
# a.fetch(2) { |index| index * 3 } # => 6
# ```
def fetch(index : Int)
  index = check_index_out_of_bounds(index) do
    return yield index
  end
  unsafe_fetch(index)
end

Neither the [] operator nor the fetch method are language primitives. To find a language primitive, I need to keep dividing the code up into smaller and smaller pieces, until it can’t be divided any further. The same process a chemist would use to break up some material into smaller and smaller molecules until they are left with a set of atoms.

Let’s continue by reading unsafe_fetch:

# Returns the element at the given *index*, without doing any bounds check.
#
# `Indexable` makes sure to invoke this method with *index* in `0...size`,
# so converting negative indices to positive ones is not needed here.
#
# Clients never invoke this method directly. Instead, they access
# elements with `#[](index)` and `#[]?(index)`.
#
# This method should only be directly invoked if you are absolutely
# sure the index is in bounds, to avoid a bounds check for a small boost
# of performance.
abstract def unsafe_fetch(index : Int)

Since Indexable#unsafe_fetch is an abstract method, I need to read how the Array class implements it back in src/array.cr:

@[AlwaysInline]
def unsafe_fetch(index : Int) : T
  @buffer[index]
end

(source: Nick Bonzey via Wikimedia Commons)

So far, I’ve drilled down through 3 levels of Crystal implementation:

But I haven’t found a primitive function yet. Let’s keep digging!

The Crystal Array Class

To learn more, I need to scroll up and read the beginning of the Crystal Array class definition:

# An `Array` is an ordered, integer-indexed collection of objects of type T.
#
# Array indexing starts at 0. A negative index is assumed to be
# relative to the end of the array: -1 indicates the last element,
# -2 is the next to last element, and so on.

etc...

# An `Array` is implemented using an internal buffer of some capacity
# and is reallocated when elements are pushed to it when more capacity
# is needed. This is normally known as a [dynamic array](http://en.wikipedia.org/wiki/Dynamic_array).
#
class Array(T)

etc...

# The buffer where elements start.
  @buffer : Pointer(T)

I’ve deleted some of the comments and code for clarity. You can read the full, original version in src/array.cr.

Now I get a sense of how the unsafe_fetch method above works. Let’s repeat that again:

def unsafe_fetch(index : Int) : T
  @buffer[index]
end

Crystal saves all of the elements in each array into a memory buffer called @buffer. And when I access an element of the array like this:

puts arr[1]

Crystal first checks that the array index (1 in this example) is valid, and then loads the array element I want from the buffer using: @buffer[1].

The Crystal Pointer Class

But how does @buffer[index] actually work? I haven’t learned anything yet! I’m just going around in circles. So far all I’ve been able to find is that Crystal implements Array#[] with a different [] operator, on a different class. What type of object is @buffer? What does it do?

Reading the array class declaration again more carefully, I can see that @buffer is an instance of the Pointer class:

# The buffer where elements start.
@buffer : Pointer(T)

Let’s read how Crystal implements Pointer#[] in src/pointer.cr:

# Gets the value pointed at this pointer's address plus `offset * sizeof(T)`.
#
# ```
# ptr = Pointer.malloc(4) { |i| i + 10 }
# ptr[0] # => 10
# ptr[1] # => 11
# ptr[2] # => 12
# ptr[3] # => 13
# ```
def [](offset)
  (self + offset).value
end

Reading this, I discovered the Crystal team has written a class that represents pointers! Just like using a pointer in C, Crystal code can refer to and access any memory location directly. Because the Pointer class is part of the language, Crystal allows us to implement our own data structures and algorithms in a very detailed manner, allocating and accessing memory just like the Crystal team has while implementing arrays, hashes and other classes.

Now I’ve dug down through 4 levels of Crystal function calls, but I still haven’t found a language primitive yet.

A Crystal Language Primitive

We still haven’t discovered how arrays actually work - how pointers actually work. That is, reading this line of code above:

(self + offset).value

…I understand the meaning and intent of pointer arithmetic, and how it’s used by Crystal arrays, but I still don’t see how or where Crystal actually obtains the array element referenced by a given pointer.

Digging deeper, let’s read the Pointer#value method to find out - this method’s implementation should tell me exactly how Crystal obtains the value:

# Gets the value pointed by this pointer.
#
# ```
# ptr = Pointer(Int32).malloc(4)
# ptr.value = 42
# ptr.value # => 42
# ```
@[Primitive(:pointer_get)]
def value : T
end

As you can see, this is a language primitive - it says “primitive” right there above the method definition!

Returning to Wikipedia’s definition of a language primitive, this is an atomic element of an expression. The Crystal compiler knows not to try to compile this code but to assume this behavior is part of the language. In fact, there is no implementation for Pointer#value here at all: the method is empty!

def value : T
end

The empty value method above doesn’t tell us where the value actually comes from, or how Crystal obtains it. To learn that, we need to step down one level of abstraction - we need to use a lower level language, not Crystal.

Retrieving an Array Element In x86 Assembly Language

What lower level language should we use? Since the Crystal team used the Low Level Virtual Machine (LLVM) project to implement their compiler, I could look at LLVM’s low level instruction language. But since I’m not familiar with that, or with how the Crystal compiler works, I decided to jump down to the lowest level of abstraction available to me on my Intel Mac: x86 Assembly Language.

Here’s my Crystal program again:

$ cat array_example.cr
arr = [12345, 67890]
puts arr[1]

If I compile the program without running it, and then use the llvm-objdump command, LLVM will give me a version of my code converted into Intel x86 Assembly Language:

$ crystal build array_example.cr
$ llvm-objdump -D array_example > array_example.a

Now by reading the assembly produced by the Crystal compiler and LLVM, I can see how the Pointer#[] and Pointer#value methods actually work:

0000000100089bb0 <_*Pointer(Int32)@Pointer(T)#[]:Int32>:
100089bb0: 50                           pushq %rax
100089bb1: e8 0a 00 00 00               callq 0x100089bc0 <_*Pointer(Int32)@Pointer(T)#+:Pointer(Int32)>
100089bb6: 8b 00                        movl  (%rax), %eax
100089bb8: 59                           popq  %rcx
100089bb9: c3                           retq

0000000100089bc0 <_*Pointer(Int32)@Pointer(T)#+:Pointer(Int32)>:
100089bc0: 48 63 c6                     movslq  %esi, %rax
100089bc3: 48 c1 e0 02                  shlq  $2, %rax
100089bc7: 48 01 c7                     addq  %rax, %rdi
100089bca: 48 89 f8                     movq  %rdi, %rax
100089bcd: c3                           retq

Assembly language is just another programming language like any other, but with a different set of primitives. The primitives in this language are hardware instructions that my laptop’s CPU can understand and execute directly:

I won’t pretend to understand all the details here, but if you’re curious about what this code does - how my compiled Crystal program actually retrieves a value from an array - here are a few highlights you can look for:

If you’d like to learn more about x86 assembly language, I wrote an article a few years ago explaining some of the basics: Learning to Read x86 Assembly Language.

Understand the Primitives of the Language You Are Using

Why did I bother with this exercise? To make sure I deeply understand the programming languages I’m using. Ruby hides much of its implementation in C, so I didn’t learn much looking at the Ruby array primitives in this example. And the primitive functions of assembly language are by definition the instructions my CPU can execute directly. It’s always fun trying to identify and understand machine level instructions!

But Crystal surprised me - I expected to see a set of primitive array functions like we have in Ruby, but I was wrong. Instead, I learned that Crystal supports pointers, just like C or other low level languages do. I discovered that Crystal, unlike Ruby, might be an appropriate choice for low level systems programming tasks. And I was able to learn all of this, along with the array implementation details, because the Crystal team implemented its standard library in the same, target language: Crystal. All I had to do was make an effort to read some code.

Dive into details and find out what the language primitives are in your favorite programming language. You might be surprised and discover that your language is capable of much more than you thought it was.