Calling Subroutines

A subroutine is, to put it simply, a block of instructions that starts at some memory address. To execute (or call) a subroutine, your program jumps to its first instruction, that is, it loads the address of the subroutine into the program counter.

After the subroutine (hopefully) completes at some point, the program flow should return to the part of the code that called the subroutine, more specifically to the next instruction after the call. But how does the subroutine know where that is?

By convention, the so-called return address is pushed onto the stack by the caller right before jumping to the subroutine. These two actions (pushing the return address and jumping to the subroutine) are combined into a single call instruction:

call foo

where the label foo is associated with the subroutine that we want to call. It will be replaced by the associated address when the program is assembled/linked.
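
As a rough illustration of the mechanism described above, the following sketch shows a complete call and return. It assumes the GNU assembler with AT&T syntax on x86-64 Linux and a program that is entered through main. The empty body of foo is a placeholder, and the ret instruction used to return is not introduced in this section.

```asm
# Sketch only. Assumed build command: gcc example.s -o example
        .text
        .globl  main

foo:                            # placeholder subroutine with an empty body
        ret                     # pop the return address into the program counter

main:
        pushq   %rbp            # prologue (covered in the next part)
        movq    %rsp, %rbp
        call    foo             # push the address of the next instruction, jump to foo
        movq    $0, %rax        # execution resumes here once foo returns
        popq    %rbp
        ret                     # return 0 from main
```

The call pushes the address of the movq that follows it, and the ret inside foo pops that address again, which is why execution continues right after the call.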


Arguments

Most subroutines require some arguments to be specified. For example, in C you might call a subroutine foo with some arguments as follows:

foo(1, 2, 3, 4, 5, 6, 7, 8)

The arguments need to be placed somewhere where they can be found by the subroutine. Technically, the arguments may be placed anywhere, as long as the writer and the caller of the subroutine agree on the locations. To allow for universally usable subroutines, however, conventions exist that specify where arguments should be placed. The so-called C Calling Convention (as defined in the x86-64 System V ABI) specifies that the first 6 integer or pointer arguments should be placed in registers as follows:

  1. rdi

  2. rsi

  3. rdx

  4. rcx

  5. r8

  6. r9

If there are more than 6 arguments, the remaining arguments are pushed onto the stack in reverse order (the last argument is pushed first, so the seventh argument is pushed last), before the return address is pushed.

So, the foo subroutine call from above would look like this in Assembly:

movq    $1, %rdi
movq    $2, %rsi
movq    $3, %rdx
movq    $4, %rcx
movq    $5, %r8
movq    $6, %r9
pushq   $8
pushq   $7
call    foo
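
To show the other side of this call, here is a sketch of how foo itself might find its arguments. It assumes the GNU assembler with AT&T syntax on x86-64 Linux; the body of foo (summing its eight arguments) is a made-up example, and the prologue in main is the one covered in the next part.

```asm
# Sketch only. Assumed build command: gcc example.s -o example
        .text
        .globl  main

foo:                            # hypothetical body: return the sum of all 8 arguments
        movq    %rdi, %rax      # argument 1
        addq    %rsi, %rax      # argument 2
        addq    %rdx, %rax      # argument 3
        addq    %rcx, %rax      # argument 4
        addq    %r8, %rax       # argument 5
        addq    %r9, %rax       # argument 6
        addq    8(%rsp), %rax   # argument 7: directly above the return address
        addq    16(%rsp), %rax  # argument 8: one slot further up
        ret

main:
        pushq   %rbp            # prologue (covered in the next part)
        movq    %rsp, %rbp
        movq    $1, %rdi
        movq    $2, %rsi
        movq    $3, %rdx
        movq    $4, %rcx
        movq    $5, %r8
        movq    $6, %r9
        pushq   $8              # last argument is pushed first
        pushq   $7
        call    foo             # rax = 1 + 2 + ... + 8 = 36
        addq    $16, %rsp       # the caller removes the two stack arguments again
        popq    %rbp
        ret                     # main returns 36, visible as the exit status
```

Because the return address is pushed last, it sits at the top of the stack when foo starts, so the seventh argument is found 8 bytes above the stack pointer. After the call, the caller is responsible for removing the stack arguments.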

Callee- vs Caller-Saved Registers

When you call a subroutine, it will likely need some registers for its own operation. As a result, some registers may hold different values after the call, which is a problem if they contained data that your part of the program still needs.

So how do you know which registers may be modified during a subroutine call? Unsurprisingly, the C Calling Convention also specifies which registers must be preserved across a subroutine call and which may be overwritten without being restored.

  • The former are called callee-saved registers: preserving (or saving) their values is the responsibility of the callee, that is, the subroutine being called.

  • The latter are called caller-saved registers: a subroutine may change their values, so the caller needs to save them itself if it still needs them after the call.

Whether a register is callee- or caller-saved is indicated by the color of the register name in the list of registers.
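
The sketch below illustrates both responsibilities. In the x86-64 System V ABI, rbx and r12 to r15 (along with rbp and rsp) are callee-saved, while rax, rcx, rdx, rsi, rdi and r8 to r11 are caller-saved. The example assumes the GNU assembler with AT&T syntax on x86-64 Linux; the subroutine bar and the values it uses are hypothetical.

```asm
# Sketch only. Assumed build command: gcc example.s -o example
        .text
        .globl  main

bar:                            # hypothetical subroutine
        pushq   %rbx            # callee-saved: save the caller's value first
        movq    $42, %rbx       # now rbx can be used freely
        movq    $99, %rcx       # caller-saved: may be overwritten without restoring
        popq    %rbx            # restore rbx before returning
        ret

main:
        pushq   %rbp            # prologue (covered in the next part)
        movq    %rsp, %rbp
        pushq   %rbx            # main is itself a callee, so it saves rbx before use
        movq    $5, %rbx        # callee-saved: survives the call to bar
        movq    $7, %rcx        # caller-saved: bar is allowed to overwrite it
        pushq   %rcx            # so main saves it (this also keeps rsp 16-byte aligned)
        call    bar
        popq    %rcx            # rcx = 7 again
        movq    %rbx, %rax      # rbx is still 5
        addq    %rcx, %rax      # rax = 5 + 7 = 12
        popq    %rbx            # restore rbx for main's own caller
        popq    %rbp
        ret                     # main returns 12, visible as the exit status
```

Without the pushq/popq pair around the call, rcx would hold 99 afterwards and the result would be wrong, while rbx is safe only because bar keeps its side of the agreement.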


Return Value

The C Calling Convention specifies that subroutines should place their integer or pointer return value (if any) in the rax register.
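
A minimal sketch of a subroutine that returns a value this way, assuming the GNU assembler with AT&T syntax on x86-64 Linux. The subroutine square is a made-up example, not part of any library.

```asm
# Sketch only. Assumed build command: gcc example.s -o example
        .text
        .globl  main

square:                         # hypothetical subroutine: returns its argument squared
        movq    %rdi, %rax      # copy the first argument into rax
        imulq   %rdi, %rax      # rax = rax * rdi
        ret                     # the result is left in rax for the caller

main:
        pushq   %rbp            # prologue (covered in the next part)
        movq    %rsp, %rbp
        movq    $7, %rdi        # first argument
        call    square          # rax = 49
        popq    %rbp
        ret                     # main passes the 49 on as its own return value
```

Since main is itself a subroutine, whatever it leaves in rax becomes the exit status of the program, which can be checked with echo $? after running it.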


Stack Alignment

When you write your first programs, you might encounter mysterious crashes when calling subroutines. There are many possible causes, but one of them is incorrect stack alignment.

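The stack pointer should always be 16-byte aligned (a multiple of 16) immediately before a call to a subroutine, that is, before the return address is pushed. This alignment mainly improves the performance of memory accesses, especially for SIMD instructions; some SIMD instructions even require aligned operands and fault otherwise, which is how a misaligned stack can crash an otherwise correct program. The technical details go beyond the scope of this Manual.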

At the start of your program or subroutine, the stack will not be 16-byte aligned, because the return address (8 bytes) has just been pushed onto it. However, writing a prologue as specified in the next part will fix the alignment (by pushing the previous base pointer, another 8 bytes). From then on, you will need to keep track of the changes to your stack alignment yourself.

The image below shows an example stack frame. In this case, 3 (arbitrary) 8-byte words have been pushed onto the stack after the prologue, so the stack pointer is misaligned by 8 bytes.
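
The same situation can be sketched in code, together with one possible way to repair the alignment before a call. This assumes the GNU assembler with AT&T syntax on x86-64 Linux; the subroutine baz and the pushed values are placeholders, and the epilogue shown is one common form rather than the only option.

```asm
# Sketch only. Assumed build command: gcc example.s -o example
        .text
        .globl  main

baz:                            # placeholder subroutine
        ret

main:
        pushq   %rbp            # prologue: rsp is 16-byte aligned after this push
        movq    %rsp, %rbp
        pushq   $1              # three arbitrary 8-byte values, 24 bytes in total,
        pushq   $2              # so rsp is now misaligned by 8 bytes
        pushq   $3
        subq    $8, %rsp        # add 8 bytes of padding to restore the alignment
        call    baz             # rsp is a multiple of 16 at this point
        movq    $0, %rax        # return value of main
        movq    %rbp, %rsp      # discard the pushed values and the padding
        popq    %rbp
        ret
```

A simple rule of thumb follows from this: after the prologue, an even number of 8-byte pushes keeps the stack aligned, and an odd number needs 8 extra bytes of padding before the next call.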
