Building and Running Programs

Writing programs is not much fun if you can't run them. But before a program can run, it has to be built. What does that mean?

Technical Side

You write your programs in an assembly language. Assembly language is designed to be human-readable, but your CPU cannot execute it directly. Therefore, it first needs to be assembled. The assembler (e.g., GNU as, or the LLVM integrated assembler in the case of clang) translates the target-specific assembly code into target-specific machine code and stores the result in so-called object files.

The assembled object files are not yet runnable code. Your program may call library functions (e.g., printf) or use other external symbols that still need to be resolved. This is where the final step, linking, comes in. The linker (e.g., GNU ld or LLVM lld) takes the object file (or object files) of your program and performs several tasks to create the final executable. Besides resolving external symbols, these tasks include assigning final memory addresses to the program's instructions and data and adjusting the addresses in the machine code accordingly.

The output of the linker will be the final executable (machine-code) version of your program.

In Practice

The linker needs to know where to find library files, and these may live in a number of locations that differ between operating systems. Luckily, there is an easy shortcut that almost every sane programmer uses to avoid this cumbersome lookup: let a compiler driver such as gcc or clang do the work on our behalf. Assembling and linking an assembly program is then reduced to:

clang source.S -o dest

If you are interested in the work clang performs under the hood, add the -v (verbose) flag and have a look at the terminal output: among other things, clang prints each assembler and linker command it runs.

Apple Silicon / ARM Caveats

If you are working on an Apple Silicon Mac (i.e., one with an M1, M2, or later processor), you may notice that running the above command, even for the example program, floods your terminal with "unrecognized instruction mnemonic" errors for almost every line of the program.

The cause is quite simple: you are trying to assemble a program written in a language that your CPU doesn't speak. More technically speaking: the source file is specific to the x86-64 architecture, but your processor is based on the 64-bit ARM architecture (ARMv8 and later).
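To check which situation applies to your machine, you can ask uname for the hardware name. The sketch below maps the common values to a CPU family; cpu_family is a hypothetical helper name. Note that a shell that itself runs through Apple's x86-64 translation layer reports x86_64 even on Apple Silicon.

```shell
#!/bin/sh
# Sketch: report the CPU family of the current machine.
# cpu_family is a hypothetical helper; it maps typical `uname -m` values.
cpu_family() {
    case "$1" in
        x86_64|amd64)  echo "x86-64" ;;
        arm64|aarch64) echo "ARM (AArch64)" ;;
        *)             echo "other ($1)" ;;
    esac
}

# macOS on Apple Silicon reports arm64, ARM Linux reports aarch64,
# and most Intel/AMD machines report x86_64.
echo "OS: $(uname -s), CPU family: $(cpu_family "$(uname -m)")"
```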

To fix this and successfully build the program, you can tell the assembler and linker what the target architecture of the program is using the -arch flag, as follows:

clang source.S -o dest -arch x86_64

Now you have an executable binary (i.e., a program) in a machine language that your processor doesn't speak... Great...

However, when transitioning to ARM, Apple couldn't just leave all existing software behind. Therefore, the Apple Silicon version of macOS comes with a compatibility layer called Rosetta 2. This binary translator converts x86-64 machine code into ARM machine code, which allows your ARM processor to run x86-64 programs. The translation adds a noticeable performance overhead; however, for small programs like the ones you will write for the assignments of this course, this overhead is negligible.
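To confirm which architecture a built binary targets, the file utility describes it. The sketch below wraps that check in a hypothetical is_x86_64 helper; the description formats in the comments are typical examples rather than guaranteed output.

```shell
#!/bin/sh
# Sketch: check whether a built binary contains x86-64 machine code.
# is_x86_64 is a hypothetical helper that inspects the description
# printed by `file -b`. Typical descriptions (examples, not guaranteed):
#   macOS: "Mach-O 64-bit executable x86_64"
#   Linux: "ELF 64-bit LSB pie executable, x86-64, ..."
is_x86_64() {
    case "$1" in
        *x86_64*|*x86-64*) return 0 ;;
        *)                 return 1 ;;
    esac
}

bin="${1:-./dest}"   # defaults to the dest binary built earlier
if [ ! -e "$bin" ]; then
    echo "$bin not found"
elif is_x86_64 "$(file -b "$bin")"; then
    echo "$bin contains x86-64 code (runs via Rosetta 2 on Apple Silicon)"
else
    echo "$bin does not contain x86-64 code"
fi
```

If running an x86-64 binary on an Apple Silicon Mac fails with "bad CPU type in executable", Rosetta 2 is most likely not installed yet; softwareupdate --install-rosetta installs it.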

Framework Build System

It is important to know how to build your programs yourself. However, since some of the assignments require slightly more intricate assembling and linking options, the framework takes care of this process for you through CMake, a build system generator that supports, among others, assembly, C, and C++ projects. Put simply, CMake generates the build files (here, Makefiles) that perform all steps necessary to create a given "target", such as an executable.

To start using CMake, you first need to initialize your build directory. This directory will act as the skeleton for your project, holding all build files and Makefile configurations. To initialize your build directory as .build (the directory name used by the rest of the framework), execute the following from the root of your project:

export CC=clang; cmake -B .build

Once the .build directory has been initialized, you can build programs by executing:

cmake --build .build --target <assignment_target>

The target name for each assignment is given as a hint in the assignment description. You can also look up all targets directly in the .cmake/executables.cmake file.

After building the assignment, the executable will appear in the root directory of the framework. You may execute this by running:

./<assignment_target>

Observe that the executable's name is the same as the assignment target. Some assignments have multiple targets, each producing a separate executable that runs a different program, depending on the context of the assignment.
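If you build and run assignments often, the commands above can be wrapped in a small shell function. The sketch below is a hypothetical convenience (build_and_run is not part of the framework); it assumes it is called from the framework root and that, as described above, each executable is named after its target. It uses CC=clang cmake -B .build, which sets CC for that single command, whereas export CC=clang; cmake -B .build keeps it set for the rest of the shell session.

```shell
#!/bin/sh
# Hypothetical convenience function (not part of the framework): configure
# .build on first use, build one assignment target and run the result.
# Assumptions: called from the framework root (where CMakeLists.txt lives),
# and the executable carries the same name as its target.
build_and_run() {
    target="$1"
    if [ -z "$target" ]; then
        echo "usage: build_and_run <assignment_target>" >&2
        return 2
    fi
    # CMake reads CC only when a build directory is configured for the
    # first time, so the compiler is selected here, once.
    if [ ! -d .build ]; then
        CC=clang cmake -B .build || return
    fi
    # Stop (with cmake's exit status) if the build fails.
    cmake --build .build --target "$target" || return
    "./$target"
}
```

Because the function lives outside CMakeLists.txt, it does not affect how CodeGrade builds your programs.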

Examining the CMake files in the .cmake directory and the CMakeLists.txt file is a great way to see the concepts described above in practice. However, these files contain many details that you do not need to understand at this point.

You are free to modify the CMakeLists.txt at your own risk (e.g., to add targets for specific program invocations/extra assignments/other tweaks that may help your process).

However, be advised that CodeGrade will use the CMakeLists.txt as given in the framework to build your programs.

