Address Sanitization
When working with the processor directly through assembly or other memory-unsafe languages, you have full control over the memory your program uses. This makes it very easy for your program to access memory it should not touch, or to forget to release memory after using it.
Do you see what's wrong with the code below?
The malloc call allocates 8 bytes on the heap and returns a pointer to the start of the allocated memory. Memory allocated on the heap must be released by calling free once it is no longer needed. Since the code above never calls free, it produces what is known as a memory leak.
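As a minimal sketch of such a leak, assuming GNU assembler (AT&T) syntax, a main function linked against libc, and a hypothetical file name leak.S, the whole program can be as small as this:

```asm
# leak.S: hypothetical minimal program that leaks memory.
# Assemble and link with ASan (assuming gcc), e.g.: gcc -g -fsanitize=address leak.S -o leak
    .text
    .globl  main
main:
    pushq   %rbp                # save the frame pointer; %rsp is now 16-byte aligned
    movq    %rsp, %rbp
    movq    $8, %rdi            # first argument: request 8 bytes
    call    malloc              # %rax = pointer to the new heap block
    # Bug: the pointer in %rax is never passed to free, so the 8 bytes leak.
    movl    $0, %eax            # return 0 from main
    popq    %rbp
    ret

    .section .note.GNU-stack,"",@progbits   # mark the stack non-executable
```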
It is essential that your programs are free of memory errors. Memory-related issues lead to crashes, either through the undefined behavior of an illegal access or by running out of memory when a leak occurs repeatedly.
CodeGrade's autograder tests whether your program is memory-safe by assembling your code with AddressSanitizer (ASan). Some tests may fail because ASan detects a memory error and terminates the program abruptly.
This is intended testing behavior: we expect your programs to be correct not only algorithmically but also in how they use memory. It is therefore a requirement that your program neither leaks memory nor accesses it illegally, which means it must be free of memory leaks, buffer overflows, and segmentation faults.
To help you achieve memory safety, we strongly recommend making ASan part of your coding workflow and regularly checking your programs for memory-related errors.
How to detect memory errors
We can detect memory errors such as memory leaks or illegal reads and writes using ASan. At a high level, ASan instruments your program and keeps track of the memory it has allocated and is allowed to access. You can enable address sanitization by adding the -fsanitize=address flag to the command that assembles and links your program (for example, when invoking gcc or clang).
The framework's build system, CMake, assembles your programs with the -fsanitize=address option by default, which enables address sanitization.
When it detects a memory error, ASan terminates the program with an exit code of 1 and prints a stack trace. The code above will produce the following stack trace:
While this error might seem cryptic or even scary at first, it is tremendously valuable for debugging, because it points to the exact location of the problem (for a memory leak, the place where the leaked memory was allocated). In the example above, the error is located in the file /home/student/my_program.S at line 9, which contains the instruction call malloc. Combined with the error message "detected memory leaks", this quickly tells us that we forgot to free the memory allocated on the heap.
Let's fix the code! We know that malloc returns a pointer to the heap-allocated memory, so let's pass that pointer to free. After this change, ASan no longer reports any errors. Hurray!
Read the stack trace from the top down, starting with the frame labeled #0, then #1, #2, and so on. Frame #0 is the most recent call at the moment the error was detected, and each following frame is the caller of the frame above it. Usually, you want to find the first frame in which you recognize one of your program's source files. In the example above, the first and only file we know of is /home/student/my_program.S. Each frame also includes the name of the subroutine, just before the file in which it is located.
Together, this information tells you where in your code the error occurs, so you know where to set up your debugger for further examination.
Getting stuck
Sometimes stack traces are not very helpful, especially for segmentation faults. Such situations can confuse and overwhelm the best of us: we spend hours lost in thought trying to find a fix and questioning our coding choices.
To get out of such a situation, we recommend a four-step approach:
Isolate
Isolate the suspicious block of code in a new context, for example a new, empty source file. Don't be afraid to copy and paste parts of your code into that file and try to reproduce the problem there. This step helps you determine whether the problem lies in the isolated component itself or in its interaction with other components.
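To illustrate the isolate step, the sketch below shows what such a scratch file could contain, assuming a hypothetical subroutine sum_array is the code under suspicion: only that subroutine, pasted in unchanged, plus a tiny main that calls it with fixed, known input. Any problem that ASan or the debugger reports in this file is then narrowed down to that subroutine.

```asm
# isolate.S: hypothetical scratch file for reproducing a problem in isolation.
# Assemble and link with ASan (assuming gcc), e.g.: gcc -g -fsanitize=address isolate.S -o isolate
    .section .rodata
fmt_out:    .asciz "%ld\n"

    .data
numbers:    .quad 1, 2, 3, 4, 5         # fixed, known input

    .text
# sum_array: the hypothetical subroutine under suspicion, pasted in unchanged.
# %rdi = pointer to the first quadword, %rsi = number of elements; returns the sum in %rax.
sum_array:
    xorl    %eax, %eax                  # sum = 0
    testq   %rsi, %rsi
    jz      .Lsum_done                  # empty array: nothing to add
.Lsum_loop:
    addq    (%rdi), %rax                # add the current element
    addq    $8, %rdi                    # advance to the next quadword
    decq    %rsi
    jnz     .Lsum_loop
.Lsum_done:
    ret

# main: minimal driver that exercises sum_array and prints the result.
    .globl  main
main:
    pushq   %rbp                        # %rsp is now 16-byte aligned for calls
    movq    %rsp, %rbp
    leaq    numbers(%rip), %rdi
    movq    $5, %rsi
    call    sum_array
    leaq    fmt_out(%rip), %rdi
    movq    %rax, %rsi                  # second printf argument: the sum
    movl    $0, %eax                    # variadic call: no vector registers used
    call    printf
    movl    $0, %eax
    popq    %rbp
    ret

    .section .note.GNU-stack,"",@progbits
```

This driver should print 15. If the isolated version behaves correctly, the problem more likely lies in how the rest of your program calls the subroutine.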
All programming errors originate from human error: the computer only does what you tell it to do. When tackling complex memory errors, it therefore helps to keep in mind the few classes of human error that lead to them:
Assumptions
You assumed that something in your code would hold in a broader context when it only held in a narrow one. A common example is assuming that all functions operate on 64-bit registers. Many libc functions take or return int values, which are 32 bits wide, and expect you to use the matching 32-bit registers (for example %eax for an int return value) when interacting with them, rather than relying on the full 64-bit register.
Not handling exceptions gracefully
It is common for your code to reach a point where it cannot continue. Suppose you ask your program's user for a number and they type a letter instead. In this case, you want your program to stop executing and show a message to the user. However, you must make sure that your program exits gracefully, by freeing all allocated memory and restoring all callee-saved registers. It is never enough to just return and call it a day.
Pointer arithmetic
When working with pointers, you will most likely want to advance to a neighboring address to access data. Always keep the original pointer to the start of the allocation, and never go outside the allocated space; remember the size of the allocation for this purpose. The free call only works with the pointer originally returned by the heap-allocating call.
Understanding the stack and the heap
When working with memory, you should always know whether your data lives on the stack or on the heap. This crucial distinction dictates how you should debug the problem. Never call free on stack-allocated memory!
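The sketch below, a hypothetical program named pitfalls.S, touches on each of these points at once: it keeps the pointer returned by malloc in a callee-saved register, reads a 32-bit int with scanf and sign-extends it, rejects input that would overflow the buffer, frees the heap buffer on every path that allocated it, and never frees the stack slot. The capacity of 8 elements and the messages are arbitrary choices for illustration.

```asm
# pitfalls.S: hypothetical sketch of the four pitfalls listed above.
# Assemble and link with ASan (assuming gcc), e.g.: gcc -g -fsanitize=address pitfalls.S -o pitfalls
    .set    CAPACITY, 8                 # number of 8-byte elements in the heap buffer

    .section .rodata
fmt_in:     .asciz "%d"
fmt_out:    .asciz "sum = %ld\n"
msg_input:  .asciz "error: expected a number between 1 and 8"
msg_nomem:  .asciz "error: out of memory"

    .text
    .globl  main
main:
    pushq   %rbp
    movq    %rsp, %rbp
    pushq   %rbx                        # callee-saved: heap base pointer
    pushq   %r12                        # callee-saved: element count, later the exit status
    subq    $16, %rsp                   # stack slot for scanf's int; keeps %rsp 16-byte aligned

    # Heap: only this pointer may ever be passed to free.
    movq    $CAPACITY*8, %rdi
    call    malloc
    testq   %rax, %rax
    jz      .Lno_memory
    movq    %rax, %rbx                  # keep the base pointer unchanged in %rbx

    # Stack: the int read by scanf lives at (%rsp) and must never be freed.
    leaq    fmt_in(%rip), %rdi
    movq    %rsp, %rsi
    movl    $0, %eax                    # variadic call: no vector registers used
    call    scanf
    cmpl    $1, %eax                    # scanf returns an int: compare %eax, not %rax
    jne     .Lbad_input                 # e.g. the user typed a letter
    movslq  (%rsp), %r12                # "%d" stored only 4 bytes: sign-extend them to 64 bits
    cmpq    $1, %r12
    jl      .Lbad_input
    cmpq    $CAPACITY, %r12
    jg      .Lbad_input                 # more elements than allocated would overflow the buffer

    # Pointer arithmetic on a copy of the base pointer, bounded by the count.
    leaq    (%rbx,%r12,8), %rdx         # one past the last element in use
    movq    %rbx, %rcx                  # walking pointer
    xorl    %eax, %eax                  # i = 0
.Lfill:
    movq    %rax, (%rcx)                # buffer[i] = i
    incq    %rax
    addq    $8, %rcx                    # advance to the neighboring element
    cmpq    %rdx, %rcx
    jb      .Lfill

    xorl    %esi, %esi                  # sum = 0, second printf argument
    movq    %rbx, %rcx
.Lsum:
    addq    (%rcx), %rsi
    addq    $8, %rcx
    cmpq    %rdx, %rcx
    jb      .Lsum

    leaq    fmt_out(%rip), %rdi
    movl    $0, %eax
    call    printf
    movl    $0, %r12d                   # exit status: success
    jmp     .Lcleanup

.Lbad_input:
    leaq    msg_input(%rip), %rdi
    call    puts
    movl    $1, %r12d                   # exit status: failure
.Lcleanup:
    movq    %rbx, %rdi                  # free the original base pointer on every path
    call    free
    movl    %r12d, %eax
    jmp     .Lreturn

.Lno_memory:
    leaq    msg_nomem(%rip), %rdi
    call    puts
    movl    $1, %eax                    # nothing was allocated, so nothing to free
.Lreturn:
    addq    $16, %rsp                   # release the local stack slot
    popq    %r12                        # restore callee-saved registers in reverse order
    popq    %rbx
    popq    %rbp
    ret

    .section .note.GNU-stack,"",@progbits
```

Entering 4 should print sum = 6 (0 + 1 + 2 + 3) and exit with status 0. Entering a letter or a number outside 1 to 8 prints the error message, frees the buffer and exits with status 1.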
Limitations
There are a few limitations you will encounter when using ASan.
First, you cannot use ASan when running your program in debugging mode. We therefore recommend running your program as a standalone executable to test for memory errors, and then using the debugger to narrow down the context of the error.
Second, ASan does not fully instrument assembled code, because it is designed to handle compiled code. As a result, ASan cannot detect stack-related errors as effectively, such as overflowing stack-allocated buffers or failing to restore callee-saved registers.
We expect you to test your programs for these types of errors as well. The automated tests will fail if the stack is accessed illegally.
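As a sketch of the stack discipline that ASan cannot check for you, the hypothetical helper fill_and_sum below saves the callee-saved registers %rbx and %r12 before using them, keeps every write to its local array within the 32 bytes it reserved, and restores everything in reverse order before returning. Because this is hand-written assembly, ASan does not instrument these stack accesses; a missing pop or an index past 3 would most likely show up only as a crash or a corrupted value in the caller.

```asm
# callee_saved.S: hypothetical sketch of stack discipline in a helper subroutine.
# Assemble and link with ASan (assuming gcc), e.g.: gcc -g -fsanitize=address callee_saved.S -o callee_saved
    .section .rodata
fmt_out:    .asciz "%ld\n"

    .text
# fill_and_sum: hypothetical helper. Stores 1..4 in a 4-element array in its own
# stack frame and returns their sum in %rax.
fill_and_sum:
    pushq   %rbx                        # %rbx and %r12 are callee-saved in the System V ABI,
    pushq   %r12                        # so save them before using them
    subq    $32, %rsp                   # local array of 4 quadwords (32 bytes)
    xorl    %ebx, %ebx                  # index i = 0
    xorl    %r12d, %r12d                # running sum = 0
.Lloop:
    leaq    1(%rbx), %rax               # value i + 1
    movq    %rax, (%rsp,%rbx,8)         # array[i]; i stays below 4, so writes stay in the 32 bytes
    addq    %rax, %r12
    incq    %rbx
    cmpq    $4, %rbx
    jb      .Lloop
    movq    %r12, %rax                  # return value
    addq    $32, %rsp                   # release the local array
    popq    %r12                        # restore in reverse order of the pushes
    popq    %rbx
    ret

    .globl  main
main:
    pushq   %rbp                        # %rsp is now 16-byte aligned for calls
    movq    %rsp, %rbp
    call    fill_and_sum
    leaq    fmt_out(%rip), %rdi
    movq    %rax, %rsi                  # second printf argument: the sum
    movl    $0, %eax                    # variadic call: no vector registers used
    call    printf
    movl    $0, %eax
    popq    %rbp
    ret

    .section .note.GNU-stack,"",@progbits
```

The program should print 10.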
To work around this last limitation, we recommend Valgrind, a dynamic analysis tool that checks memory accesses while your program runs and can detect some memory errors that ASan misses in assembly code. It should be possible to pass all CodeGrade tests without running your program under Valgrind. However, the tool can be useful if some tests fail on CodeGrade but not locally. Note that Valgrind should be run on a build without -fsanitize=address, as the two tools do not work together. To learn how to run Valgrind, please refer to the official documentation.
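The hypothetical program below illustrates the kind of error that can slip past ASan in assembly: the out-of-bounds store is a plain mov instruction, which ASan does not instrument, so the program is likely to run without any ASan report. Memcheck, Valgrind's default tool, checks memory accesses at run time and is designed to report such an invalid write. The build and run commands in the comments assume gcc and a build without -fsanitize=address.

```asm
# overflow.S: hypothetical off-by-one heap write performed by a plain mov.
# For Valgrind, build without ASan (assuming gcc), e.g.: gcc -g overflow.S -o overflow
# and then run it with: valgrind ./overflow
    .text
    .globl  main
main:
    pushq   %rbp
    movq    %rsp, %rbp
    pushq   %rbx                        # callee-saved: holds the heap pointer
    subq    $8, %rsp                    # keep %rsp 16-byte aligned for the calls below
    movq    $8, %rdi                    # room for exactly one 8-byte element
    call    malloc
    testq   %rax, %rax
    jz      .Lfail                      # allocation failed: nothing to free
    movq    %rax, %rbx
    movq    $1, (%rbx)                  # element 0: in bounds
    movq    $2, 8(%rbx)                 # Bug: element 1 lies past the end of the 8-byte block
    movq    %rbx, %rdi
    call    free
    movl    $0, %eax
    jmp     .Lreturn
.Lfail:
    movl    $1, %eax
.Lreturn:
    addq    $8, %rsp
    popq    %rbx                        # restore callee-saved registers
    popq    %rbp
    ret

    .section .note.GNU-stack,"",@progbits
```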
ASan with Docker for macOS
You only need to use Docker to test your programs with ASan. In situations that do not require address sanitization, you may ignore this step and build your programs regularly or use the debugger.
A caveat of using the Rosetta translation layer to run x86_64 executables on Apple Silicon is that you lose the ability to run instrumented binaries. It is therefore necessary to virtualize an x86_64 environment to benefit from ASan memory checking.
We use Docker to create a virtual Ubuntu 24.04 machine for your executables. The framework provides a Dockerfile and a script, co-docker, to run your assignments' executables in this virtual environment. After installing Docker, all you need to do is run the co-docker script with the executable you would normally run: it then executes under Ubuntu 24.04 and benefits from address sanitization.
In short, pass the co-docker script the entire command you would use to run the executable locally.