Address Sanitization

When working with the processor directly through assembly or other memory-unsafe languages, you have full control over the memory your program uses. It is very easy for your program to access memory that it shouldn't have access to, or to forget to release memory after using it.

Do you see what's wrong with the code below?

.global main
.text

main:
    pushq    %rbp
    movq     %rsp, %rbp

    movq     $8, %rdi
    call     malloc

    movq     $0, %rax
    movq     %rbp, %rsp
    popq     %rbp
    ret

The malloc call allocates 8 bytes on the heap and returns a pointer to the start of the allocated memory. Heap memory must be released by calling free once it is no longer needed. The code above never does so, which produces what is known as a memory leak.

It is essential that your programs are free of memory errors. Memory-related issues lead to crashes, either through the undefined behavior of an illegal access or by running out of memory when a leak is triggered repeatedly.

CodeGrade's autograder tests whether your program is memory-safe by building your code with AddressSanitizer (ASan). Some tests may fail because ASan detects a memory error and terminates the program abruptly.

This is intended testing behavior, as we expect you to write algorithmically correct programs, and that includes memory correctness. Your program is therefore required not to leak or illegally access memory: no memory leaks, buffer overflows, or segmentation faults.

To assist you in tackling memory safety, we strongly recommend making ASan part of your coding workflow and always checking for possible memory-related errors.

How to detect memory errors

We can detect memory errors such as memory leaks or illegal reads and writes by using ASan. At a high level, it works by instrumenting your code and keeping track of the memory your program has allocated and has access to. You can enable address sanitization by adding the -fsanitize=address flag to the options of the compiler driver (for example gcc or clang) that assembles and links your program.

The framework's build system, CMake, assembles your programs with the -fsanitize=address option by default, so address sanitization is already enabled.

When it detects a memory error, ASan terminates the program with an exit code of 1 and prints a stack trace. The leaking code above produces the following stack trace:

=================================================================
==79602==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 8 byte(s) in 1 object(s) allocated from:
    #0 0x562a6881ac19 in malloc (/home/student/calc+0x11dc19) (BuildId: 272f98be3f68e4850d25789c3a733299f5f39721)
    #1 0x562a68866c03 in main /home/student/my_program.S:9
    #2 0x73908a73e487 in __libc_start_call_main /usr/src/debug/glibc/glibc/csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #3 0x73908a73e54b in __libc_start_main /usr/src/debug/glibc/glibc/csu/../csu/libc-start.c:360:3
    #4 0x562a68728084 in _start (/home/student/calc+0x2b084) (BuildId: 272f98be3f68e4850d25789c3a733299f5f39721)

SUMMARY: AddressSanitizer: 8 byte(s) leaked in 1 allocation(s).

While this output might seem cryptic or even scary at first, it is extremely valuable for debugging, because it gives the exact location where the leaked memory was allocated. In the example above, the trace points to the file /home/student/my_program.S at line 9. There, we find the instruction call malloc. Paired with the error message detected memory leaks, we quickly deduce that we forgot to free the memory allocated on the heap.

Let's fix the code! We know that malloc returns a pointer to the heap-allocated memory, so let's pass that pointer to free. After modifying the code, ASan does not produce errors anymore. Hurray!

.global main
.text

main:
    pushq    %rbp
    movq     %rsp, %rbp

    movq     $8, %rdi
    call     malloc
    
    # We must free the memory returned by malloc
    movq     %rax, %rdi
    call     free

    movq     $0, %rax
    movq     %rbp, %rsp
    popq     %rbp
    ret

Read the stack trace top-down, following the frames numbered #0, #1, #2, and so on. Frame #0 is the most recent call at the moment the error was recorded, and higher numbers lead back towards the program's entry point. Usually, you want to find the first frame in which you recognize one of your program's source files. In the example above, the first and only such file is /home/student/my_program.S. Each frame also shows the subroutine's name just before the file it is located in.

All of this information should help you identify where in the code the error occurs, to help you set up your debugger for further examination.

Getting stuck

Sometimes, the stack traces are not very helpful, especially when encountering a segmentation fault. These situations can be confusing and overwhelming for the best of us: it is easy to spend hours lost in thought, trying to find a fix and questioning your coding choices.

To get out of such a situation, we recommend a four-step approach:

Reproduce

Isolate the problem to a specific input. Make sure that you can consistently reproduce the error with the same input. Do not proceed to the next steps if your problem is inconsistent. You may need to remove certain components from your code to achieve this consistency.

Narrow down

Identify the block of code producing the error and narrow down the program's context. Usually, ASan will point out the line of code the error originates from. On the rare occasion that this doesn't happen, use a binary-search approach: disable or skip half of the suspect code, check whether the error persists, and repeat on the half that still fails.

Isolate

Isolate the block of code in a new context, perhaps by creating a new, empty source file. Don't be afraid of copy-pasting some of your code into a new file and trying to reproduce the problem. This step helps you determine whether the problem is caused by the narrowed component or by its interaction with other components.

Use the debugger

If you have been using print-based debugging up until this point, it is now time to give yourself all the help you can get. Try to understand all of the interactions in the narrowed-down context and identify the human error.

All errors in programming originate from human error; the computer only does what you ask it to do. Therefore, when tackling complex memory errors, it is important to conceptualize the few classes of human errors that lead to computer errors:

Assumptions: You assumed that something valid in a narrow context would also be valid in a broader one. A common example is assuming that all functions operate on 64-bit registers. Many libc functions take or return 32-bit values (such as int) and expect you to use the 32-bit registers when interacting with them.
Not handling exceptions gracefully: It is common for your code to get to a point where it cannot continue. Assume you wanted to ask your program's user for a number and instead they provided a letter. In this case, you want your program to stop executing and show a message to the user. However, in doing so, you must ensure that your program exits gracefully, by freeing all allocated memory and restoring all callee-saved registers. It is never enough to just return and call it a day.
Pointer arithmetic: When working with pointers, you will most likely want to advance to a neighboring address to access data. Always keep a copy of the base pointer and remember the size of the allocation, so that you never step outside the allocated space. The free call only works with the pointer originally returned by the heap-allocating call.
Understanding the stack and the heap: When working with memory, you should always know whether your data lives on the stack or the heap. This crucial distinction dictates how you are going to debug the problem. Never try to use free with stack-allocated memory!
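
The points above can be sketched in a single small program. This is only an illustration under stated assumptions, not part of the framework: the labels, the "%ld" format string, the choice of %rbx and %r12, and the build command in the first comment are all made up for the example. It keeps the pointer returned by malloc in a callee-saved register, walks the block with a separate cursor, treats the return value of scanf as a 32-bit int, and sends both the success and the failure path through the same cleanup code.

```asm
# Assumed build command (the framework's CMake setup normally handles this):
#   gcc -fsanitize=address -g sketch.S -o sketch
.global main

.section .rodata
fmt:
    .asciz  "%ld"

.text
main:
    pushq   %rbp
    movq    %rsp, %rbp
    pushq   %rbx                # callee-saved: base pointer of the heap block
    pushq   %r12                # callee-saved: cursor, later the exit status

    movq    $32, %rdi           # room for four 8-byte values
    call    malloc
    movq    %rax, %rbx          # keep the pointer malloc returned untouched
    testq   %rax, %rax
    jz      fail                # allocation failed; free(NULL) below is a no-op

    movq    %rbx, %r12          # walk the block with a separate cursor
    movq    $0, %rcx
fill:
    cmpq    $4, %rcx            # stop after four elements: stay inside the 32 bytes
    jge     read_input
    movq    %rcx, (%r12)
    addq    $8, %r12
    incq    %rcx
    jmp     fill

read_input:
    leaq    fmt(%rip), %rdi
    movq    %rbx, %rsi          # store the parsed number in the first element
    movl    $0, %eax            # variadic call: no vector registers used
    call    scanf
    cmpl    $1, %eax            # scanf returns a 32-bit int, so compare %eax
    jne     fail

    movl    $0, %r12d           # success
    jmp     cleanup
fail:
    movl    $1, %r12d           # failure, but still fall through to the cleanup
cleanup:
    movq    %rbx, %rdi          # free the base pointer, never the cursor
    call    free
    movl    %r12d, %eax         # exit status
    popq    %r12                # restore callee-saved registers in reverse order
    popq    %rbx
    movq    %rbp, %rsp
    popq    %rbp
    ret
```

Passing the cursor in %r12 to free instead of %rbx would be an invalid free, and returning directly from the fail label would both leak the block and leave %rbx and %r12 clobbered for the caller.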

Limitations

There are a few limitations you will encounter when using ASan.

Firstly, you won't be able to use ASan in debugging mode. We therefore recommend running your program as a standalone executable to test for memory errors, and then using the debugger to narrow down the context of the error.

Secondly, ASan does not fully instrument assembled code, as it is designed to handle compiled code. As a result, ASan is less effective at detecting stack-related errors, such as a stack overflow or a failure to restore callee-saved registers.

We expect you to test your programs for these types of errors as well. The automated tests will fail if the stack is accessed illegally.
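
Since these stack errors have to be avoided by discipline rather than caught by ASan, here is a minimal sketch of a subroutine that saves and restores a callee-saved register and keeps its local variable inside its own stack frame. The helper name sum_to, the labels, and the register choice are hypothetical and only serve the illustration.

```asm
.global main
.text

# Hypothetical helper: long sum_to(long n) returns 1 + 2 + ... + n.
sum_to:
    pushq   %rbp
    movq    %rsp, %rbp
    pushq   %rbx                # save the callee-saved register before using it
    subq    $8, %rsp            # one 8-byte local; %rsp is 16-byte aligned again

    movq    $0, -16(%rbp)       # the local lives just below the saved %rbx
    movq    %rdi, %rbx          # loop counter
sum_loop:
    testq   %rbx, %rbx
    jle     sum_done
    addq    %rbx, -16(%rbp)
    decq    %rbx
    jmp     sum_loop
sum_done:
    movq    -16(%rbp), %rax     # return value
    addq    $8, %rsp            # release the local before popping
    popq    %rbx                # restore the register the caller relies on
    movq    %rbp, %rsp
    popq    %rbp
    ret

main:
    pushq   %rbp
    movq    %rsp, %rbp

    movq    $4, %rdi
    call    sum_to              # 4 + 3 + 2 + 1 = 10

    movq    %rbp, %rsp          # %rax still holds the sum and becomes the exit status
    popq    %rbp
    ret
```

Every push has a matching pop in reverse order, and the only stack slot the subroutine writes to is one it reserved itself. Omitting the popq %rbx, or writing to an offset such as -24(%rbp) that was never reserved, is the kind of mistake ASan may not report in hand-written assembly.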

To work around the second limitation, we recommend using Valgrind, a dynamic analysis tool that checks the final executable at run time and does not depend on compile-time instrumentation. It should be possible to pass all CodeGrade tests without running your program with Valgrind. However, it may be useful to use this tool if some tests fail on CodeGrade but not locally. To learn how to run Valgrind, please refer to the official documentation.

ASan with Docker for macOS

Docker is only needed for testing your programs with ASan. When you do not need address sanitization, you may skip this step and build your programs as usual or use the debugger.

A caveat of using the Rosetta layer to run x86_64 executables on Apple Silicon is that it loses the ability to instrument binaries. It is therefore necessary to virtualize an x86_64 environment to benefit from ASan memory checking.

We use Docker to create a virtual Ubuntu 24.04 machine for your executables. The framework provides a Dockerfile and a helper script, co-docker, to run your assignments' executables in the virtual environment. After installing Docker, run the co-docker script followed by the command you would normally use to start the executable. The executable then runs under Ubuntu 24.04 and benefits from address sanitization.

./co-docker a1

./co-docker a3a-iter

./co-docker a4 a4-diff/tests/test1/a.txt a4-diff/tests/test1/b.txt

In short, pass the entire command you would use to run the executable locally as the arguments to the co-docker script.

