Introduction to X86 Assembly on Linux

Knowledge of assembly is an important tool for any programmer. It lets you work directly with the CPU, an often overlooked but crucial part of any computer. This tutorial covers assembly as well as low-level programming in general, to give you a more complete understanding of the system you may use daily.

Here is a Hello World program in the NASM (Netwide Assembler) dialect of assembly, for Linux on x86-64:

section .rodata ; (1)
  hello_world_string: db "Hello, World!", 10 ; (2)
  hello_world_string_len: ; (3)
section .text ; (4)
global _start ; (5)
_start:
  mov rax, 1 ; (6)
  mov rdi, 1 ; (7)
  mov rsi, hello_world_string ; (8)
  lea rdx, [hello_world_string_len - hello_world_string] ; (9)
  syscall ; (10)
  mov rax, 60 ; (11)
  mov rdi, 0 ; (12)
  syscall

To build and run this program, place it into a file named hello_world.s, then run nasm -f elf64 hello_world.s and ld hello_world.o -o hello_world. Now you can run the binary ./hello_world.

At (1), we define a section named .rodata which will be placed in our binary. This section holds read-only data. Other data sections include .data, which holds data that can be both read and written, and .bss, which holds uninitialized data. They will be covered later.
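As a small preview, here is a sketch of the three data sections side by side. The labels greeting, counter and buffer are made up for this example, and the program does nothing visible: it writes to the two writable sections and exits. Linux fills .bss with zeros when the program is loaded, so "uninitialized" here means that no initial values are stored in the binary.

```nasm
section .rodata              ; read-only, initialized
  greeting: db "hi", 10
section .data                ; readable and writable, initialized
  counter: dq 0              ; one 8-byte value, starting at 0
section .bss                 ; readable and writable, no initial values in the file
  buffer: resb 64            ; reserve 64 bytes

section .text
global _start
_start:
  mov qword [counter], 5     ; fine: .data is writable
  mov byte [buffer], 'A'     ; fine: .bss is writable
  mov rax, 60                ; exit syscall
  mov rdi, 0                 ; exit status 0
  syscall
```

Writing to greeting in the same way would assemble, but the program would most likely crash at run time, because .rodata is mapped read-only.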

At (2), the bytes (db means define bytes) representing the ASCII encoding of “Hello, World!”, followed by the byte 10, which is a newline, will be inserted directly into the binary. hello_world_string is a NASM label which refers to the start of the string (“H”). At (3), hello_world_string_len is a label which refers to the byte immediately after the newline, and is used to compute the length of the string.
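NASM also offers another common way to get the length, using equ and the special symbol $, which stands for the current position in the section. The sketch below assumes the shorter names msg and msg_len, which are not used elsewhere in this tutorial. Because msg_len is a plain constant here rather than a label, it can be loaded with an ordinary mov.

```nasm
section .rodata
  msg: db "Hello, World!", 10
  msg_len equ $ - msg        ; current position minus start = 14 bytes

section .text
global _start
_start:
  mov rax, 1                 ; write syscall
  mov rdi, 1                 ; standard output
  mov rsi, msg               ; pointer to the first byte
  mov rdx, msg_len           ; number of bytes to write
  syscall
  mov rax, 60                ; exit syscall
  mov rdi, 0                 ; exit status 0
  syscall
```

Both approaches produce the same number. The label-after-the-data style used in the main listing just makes the subtraction explicit.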

(4) defines the .text section, which is where our instructions are actually placed. Then, at (5), to ensure that Linux knows where to start executing our program, we define the special label _start and make it global so that the linker can “see” it and record it as the program's entry point.

The next section is a little bit more complicated. Programs interact with the Linux kernel - necessary to do anything interesting! - via system calls, or syscalls for short. There are hundreds of syscalls on Linux which allow you to do anything from allocating memory, to establishing network connections, to doing file I/O, and more. The Linux kernel is unusual because its syscall interface is very stable compared to Windows or FreeBSD/OpenBSD, so it is possible to program against it directly. However, most programs use a library to abstract this away; often libc, also known as the C standard library. The main benefit of this is that you can easily compile for different operating systems, as the syscall interface is different for each one.

The x86-64 architecture is an updated version of the 32-bit x86 architecture, which was limited to 4 GiB of addressable memory per program. x86 can refer to either or both architectures, although x64 is sometimes used to mean x86-64. The x86-64 architecture has 16 general-purpose registers (there are more registers, but they have specific uses), which are 8 bytes or 64 bits each. They are like RAM, but orders of magnitude faster to read from and write to. They are also necessary to speak to the Linux kernel, as it expects its syscall data to be in specific registers. Linux uses the rax register for the syscall number. The first character, r, means that it is 64 bits long, and ax means that it is the accumulator register, although this meaning has long been obsolete and it is now a general-purpose register.
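The smaller names for the same register are still available: eax is the low 32 bits of rax, ax the low 16 bits, and al the low 8 bits. Here is a short sketch of how writes to each of them affect the full register. The values are arbitrary, and the program prints nothing. It only passes the final value of rax back as its exit status, which you can inspect with echo $? after running it.

```nasm
section .text
global _start
_start:
  mov rax, 0x1122334455667788  ; set all 64 bits
  mov al, 0xFF                 ; low 8 bits only:  rax = 0x11223344556677FF
  mov ax, 0xAAAA               ; low 16 bits only: rax = 0x112233445566AAAA
  mov eax, 5                   ; low 32 bits, and the upper 32 are cleared: rax = 5
  mov rdi, rax                 ; use the result as the exit status
  mov rax, 60                  ; exit syscall
  syscall
```

The one surprise is the 32-bit case: writing to eax zeroes the upper half of rax, while writing to ax or al leaves the remaining bits alone.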

The assembly instruction mov copies the second operand (such as 1) into the first operand (such as the register rax). Despite the name, the source is left unchanged. It is the most basic instruction in assembly.
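To make that concrete, here is a sketch of the common forms of mov. The label value is hypothetical, and square brackets mean "the memory at this address" rather than the address itself. The program exits with whatever ends up stored in value, so echo $? should show 42.

```nasm
section .data
  value: dq 7          ; an 8-byte variable holding 7

section .text
global _start
_start:
  mov rax, 42          ; constant -> register
  mov rbx, rax         ; register -> register; rax still holds 42
  mov rcx, [value]     ; memory -> register: rcx = 7
  mov [value], rbx     ; register -> memory: value now holds 42
  mov rdi, [value]     ; exit status = 42
  mov rax, 60          ; exit syscall
  syscall
```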

Looking at (6), according to the Linux x86-64 syscall table, 1 refers to the “write” syscall. This allows you to write data to a file descriptor, which may be standard output, standard input, standard error, or an actual file. This is a common pattern in Linux, where files have the same interface as other forms of I/O. (7) provides the actual file descriptor, which is a constant 1 to mean standard output. If you are programming in assembly often, you may create named constants for these numbers, to reduce mental load. It may be surprising to use numbers, and have to manually give everything a meaningful name, but this is the most efficient form of communication, and helps you understand what is really happening.

An example of using macros to make assembly code easier to read:

mov rax, SYS_WRITE
mov rdi, STDOUT
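SYS_WRITE and STDOUT are not built into NASM, so they have to be defined somewhere. One way to do it, as a sketch, is with NASM's %define directive at the top of the file. SYS_EXIT and the labels msg and msg_end are names invented for this example.

```nasm
%define SYS_WRITE 1          ; syscall number for write
%define SYS_EXIT  60         ; syscall number for exit
%define STDOUT    1          ; file descriptor for standard output

section .rodata
  msg: db "Hello, World!", 10
  msg_end:

section .text
global _start
_start:
  mov rax, SYS_WRITE
  mov rdi, STDOUT
  mov rsi, msg
  mov rdx, msg_end - msg     ; length of the message in bytes
  syscall
  mov rax, SYS_EXIT
  mov rdi, 0
  syscall
```

The assembler substitutes the numbers before generating any code, so the resulting binary is identical to one written with bare numbers.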

In Linux, syscall arguments are expected in specific registers. We have seen that rax is used for the syscall number and rdi is used for the first argument. The full sequence of argument registers is rdi, rsi, rdx, r10, r8, and r9. (Ordinary function calls use rcx in the fourth slot, but syscalls use r10 instead, because the syscall instruction itself overwrites rcx.) The names of registers only get stranger the longer you look at them, but x86-64 has a lot of historical baggage so this is what you get I’m afraid. The kernel will also “clobber” - or change to an unknown value - the registers rcx and r11 during a syscall, so you cannot expect them to be the same afterwards.
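One more register plays a part that the text above does not cover: when the syscall finishes, the kernel puts its result back into rax. For write, that is the number of bytes written, or a negative error code on failure. The sketch below labels each argument register and then passes write's result on as the exit status, so echo $? should normally show 14. The labels msg and msg_end are made up for this example.

```nasm
section .rodata
  msg: db "Hello, World!", 10
  msg_end:

section .text
global _start
_start:
  mov rax, 1               ; syscall number: write
  mov rdi, 1               ; argument 1: file descriptor (standard output)
  mov rsi, msg             ; argument 2: pointer to the bytes
  mov rdx, msg_end - msg   ; argument 3: number of bytes (14)
  syscall                  ; rcx and r11 are clobbered; rax holds the result
  mov rdi, rax             ; argument 1 of exit: write's result
  mov rax, 60              ; syscall number: exit
  syscall
```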

At (8), rsi is loaded with a pointer to the start of the “Hello, World!”, 10 string. If you are familiar with C, then pointers will be a simple concept, but if you are not, a pointer is simply the address of a location in memory. When our program is executed, RAM is loaded with all the instructions and data that we have provided in the assembly file. If you’re interested, keeping instructions and data in the same memory is known as the von Neumann architecture. It is useful to know about the alternatives, but we will keep it simple for now. So the rsi register now tells us where the first byte of the Hello World string is.

However, this is not enough to actually print the string - we do not know where it ends. So, at (9) we calculate the length of the string using the lea (load effective address) instruction, which is a handy shortcut for doing arithmetic which we won’t get into at the moment. Here the expression is the difference between the two labels, which works out to 14. The result is loaded into the next syscall argument register, rdx, which the write syscall uses as the number of bytes to write, starting from the pointer in rsi.
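For the curious, here is a tiny sketch of what "doing arithmetic with lea" looks like. lea evaluates the expression inside the square brackets as if it were an address, but never reads the memory there. It just stores the computed number. The values are arbitrary, and the result is returned as the exit status, so echo $? should show 42.

```nasm
section .text
global _start
_start:
  mov rax, 8
  lea rdi, [rax + rax*4 + 2]   ; rdi = 8 + 8*4 + 2 = 42; no memory access happens
  mov rax, 60                  ; exit syscall
  syscall
```

In the Hello World listing the bracketed expression is a constant, so a plain mov rdx with the same label difference would load the same value.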

At last, at (10), we execute the syscall instruction, which jumps into the Linux kernel. The kernel then prints “Hello, World!” and a newline to standard output (the console).

Unfortunately, we are not done yet. We have not terminated the program. The CPU will keep trying to read instructions, and our program is placed at an arbitrary point in RAM. So, it will take whatever happens to be in RAM that is after our program, and treat it like an instruction. More often than not, this will result in the notorious Segmentation Fault (core dumped) error, meaning we are trying to access a location in RAM that we’re not allowed to access.

We need to tell the kernel to stop running the program, and this is done by another syscall, exit, represented by the number 60 at (11). This requires one argument, the exit status, which is 0 to indicate that the program terminated successfully, so we pass that in the rdi register (12). Finally, the program is terminated and control returns to the user.
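You can see the exit status for yourself with a program that does nothing but exit. This is a minimal sketch, with 3 as an arbitrary status chosen for the example. Build it the same way as hello_world.s, run it, and then run echo $? in the shell, which prints the status of the last command.

```nasm
section .text
global _start
_start:
  mov rax, 60      ; exit syscall
  mov rdi, 3       ; exit status: anything other than 0 signals failure
  syscall
```

Only the low 8 bits of the status reach the shell, so useful values range from 0 to 255.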

In summary, we have written a “Hello, World” program in assembly language, and learned about various low-level programming concepts. Part 2 will go more in-depth on how you can actually Get Stuff Done in assembly.