security

Return oriented programming on RISC-V - Part 1

Bogdan Deac

17 Feb 2022 • 11 min read

There is a widespread tendency nowadays to offload computation from the CPU to dedicated hardware units that offer better execution time and performance per watt than general-purpose processors. This approach involves designing new hardware components, in silicon, that outperform CPUs at tasks such as AI inference and graphics acceleration. These specialized components are placed on the same chip as the general-purpose processors, and in this way a System on Chip (SoC) is born.

We observe this trend among the most important chip producers, such as Xilinx, Intel, Nvidia and Apple.

Looking across the SoC world, I think that ARM is the most widespread CPU architecture used inside SoCs.

Leaving aside the advantages and disadvantages of ARM, a license-based architecture, my attention was caught by RISC-V, a promising new open source CPU architecture initially developed at the University of California, Berkeley.

I'm going to leave in-depth analysis of RISC-V for another article and I'm going to focus for now on the security concerns that I have about it.

In recent years there have been several exploits that targeted the CPU itself. The best known are probably Spectre and Meltdown, which were followed by many more. These attacks exploit side channels in modern processors without requiring any software vulnerability and independently of the operating system, and they reveal the importance of an instruction set architecture (ISA) designed with security in mind.

In this context, I see many advantages for open source CPU architectures like RISC-V, which allow fast updates from many contributors including security researchers and companies.

Why I chose Return-Oriented Programming

After I graduated from university I decided to study cyber security in a master's program. Among the many interesting topics was Return-Oriented Programming (ROP), a pretty clever exploit that has proved effective on many CPU architectures, including x86-64, ARM and SPARC. There are protection mechanisms against ROP, implemented in software, including Address Space Layout Randomization (ASLR) and stack canaries.

At the time I studied this exploit I didn't find anything regarding ROP on RISC-V, so I decided to tackle the subject, which became my dissertation thesis.

In this article I present how one can exploit a RISC-V CPU using ROP.

Introducing ROP

Return-Oriented Programming is a code-reuse technique that allows an attacker to bypass the Write XOR Execute (W^X) protection mechanism provided by most modern CPU architectures and operating systems.

ROP was derived from the return-into-libc attack.

Like return-into-libc, ROP does not inject new code into the victim’s memory space. Instead it uses code from modules that are already loaded by the exploited application. Unlike return-into-libc, ROP does not use whole library functions. It relies on short sequences of valid instructions, called gadgets.

Each gadget performs a short computation, such as an addition or a logic operation. The key to a powerful ROP attack is to chain together all the gadgets that are needed to perform a certain task. With this approach, ROP is more flexible than return-into-libc. Moreover, it is possible to obtain a Turing complete set of gadgets.

This gives an attacker a tool with the same capabilities as a programming language, built entirely on legitimate code that already exists in the application’s memory space.

ROP Recipe

A successful ROP attack needs the following ingredients:

buffer overflow vulnerable application
reliable gadget chaining mechanism
a large set of gadgets

Buffer overflow vulnerable application

We need an application that is vulnerable to a buffer overflow so that we can overwrite the return address of a benign function. In this way we can redirect the execution flow to our gadgets.
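The idea of overwriting a return address can be sketched with a toy model. The snippet below is a simplified illustration, not real exploit code: the frame layout (a 16-byte buffer followed directly by a 32-bit return address), the addresses and the helper names are all assumptions made for the example. Real frames usually contain more data between the buffer and the return address.

```python
import struct

BUF_SIZE = 16                 # hypothetical size of the local buffer
LEGIT_RETURN = 0x00010ABC     # hypothetical legitimate return address
GADGET_ADDR = 0x0002F00C      # hypothetical address of the first gadget


def make_frame():
    """Model a frame: a local buffer followed directly by a 32-bit return address."""
    return bytearray(BUF_SIZE) + struct.pack("<I", LEGIT_RETURN)


def unchecked_copy(frame, data):
    """Copy data into the buffer without a bounds check, as strcpy() would."""
    frame[0:len(data)] = data


def saved_return(frame):
    """Read the return address stored just above the buffer."""
    return struct.unpack_from("<I", frame, BUF_SIZE)[0]


frame = make_frame()
# The payload fills the buffer and then supplies a new return address.
payload = b"A" * BUF_SIZE + struct.pack("<I", GADGET_ADDR)
unchecked_copy(frame, payload)
print(hex(saved_return(frame)))
```

An input no longer than the buffer leaves the return address untouched; only the bytes that spill past the buffer reach it.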

Reliable gadget chaining mechanism

Everything is about branching and jumping. We want to execute a few instructions and then jump to another sequence of useful instructions (the next gadget). This link between gadgets must happen automatically, as a side effect of instructions the CPU already executes, because we don't control the code in memory; we control only the stack.

A large set of gadgets

We want to be as powerful as possible. By choosing the gadget's structure and chaining mechanism wisely we can obtain a huge set of available gadgets from the most popular libraries, which gives us the same capabilities as a programming language.

ROP Overview

benign_function() is vulnerable to buffer overflow. By providing a carefully crafted payload, the attacker can replace the original return address with the address of the first gadget, G1.

The payload usually contains two types of data:

the addresses for all gadgets that are needed to perform a malicious task
the values that are needed by the gadgets to be loaded into registers
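As a rough sketch of how these two kinds of data end up in one payload, the snippet below serializes a list of words after some padding. The padding length, the gadget addresses and the build_payload helper are hypothetical; little-endian 32-bit words are assumed.

```python
import struct


def build_payload(padding_len, words, word_size=4):
    """Padding up to the return address, then gadget addresses and register values."""
    fmt = "<I" if word_size == 4 else "<Q"   # little-endian 32-bit or 64-bit words
    body = b"".join(struct.pack(fmt, w) for w in words)
    return b"A" * padding_len + body


G1 = 0x0804A010   # hypothetical address of the first gadget
G2 = 0x0804B020   # hypothetical address of the second gadget

# The first word replaces the return address, the second is a value that G1
# loads into a register, the third is the address of the next gadget.
payload = build_payload(16, [G1, 0xDEADBEEF, G2])
print(payload.hex())
```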

After the code execution is redirected to libc, G1 is executed, followed by G2 and G3. Please note that there is no malicious code in libc. G1, G2 and G3 are not malicious by themselves, but the order in which the instructions are executed (instruction 3 → instruction 8) and the values that these instructions use (taken from the stack, which is controlled by the attacker) can lead to malicious behavior.

Basic ROP example

Now let's have a look at a basic ROP attack on x86. We start from this architecture because it offers a straightforward and easy to understand example. I will use a 32-bit example, but it works in the same manner on 64-bit (x86-64). As you will see, the gadget chaining mechanism is specific to each CPU architecture because it depends on the instructions that perform stack operations.

This example is taken from Shacham’s “The geometry of innocent flesh on the bone: return-into-libc without function calls (on the x86)”. When benign_function() is called, the return address is saved on the stack, and when the function is about to return the esp register points to that stack location. If the attacker overwrites the return address and the locations that follow, the program execution is diverted to the pop %edx gadget.

It works like this:

when benign_function() returns, the ret instruction is executed: execution jumps to the address pointed to by esp, in our case the address of pop %edx, and esp is incremented by 4 bytes (32 bits) (point 1 in the above figure)
now esp points to the location that holds 0xdeadbeef. pop %edx loads the edx register with the value pointed to by esp (0xdeadbeef, point 2 in the above figure) and increments esp by 4 bytes
now esp points to the next gadget’s address (point 3 in the above figure). The ret after pop %edx makes execution jump to that gadget (the new value pointed to by esp) and increments esp by 4 bytes

In this way the gadgets are chained together on the x86 CPU architecture, leading to a very powerful attack.
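The three steps above can be sketched as a tiny interpreter. This is a toy model under stated assumptions: the stack is a list of 32-bit words indexed by a word counter instead of byte addresses, the gadget addresses are made up, and the second gadget (pop %eax) is a hypothetical addition to show the chain continuing.

```python
POP_EDX = 0x0804A010   # hypothetical address of a "pop %edx; ret" gadget
POP_EAX = 0x0804B020   # hypothetical address of a "pop %eax; ret" gadget

# Each gadget is a list of registers to pop, followed by an implicit ret.
GADGETS = {POP_EDX: ["edx"], POP_EAX: ["eax"]}


def run_chain(stack, gadgets, max_steps=16):
    """Simulate ret/pop chaining; esp counts 32-bit words, not bytes."""
    regs, trace = {}, []
    esp = 0
    eip = stack[esp]          # ret of benign_function(): jump to the word at esp
    esp += 1                  # ... and move esp up by one word (4 bytes)
    while eip in gadgets and len(trace) < max_steps:
        trace.append(eip)
        for reg in gadgets[eip]:
            regs[reg] = stack[esp]   # pop: load the word at esp into the register
            esp += 1
        if esp >= len(stack):
            break
        eip = stack[esp]      # ret at the end of the gadget: jump to the next address
        esp += 1
    return regs, trace


# Attacker-controlled stack, starting at the overwritten return address.
stack = [POP_EDX, 0xDEADBEEF, POP_EAX, 0x00000001, 0x00000000]
regs, trace = run_chain(stack, GADGETS)
print({name: hex(value) for name, value in regs.items()})
```

The only "program counter" the attacker controls here is esp: every ret consumes one word of the payload as the next jump target.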

ROP on RISC-V

Instruction Set

RISC-V is a RISC (Reduced Instruction Set Computer) architecture based on the load-store principle. That means that the only instructions that can access memory are load and store instructions. In the base instruction set all instructions have a fixed 32-bit width and must be naturally aligned. Also, unlike x86-64 or ARM, RISC-V does not have dedicated stack manipulation instructions. As we have seen before, x86-64 has POP and RET instructions that update the stack pointer to point to the next value on the stack. RISC-V instead uses a sequence of lw (load word; ld, load doubleword, on 64-bit) and addi (add immediate) instructions to load a value from the stack and to update the stack pointer. RISC-V also has no dedicated return instruction. ret is a pseudo-instruction that expands to jalr zero, 0(ra), which sets the program counter to ra + 0 and writes the address of the following instruction (the current program counter plus four) to register zero. Since zero is hardwired to zero, that link value is discarded. This implies that the return address must be copied from the stack into ra before returning.
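To make the return sequence concrete, here is a minimal model of lw, addi and jalr on a 32-bit register file. It is a sketch for illustration only: the helper functions, the addresses and the stack offset (12) are assumptions, and memory is reduced to a dictionary of words.

```python
MASK = 0xFFFFFFFF   # 32-bit registers, as in RV32I


def lw(regs, mem, rd, offset, base):
    """lw rd, offset(base): load a word from memory into rd."""
    regs[rd] = mem[(regs[base] + offset) & MASK]


def addi(regs, rd, rs1, imm):
    """addi rd, rs1, imm: rd = rs1 + imm."""
    regs[rd] = (regs[rs1] + imm) & MASK


def jalr(regs, pc, rd, rs1, imm):
    """jalr rd, imm(rs1): return the jump target; rd receives pc + 4 unless rd is zero."""
    target = (regs[rs1] + imm) & MASK & ~1   # the lowest bit of the target is cleared
    if rd != "zero":                         # writes to the zero register are discarded
        regs[rd] = (pc + 4) & MASK
    return target


regs = {"zero": 0, "ra": 0, "sp": 0x7FFF0000}
mem = {0x7FFF000C: 0x00010444}   # hypothetical saved return address at 12(sp)
pc = 0x00010100                  # hypothetical address of the epilogue

lw(regs, mem, "ra", 12, "sp")                # lw   ra, 12(sp)
pc += 4
addi(regs, "sp", "sp", 16)                   # addi sp, sp, 16
pc += 4
next_pc = jalr(regs, pc, "zero", "ra", 0)    # ret  (jalr zero, 0(ra))
print(hex(next_pc), hex(regs["sp"]))
```

Whoever controls the word at 12(sp) controls next_pc, which is exactly what a ROP payload relies on.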

Registers and Calling Convention

RV32I (Base Integer Instruction Set, 32-bit) has 32 general purpose registers. Some important aspects of registers’ organization are:

x0/zero is hardwired to zero
x1/ra holds the return address from a function
x2/sp holds the stack pointer
x10-x17/a0-a7 hold arguments for functions
x8-9, x18-27/s0-s11 (saved registers) preserve their values across function calls; any function that uses the saved registers must restore their original value before returning

Divide and conquer

As you have seen, on x86-64 the ROP attack is pretty simple. You have the POP instruction to retrieve values from the stack and you can use RET to jump to the next gadget, all without having to think about updating the stack pointer. On RISC-V the task is more complicated. We have to use a load instruction with an explicit memory address to retrieve data (from the stack or from other memory).

To obtain a powerful ROP attack you need a reliable gadget chaining method. On x86-64 this objective is accomplished by the RET instruction, which updates the program counter with the address of the next gadget and advances the stack pointer. Taking this into account, one can use any function epilogue as a gadget. This approach can be used on RISC-V too, but it comes with some disadvantages.

sw    a5, 1518(a4)
ld    ra, 8(sp)
addi  sp, sp, 16
ret

In this example we have a gadget that stores a value in memory. The first line stores the value, the second loads the ra register with the return address from the stack, the third updates the stack pointer, and the last executes ret. Simple, but let’s have a look at another example.

li    a0, 0
ld    ra, 40(sp)
ld    s0, 32(sp)
ld    s1, 24(sp)
ld    s2, 16(sp)
ld    s3, 8(sp)
addi  sp, sp, 48
ret

In this case a0 is loaded with zero, ra is loaded with the return address, and s0 → s3 are loaded with their previous values from the stack, because the calling convention specifies that the saved registers must preserve their values across function calls. If this gadget is used, the attacker has to provide dummy values on the stack for the saved registers, which implies a bigger payload. This type of function epilogue is quite common in libc.

For this reason I came up with another idea. I observed that libc contains many instruction sequences that end with a jump to a saved register. So, I divided the gadgets into two categories: functional gadgets and the linking gadget.
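To put a number on that overhead, the sketch below builds the stack image that the second epilogue consumes. The offsets (ra at 40, a 48-byte frame) come from the listing above; the filler value, the gadget address and the epilogue_frame helper are hypothetical.

```python
import struct

FILLER = 0x4141414141414141            # dummy value for slots the attacker does not need
NEXT_GADGET = 0x0000003FF7A1B2C0       # hypothetical address of the next gadget


def epilogue_frame(frame_size, slots):
    """Build the bytes one epilogue gadget consumes; slots maps sp offsets to values."""
    words = [FILLER] * (frame_size // 8)
    for offset, value in slots.items():
        words[offset // 8] = value
    return b"".join(struct.pack("<Q", w) for w in words)


# ld ra, 40(sp) is the only load the attacker cares about in this gadget.
frame = epilogue_frame(48, {40: NEXT_GADGET})
useful = 8
print(f"{len(frame)} bytes on the stack, {useful} useful, {len(frame) - useful} filler")
```

For a gadget whose only useful effect is li a0, 0, five sixths of the bytes it consumes are padding, and this cost is paid again for every such gadget in the chain.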

Functional gadgets

The functional gadgets are the ones that execute the useful operations for the attack and end in a jalr to one of the saved registers (s0 → s11). So, the link to the next gadget is made through the saved registers.

Here are some examples of functional gadgets.

// Load from memory to register
ld    a0, 0(s0)
jalr  s1

// Load a constant to a register
li    a2, 0
jalr  s3

// Calling the execve system call
li    a7, 221
ecall
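A search for such gadgets could be sketched as a scan over a disassembly listing. The code below is only an illustration: it assumes the input is plain mnemonic text, one instruction per line (real objdump output also carries addresses and encodings), it only looks at the single instruction before each jump, and the function and pattern names are made up.

```python
import re

# Matches "jalr sN" for the saved registers s0..s11 on a whitespace-normalized line.
SAVED_JUMP = re.compile(r"^jalr (s(?:1[01]|[0-9]))$")

LISTING = """
ld    a0, 0(s0)
jalr  s1
li    a2, 0
jalr  s3
li    a7, 221
ecall
"""


def find_functional_gadgets(listing):
    """Return (instruction, link register) pairs for one-instruction gadgets."""
    lines = [" ".join(ln.split()) for ln in listing.splitlines() if ln.strip()]
    gadgets = []
    for prev, cur in zip(lines, lines[1:]):
        match = SAVED_JUMP.match(cur)
        if match and not SAVED_JUMP.match(prev):
            gadgets.append((prev, match.group(1)))
    return gadgets


for body, link in find_functional_gadgets(LISTING):
    print(f"{body:<16} -> continues at the address held in {link}")
```

The ecall sequence is not reported because it does not end in a jump to a saved register; it would have to be located separately.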

Linking gadget (the charger)

This is a special gadget that is used to link all the other gadgets together. It is the first gadget to be executed, at the beginning of the attack. It loads the saved registers with the addresses of all the functional gadgets that are going to be used in the attack, hence the name charger.

In the above figure we have an example of a charger gadget.

The first instruction (green) loads the ra register with the address of the first functional gadget.
The following seven instructions (blue) load the saved registers s0→s6 with the addresses of all the functional gadgets that will be used in the attack.
The yellow instruction updates the stack pointer.
Finally, the orange instruction executes the ret that will jump to the address loaded in ra (first functional gadget).

Theoretical attack

Here we have a theoretical return-oriented programming attack running on RISC-V architecture.

The stack space of benign_function() (and all the following positions) is overflowed by the attacker, and the legitimate return address is replaced with the address of the charger gadget (malicious return address).
Before returning, benign_function() frees its stack space by moving sp past it (to the next position on the stack).
When benign_function() returns, the program execution jumps to the charger gadget (ld ra, 72(sp), ...), which loads the ra register with the address of the first functional gadget. After that, the saved registers s0 → s6 are loaded with values from the stack, provided by the attacker.
At the end, the charger gadget updates sp (other values useful to the attacker can follow on the stack) and executes ret.
At this point the execution jumps to the first functional gadget, which loads a0 with the value found at the address stored in s0 (ld a0, 0(s0)) and then jumps to the address stored in s1 (jalr s1), another functional gadget.
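The sequence above can be walked through with a small simulation. This is a sketch under assumptions, not the actual attack: only ld ra, 72(sp) is taken from the text, while the offsets of s0 → s6 (64 down to 16), the 80-byte frame size, every address and all helper names are hypothetical.

```python
CHARGER = 0x1000      # hypothetical address of the charger gadget
LOAD_A0 = 0x2000      # hypothetical address of "ld a0, 0(s0); jalr s1"
NEXT = 0x3000         # hypothetical address of the next functional gadget
DATA_PTR = 0x5000     # hypothetical address of data the attacker wants in a0
SP = 0x7000           # sp after benign_function() released its frame


def charger(regs, stack, mem):
    """ld ra, 72(sp); ld s0..s6 (assumed offsets 64..16); addi sp, sp, 80; ret."""
    sp = regs["sp"]
    regs["ra"] = stack[sp + 72]
    for i in range(7):
        regs[f"s{i}"] = stack[sp + 64 - 8 * i]
    regs["sp"] = sp + 80
    return regs["ra"]                      # ret jumps to ra


def load_a0(regs, stack, mem):
    """ld a0, 0(s0); jalr s1 (the link value written to ra is not modeled)."""
    regs["a0"] = mem[regs["s0"]]
    return regs["s1"]


def run(pc, regs, stack, mem, gadgets, max_steps=8):
    """Follow the chain until execution leaves the known gadgets."""
    trace = []
    while pc in gadgets and len(trace) < max_steps:
        trace.append(pc)
        pc = gadgets[pc](regs, stack, mem)
    return pc, trace


# Attacker-controlled stack words, keyed by address.
stack = {SP + 72: LOAD_A0, SP + 64: DATA_PTR, SP + 56: NEXT}
stack.update({SP + 64 - 8 * i: 0 for i in range(2, 7)})   # s2..s6 unused here
mem = {DATA_PTR: 0xCAFE}
regs = {"sp": SP}

final_pc, trace = run(CHARGER, regs, stack, mem, {CHARGER: charger, LOAD_A0: load_a0})
print([hex(a) for a in trace], hex(final_pc), hex(regs["a0"]))
```

After the charger has run, the stack is no longer needed for linking: every later jump target is already sitting in a saved register.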

A complete functional example of return-oriented programming on RISC-V will be presented in a future article.

Advantages and disadvantages of this method

The main advantage of this method is the reduction of the payload that is used in the attack. If the method based on function epilogues is used, the attacker has to provide dummy values for the saved registers that are restored before the function returns. This can lead to large payloads that can be ineffective in some cases. On the other hand, I didn’t find a Turing complete set of gadgets in libc, which reduces the attacker’s capabilities.

Security concerns

Now that we know that the RISC-V architecture can be exploited using the Return-Oriented Programming technique, we can ask: is RISC-V insecure? No. ROP attacks leverage the function return mechanism and the function calling convention, which are not malicious by themselves. Moreover, one needs a vulnerable application that allows the attacker to overwrite the return address of a function. Also, there are many protection mechanisms that stop ROP attacks, as we will see in the next section. From this point of view, I consider that RISC-V has a significant advantage over the closed CPU architectures because each processor manufacturer can choose which protection mechanism to integrate.

Protection mechanisms

What can we do to prevent ROP attacks? There are many solutions that deal with this issue. Most of them combine software and hardware support.

Address Space Layout Randomization: this is a software security technique that randomizes the memory addresses of a process's code, stack, heap and libraries. Each time a process is launched it is placed at a random address in memory. This means, using our attack example, that the attacker doesn’t know the addresses of the gadgets (libc). Read more about this here.
G-Free: this is a compiler-based solution that eliminates all unaligned free-branch instructions inside a binary executable. More details here.
ROPdefender
ROPGuard
ROPecker
Stack canaries: this technique is used to detect a stack buffer overflow and to prevent the jump to ROP gadgets. It works by placing a random integer value (the canary) on the stack, right before the return address. The canary value is checked before the return instruction is executed to make sure that the return address was not modified (if the return address is overwritten, the canary value is changed too). More details here.
Zipper Stack: this is a novel technique (2019) that protects the return addresses with a chain structure based on encryption. It is able to protect all the return addresses on the stack and ensures that functions return in the correct order. The authors implemented this solution on a RISC-V CPU and modified the RISC-V instruction set. Here, the advantages of an open source CPU architecture are highlighted. More details here.
FIXER: another novel technique (2019) that provides a protection mechanism against ROP by enforcing control-flow integrity. It was also implemented and tested on a RISC-V CPU.
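As an illustration of the stack canary idea from the list above, here is a toy model of the check. The frame layout (16-byte buffer, 64-bit canary, 64-bit return address), the addresses and the function names are assumptions for the example; real compilers and runtimes implement the check in the function prologue and epilogue.

```python
import secrets
import struct

BUF_SIZE = 16   # hypothetical size of the local buffer


def make_frame(canary, return_addr):
    """Frame layout: buffer, then the canary, then the return address."""
    return bytearray(BUF_SIZE) + struct.pack("<QQ", canary, return_addr)


def unchecked_copy(frame, data):
    """Copy data into the buffer without a bounds check."""
    frame[0:len(data)] = data


def checked_return(frame, canary):
    """Verify the canary before using the return address, as the epilogue would."""
    stored, return_addr = struct.unpack_from("<QQ", frame, BUF_SIZE)
    if stored != canary:
        raise RuntimeError("stack smashing detected")
    return return_addr


canary = secrets.randbits(64)            # fresh random value the attacker cannot predict
frame = make_frame(canary, 0x00010ABC)   # hypothetical legitimate return address
print(hex(checked_return(frame, canary)))
```

An overflow that reaches the return address has to write through the canary first, so unless the attacker can guess or leak its value the check fails before ret is executed.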