---
title: A Rusty Stack Jump
description: Jumping into a new stack with Rust
date: 2025-02-27
featuredImage:
featuredImageDesc:
tags:
  - rust
  - asm
  - systems
  - operating systems
  - async
---

import { Notes, PostImage } from "~/components/Markdown";
import { Tree } from "~/components/Tree";

In my quest to learn to build an async runtime in Rust, I have to learn about CPU context switching. To switch from one async task to another, our async runtime has to perform a context switch. This means saving the CPU registers that the System V ABI marks as `callee saved` and loading the registers, including the stack pointer, of the task we are switching to. In this article, I will show you what I have learned about jumping onto a new stack on an x86_64 CPU.

I'm learning about async runtimes in Rust based on the amazing book [Asynchronous Programming in Rust: Learn asynchronous programming by building working examples of futures, green threads, and runtimes](https://www.packtpub.com/en-mt/product/asynchronous-programming-in-rust-9781805128137).

It's an amazing book, don't get me wrong, but I feel like the explanation can be hand-wavy sometimes. So I'm writing this to archive my own explanation and potentially help other people who also struggle with the subject.

Most async runtimes in Rust do not use stackful coroutines (which are used by Go's goroutines and Erlang's processes) and instead use state machines to manage async tasks.

## Contents
## Setting the stage

Why do we need to swap stacks in a runtime with stackful coroutines? Async tasks, by nature, are paused and resumed. In a stackful design every task has its own stack, so every time a task is paused to let another one run, we have to save the context of the task that is running (including its stack pointer) and load the context of the upcoming task.

## Jumping into the new stack

Here is the code in its entirety. I'd recommend you run it on the [Rust Playground](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2024). I have left comments throughout the code so you can get the general idea. Note that `hello` loops forever, so you have to stop the process manually.

```rust file="stack_swap.rs"
use core::arch::asm;

// stack size of 48 bytes so it's easy to print the stack before we switch contexts
const SSIZE: isize = 48;

// a struct that represents our CPU state
//
// This struct stores the stack pointer
#[derive(Debug, Default)]
#[repr(C)]
struct ThreadContext {
    rsp: u64,
}

// Returning ! means the function never returns:
// it either panics OR runs forever
fn hello() -> ! {
    println!("I LOVE WAKING UP ON A NEW STACK!");
    loop {}
}

// new is a pointer to a ThreadContext
unsafe fn gt_switch(new: *const ThreadContext) {
    // inline assembly
    unsafe {
        asm!(
            // load the value that `new` points to (the rsp field, at offset 0x00)
            // into the rsp register
            "mov rsp, [{0} + 0x00]",
            // ret pops the return address from our custom stack and jumps to it;
            // in our example, that is the address of hello
            "ret",
            in(reg) new,
        );
    }
}

fn main() {
    // initialize
    let mut ctx = ThreadContext::default();

    // stack initialize
    // e.g. say the allocation starts at address 0x10
    let mut stack = vec![0_u8; SSIZE as usize];

    unsafe {
        // we get the bottom of the stack
        // remember that the stack grows downward from high memory address to low memory address
        // e.g. 0x40, because 0x10 + 0x30 = 0x40 and 0x30 is SSIZE (48) in hex
        // NOTE: offset() is applied in units of the size of the type that the pointer points to
        // in our case, stack is a pointer to u8 (a byte) so offset(SSIZE) == offset(48 bytes) == offset(0x30)
        let stack_bottom = stack.as_mut_ptr().offset(SSIZE);

        // we align the bottom of the stack to be 16-byte-aligned
        // the System V ABI expects this, and some CPU instructions (e.g. aligned SSE
        // loads and stores) require 16-byte-aligned memory
        // The technicality: 15 is 0b1111, so (stack_bottom AND !15) zeroes out the bottom 4 bits
        //
        // we cast back to *mut u8 so that later offsets are counted in bytes
        let sb_aligned = (stack_bottom as usize & !15) as *mut u8;

        // Here, we write the address of the hello function as 64 bits (8 bytes)
        // Remember that 16 bytes = 0x10 in hex
        // So we go DOWN 0x10 (16) memory addresses, e.g. from 0x40 to 0x30
        // NOTE: 16 bytes down (0x10) even though the hello function pointer is ONLY 8 bytes
        // This is because the System V ABI requires the stack pointer to be 16-byte aligned
        std::ptr::write(sb_aligned.offset(-16) as *mut u64, hello as u64);

        // we write the stack pointer into the rsp inside context
        ctx.rsp = sb_aligned.offset(-16) as u64;

        // print the stack from high addresses to low addresses
        // we start at 1 because sb_aligned itself is one past the last usable byte,
        // and we stop at the first byte of the Vec
        let usable = sb_aligned as usize - stack.as_ptr() as usize;
        for i in 1..=usable as isize {
            println!(
                "mem: {}, val: {}",
                sb_aligned.offset(-i) as usize,
                *sb_aligned.offset(-i)
            );
        }

        // we go into the function
        // gt_switch loads ctx.rsp into the CPU stack pointer,
        // and `ret` pops the address of hello from our new stack and jumps to it
        gt_switch(&mut ctx);
    }
}
```
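
The two pieces of pointer arithmetic the comments lean on, rounding an address down with `& !15` and offsets being counted in elements rather than bytes, can be checked in isolation without any inline assembly. Here is a minimal sketch; `align_down` is a hypothetical helper introduced only for illustration and is not part of `stack_swap.rs`, and the addresses are the same made-up example values (`0x40`, `0x47`) rather than real stack addresses.

```rust
/// Round `addr` down to the nearest multiple of `align`.
/// Hypothetical helper for illustration; `align` must be a power of two.
fn align_down(addr: usize, align: usize) -> usize {
    assert!(align.is_power_of_two());
    // For align = 16 this is `addr & !15`: the low 4 bits are cleared.
    addr & !(align - 1)
}

fn main() {
    // An address that is already 16-byte aligned is left untouched.
    assert_eq!(align_down(0x40, 16), 0x40);
    // Anything in 0x41..=0x4f is rounded DOWN to 0x40, never up.
    assert_eq!(align_down(0x47, 16), 0x40);
    assert_eq!(align_down(0x4f, 16), 0x40);

    // Pointer offsets are counted in elements of the pointee type.
    let bytes = [0u8; 48];
    let words = [0u64; 6];
    let b = bytes.as_ptr();
    let w = words.as_ptr();
    // wrapping_add is safe to call; we only compare addresses and never dereference.
    assert_eq!(b.wrapping_add(1) as usize - b as usize, 1); // one u8 = 1 byte
    assert_eq!(w.wrapping_add(1) as usize - w as usize, 8); // one u64 = 8 bytes

    println!("0x47 aligned down to 16 bytes: {:#x}", align_down(0x47, 16));
}
```

Rounding down rather than up matters here: the aligned address has to stay inside the memory we allocated, and since the stack grows toward lower addresses, losing a few bytes at the high end is harmless.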