Please help test our new compiler micro-service
Starting out: we use file
and ls
to see that our binary is a relatively small 64-bit ELF.
checksec.sh
tells us it has the standard protections plus PIE (NX is standard, of course).
Stick it in Radare and examine it: we see that after greeting us, it goes into an infinite loop, which:
RDTSC
) how many cycles it takes to execute
Furthermore, what the template does is surround our code with a loop that runs it 4096 times and then returns. So, inst prof stands for “instant profiler”! There's nothing really to exploit, per se, to get execution here, but we are restricted in that there's not much you can do in four bytes.
It's pretty clear to me that there's not much we're going to be able to do in a single prompting. We're going to need to progressively build up what we want to do through several invocations. The first order of business there is to find a register whose value is preserved through iterations. Compilers tend to use registers like EAX before they go on to “later” registers like R8, R9, etc., so I first checked R15, setting it with GDB in one iteration and checking that it was the same in the next one. R15 is indeed preserved, as is R14. I believe R13 was preserved as well but I didn't need to use that fact.
When we have a register preserved, what can we do?
Well, we can zero it out: XOR R15, R15
is three bytes, so it fits.
We can also set its lower byte: MOV R15B, imm
is four bytes.
Shifting by most amounts of bits is four bytes; this would normally be acceptable, but our four bytes are each executed 4096 times,
so shifting by anything 4096 times will zero out our register, which is not useful.
x86 encodes a shift by one bit differently, however, in a way such that SHL R15, 1
is only three bytes,
which leaves us one byte for a RET
to prematurely exit the loop and make sure it only executes once.
Of course, we now have to do this many times to shift by more than one bit, but that is not hard.
With these three instructions we can now place arbitrary immediate 64-bit values in R14 or R15.
With these, it's simple to construct a write-what-where primitive.
Place an address in R14, a value in R15, and execute MOV [R14], R15
.
However, we still don't know where we can write, because of ASLR and PIE.
To bypass ASLR we need to leak an address.
More generally, we need to leak anything at all.
We don't have enough control over the program yet to make it call
write
, printf
or any other function to output anything to us directly,
so we'll have to go about it some other way…
The program already does provide some output, so it's worthwhile to see if we can utilize that somehow.
It outputs how many cycles it took to execute our loop,
so if we want to use this to exfiltrate data, we'll need to manipulate how long it takes to execute the loop.
The first instinct is to test some bit and branch to the RET
or continue the loop depending on the bit's value.
Unfortunately that takes up too many bytes to represent,
and we can't break it up over multiple executions because the status flags are clobbered.
I also considered using an instruction like a multiply whose time to execute varies depending on the data,
but that seemed fragile and was also pushing against our four-byte limit.
Ultimately I decided to subtract a bit from the counter used by the loop, ECX
.
If the bit is zero, it performs all 4096 iterations.
If the bit is one, the counter goes down by two each time rather than one, and the loop executes 2048 times.
This uses roughly half as many cycles, which is a noticeable difference.
With this we can extract a single bit of information. To extract a whole QWORD, we can move the QWORD we want to exfiltrate into another register, mask it to a single bit and extract it, then shift the original register by one bit and repeat until we've got the whole QWORD.
Then it's easy to leak any register value, like RSP, for example (if we wanted it).
I was more interested in defeating PIE, so I leaked the QWORD on the top of the stack, the return address back into the executable.
I also leaked RBX
, a pointer to the page that it allocated and was at that time executing.
This becomes useful later.
It's also straightforward to leak memory by storing an immediate address,
dereferencing it (e.g. MOV R15, [R14]
), and leaking that.
From here it's easy to get control of RIP.
We can store an address in R15, for example, and then execute JMP R15
.
Or we could use our write-what-where primitive to overwrite the GOT.
Or we could leak RSP and overwrite the return address.
But back to our objective: we want a shell.
We already have handy functions for allocating new writable pages and making them executable,
so it seems like using shellcode is feasible here if we wanted to.
The executable is linked against libc, so we could also use a ret2libc into system
.
The problem with both of these approaches is that we need to control the argument being passed,
either into the make_page_executable
function or the system
function.
Since we're on 64-bit, we need to modify RDI.
Unfortunately it's too many bytes to modify RDI and call the function at the same time,
and RDI gets clobbered (severely!) between iterations so splitting it into two executions won't work either.
Since we do control the allocation of a page and can write to it and know its address,
and have the ability to change RSP if we want, we could construct a ROP chain and pivot to it.
However, to change RDI, we need a POP RDI; RET
gadget, and there's none in the executable.
There are plenty in libc, but by doing that, we introduce a dependency on a specific libc.
I couldn't think of any other options, so I went forward with using libc.
To use libc, we have to know where it is.
Using our read primitive, we can read the GOT to figure out where any of the imported functions live in libc.
We can then subtract and add offsets within libc to find other functions or gadgets in libc.
You can find the offsets of exported functions with readelf -s
,
offsets of gadgets with ROPgadget
or rp++,
and offsets of strings with rabin2 -z
.
Since I'm depending on libc anyway, I might as well forego shellcode
and just do ROP with ret2libc, so I extracted system
, /bin/sh
, and my POP RDI; RET
gadget.
So all together, we:
RBX
to find where we're being executed
alloc_page
and infer its location (it'll be right before the page we're executing from)
/bin/sh
string into RDI
system
_exit
(I like being clean but not enough to set RDI
before calling _exit
)
RSP
onto our ROP chain
Run it locally, et voilà, 🐚! Unfortunately we're unlikely to have the same version of libc as is running remotely, so we have a little more work to do to make this work there. You'll be able to tell if this is the case if you leak an address in the GOT, subtract the offset, and the result is not page-aligned.
While dated, the easiest way to identify a libc is to use libcdb.com. Entering two symbol names and addresses you've leaked from the GOT and pressing search will list all the libcs they're aware of that have symbols at those addresses. Unfortunately, the database isn't terribly extensive, so we need to try another solution.
Pwntools has a neat trick for identifying different libcs.
Basically, start with any address in libc, round it down to a page boundary,
then go back page-by-page until you find a page with an ELF header.
Then, most executables and libraries have a section in them named .note.gnu.build-id
,
which contains a unique identifier for that build of that library, which can be viewed with readelf -n
.
In particular, most build IDs appear at common places in the binary:
Pwntools checks at 0x180 and 0x27C.
There you should find the three byte string GNU
followed by a NUL byte,
then 20 bytes with the data of the build ID.
With this build ID, we can then go to a much more extensive repository of libcs on GitLab.
The hashes/build_id
directory contains symbolic links to the libcs indexed by build ID.
If your build ID is in there, you can go grab the libc, grab out your symbols, and be on your way!
It's very convenient.
The build ID I extracted was CA5C6CFE528AF541C3C2C15CEE4B3C74DA4E2FB4,
which is in the repository!… but is a broken symbolic link.
It gives us a hint, though, of where the libc came from—a daily Ubuntu Core snapshot.
Indeed, searching the web for the broken symbolic link,
we can see that this image was integrated into a Google Compute Engine image…
unfortunately, I couldn't figure out how to create a new virtual machine instance on Google Cloud with this image.
I couldn't find any other sources of this libc either, so I have to run blind.
We're not out of luck yet. The linker has to have some way to resolve symbols in libc, so we can use that same information to resolve symbols ourselves. It's just not as convenient. Pwntools even wraps this up into its own nice little module. Unfortunately, hooking Pwntools up, it expects to be able to leak a lot of memory, but leaking memory with our improvised method is slow and runs into the timeout very quickly.
I might have been able to coerce Pwntools to do what I wanted, but I figured I'd take this opportunity to figure out what it was doing under the hood anyway. In order to resolve a symbol, we have to:
PT_DYNAMIC
entry
DT_SYMTAB
),
string table (DT_STRTAB
), and GNU hash table (DT_GNU_HASH
)
It's a lot of steps which translate to a fair amount of code, but it's relatively straightforward.
Better, all of these offsets stay constant between executions.
Pwntools converts them into absolute addresses,
which is bad for us because ASLR will change the addresses between executions and the timeout means we can't do it all in one go;
but keeping them as offsets, on the other hand, we can restart the connection whenever it times out,
and continue to leak more wherever we left off.
After implementing this, we can get the address of system
.
We are dealing with different libcs, but they were compiled from a similar code base.
It's not like glibc is completely rewritten from scratch every version,
and compilers, given similar source code, do occasionally produce similar binaries.
One instance of POP RDI; RET
I found was in the iconv
function.
Resolving the address of the iconv
function in the remote libc
and looking at the same offset within the function that the gadget appeared in our local libc,
our POP RDI; RET
gadget exists in the remote iconv
as well, off by just a few bytes.
Now we have system
and POP RDI; RET
.
We just need /bin/sh
.
Fortunately, this doesn't need to be executable,
so we don't actually need to find it in libc;
we can just stick it right after our ROP chain.
Updated libc offsets in hand, we can re-run our exploit, and finally, remote 🐚!
cat flag.txt
and you're golden.
Oh, and snag the libc for your personal database. 😉