Inst Prof

Write-up author
Vanilla (Batman's Kitchen)

Please help test our new compiler micro-service


Starting out: we use file and ls to see that our binary is a relatively small 64-bit ELF. tells us it has the standard protections plus PIE (NX is standard, of course). Stick it in Radare and examine it: we see that after greeting us, it goes into an infinite loop, which:

  1. Allocates a page of memory
  2. Copies a template into that page
  3. Reads four bytes from us and sticks it in a predefined spot in the template
  4. Makes the page executable
  5. Calls the page, measuring (using RDTSC) how many cycles it takes to execute
  6. Writes that number of cycles back to us
  7. Frees the page it allocated

Furthermore, what the template does is surround our code with a loop that runs it 4096 times and then returns. So, inst prof stands for “instant profiler”! There's nothing really to exploit, per se, to get execution here, but we are restricted in that there's not much you can do in four bytes.

It's pretty clear to me that there's not much we're going to be able to do in a single prompting. We're going to need to progressively build up what we want to do through several invocations. The first order of business there is to find a register whose value is preserved through iterations. Compilers tend to use registers like EAX before they go on to “later” registers like R8, R9, etc., so I first checked R15, setting it with GDB in one iteration and checking that it was the same in the next one. R15 is indeed preserved, as is R14. I believe R13 was preserved as well but I didn't need to use that fact.

Controlling Registers

When we have a register preserved, what can we do? Well, we can zero it out: XOR R15, R15 is three bytes, so it fits. We can also set its lower byte: MOV R15B, imm is four bytes. Shifting by most amounts of bits is four bytes; this would normally be acceptable, but our four bytes are each executed 4096 times, so shifting by anything 4096 times will zero out our register, which is not useful. x86 encodes a shift by one bit differently, however, in a way such that SHL R15, 1 is only three bytes, which leaves us one byte for a RET to prematurely exit the loop and make sure it only executes once. Of course, we now have to do this many times to shift by more than one bit, but that is not hard. With these three instructions we can now place arbitrary immediate 64-bit values in R14 or R15.

With these, it's simple to construct a write-what-where primitive. Place an address in R14, a value in R15, and execute MOV [R14], R15. However, we still don't know where we can write, because of ASLR and PIE.


To bypass ASLR we need to leak an address. More generally, we need to leak anything at all. We don't have enough control over the program yet to make it call write, printf or any other function to output anything to us directly, so we'll have to go about it some other way… The program already does provide some output, so it's worthwhile to see if we can utilize that somehow. It outputs how many cycles it took to execute our loop, so if we want to use this to exfiltrate data, we'll need to manipulate how long it takes to execute the loop.

The first instinct is to test some bit and branch to the RET or continue the loop depending on the bit's value. Unfortunately that takes up too many bytes to represent, and we can't break it up over multiple executions because the status flags are clobbered. I also considered using an instruction like a multiply whose time to execute varies depending on the data, but that seemed fragile and was also pushing against our four-byte limit. Ultimately I decided to subtract a bit from the counter used by the loop, ECX. If the bit is zero, it performs all 4096 iterations. If the bit is one, the counter goes down by two each time rather than one, and the loop executes 2048 times. This uses roughly half as many cycles, which is a noticeable difference.

With this we can extract a single bit of information. To extract a whole QWORD, we can move the QWORD we want to exfiltrate into another register, mask it to a single bit and extract it, then shift the original register by one bit and repeat until we've got the whole QWORD.

Then it's easy to leak any register value, like RSP, for example (if we wanted it). I was more interested in defeating PIE, so I leaked the QWORD on the top of the stack, the return address back into the executable. I also leaked RBX, a pointer to the page that it allocated and was at that time executing. This becomes useful later. It's also straightforward to leak memory by storing an immediate address, dereferencing it (e.g. MOV R15, [R14]), and leaking that.

Taking Control, First Attempt

From here it's easy to get control of RIP. We can store an address in R15, for example, and then execute JMP R15. Or we could use our write-what-where primitive to overwrite the GOT. Or we could leak RSP and overwrite the return address. But back to our objective: we want a shell. We already have handy functions for allocating new writable pages and making them executable, so it seems like using shellcode is feasible here if we wanted to. The executable is linked against libc, so we could also use a ret2libc into system.

The problem with both of these approaches is that we need to control the argument being passed, either into the make_page_executable function or the system function. Since we're on 64-bit, we need to modify RDI. Unfortunately it's too many bytes to modify RDI and call the function at the same time, and RDI gets clobbered (severely!) between iterations so splitting it into two executions won't work either. Since we do control the allocation of a page and can write to it and know its address, and have the ability to change RSP if we want, we could construct a ROP chain and pivot to it. However, to change RDI, we need a POP RDI; RET gadget, and there's none in the executable. There are plenty in libc, but by doing that, we introduce a dependency on a specific libc.

I couldn't think of any other options, so I went forward with using libc.

Using libc

To use libc, we have to know where it is. Using our read primitive, we can read the GOT to figure out where any of the imported functions live in libc. We can then subtract and add offsets within libc to find other functions or gadgets in libc. You can find the offsets of exported functions with readelf -s, offsets of gadgets with ROPgadget or rp++, and offsets of strings with rabin2 -z. Since I'm depending on libc anyway, I might as well forego shellcode and just do ROP with ret2libc, so I extracted system, /bin/sh, and my POP RDI; RET gadget.

Putting It All Together, First Attempt

So all together, we:

  1. Leak the return address to find the executable base
  2. Leak the RBX to find where we're being executed
  3. Leak an address in the GOT to find libc
  4. Fabricate a call to alloc_page and infer its location (it'll be right before the page we're executing from)
  5. Assemble a ROP chain in our new page that:
    1. Pops the address of the /bin/sh string into RDI
    2. Hops into system
    3. Hops into _exit (I like being clean but not enough to set RDI before calling _exit)
  6. Pivot RSP onto our ROP chain
  7. Await the shell

Run it locally, et voilà, 🐚! Unfortunately we're unlikely to have the same version of libc as is running remotely, so we have a little more work to do to make this work there. You'll be able to tell if this is the case if you leak an address in the GOT, subtract the offset, and the result is not page-aligned.

Identifying libc

While dated, the easiest way to identify a libc is to use Entering two symbol names and addresses you've leaked from the GOT and pressing search will list all the libcs they're aware of that have symbols at those addresses. Unfortunately, the database isn't terribly extensive, so we need to try another solution.

Pwntools has a neat trick for identifying different libcs. Basically, start with any address in libc, round it down to a page boundary, then go back page-by-page until you find a page with an ELF header. Then, most executables and libraries have a section in them named, which contains a unique identifier for that build of that library, which can be viewed with readelf -n. In particular, most build IDs appear at common places in the binary: Pwntools checks at 0x180 and 0x27C. There you should find the three byte string GNU followed by a NUL byte, then 20 bytes with the data of the build ID.

With this build ID, we can then go to a much more extensive repository of libcs on GitLab. The hashes/build_id directory contains symbolic links to the libcs indexed by build ID. If your build ID is in there, you can go grab the libc, grab out your symbols, and be on your way! It's very convenient. The build ID I extracted was CA5C6CFE528AF541C3C2C15CEE4B3C74DA4E2FB4, which is in the repository!… but is a broken symbolic link. It gives us a hint, though, of where the libc came from—a daily Ubuntu Core snapshot. Indeed, searching the web for the broken symbolic link, we can see that this image was integrated into a Google Compute Engine image… unfortunately, I couldn't figure out how to create a new virtual machine instance on Google Cloud with this image. I couldn't find any other sources of this libc either, so I have to run blind.

Exploiting an Unknown libc

We're not out of luck yet. The linker has to have some way to resolve symbols in libc, so we can use that same information to resolve symbols ourselves. It's just not as convenient. Pwntools even wraps this up into its own nice little module. Unfortunately, hooking Pwntools up, it expects to be able to leak a lot of memory, but leaking memory with our improvised method is slow and runs into the timeout very quickly.

I might have been able to coerce Pwntools to do what I wanted, but I figured I'd take this opportunity to figure out what it was doing under the hood anyway. In order to resolve a symbol, we have to:

  1. Find the base address of libc
  2. Look in the ELF header for the offset of the program header table
  3. Search through the program header table for the PT_DYNAMIC entry
  4. Search through the dynamic table for the offsets to the symbol table (DT_SYMTAB), string table (DT_STRTAB), and GNU hash table (DT_GNU_HASH)
  5. Read the parameters of the GNU hash table
  6. Hash our symbol name and look in the hash table to find a search index
  7. Probe consecutive entries of the symbol table until we find the symbol we're looking for
  8. Return the symbol's value

It's a lot of steps which translate to a fair amount of code, but it's relatively straightforward. Better, all of these offsets stay constant between executions. Pwntools converts them into absolute addresses, which is bad for us because ASLR will change the addresses between executions and the timeout means we can't do it all in one go; but keeping them as offsets, on the other hand, we can restart the connection whenever it times out, and continue to leak more wherever we left off. After implementing this, we can get the address of system.

ROP Gadgets in an Unknown libc

We are dealing with different libcs, but they were compiled from a similar code base. It's not like glibc is completely rewritten from scratch every version, and compilers, given similar source code, do occasionally produce similar binaries. One instance of POP RDI; RET I found was in the iconv function. Resolving the address of the iconv function in the remote libc and looking at the same offset within the function that the gadget appeared in our local libc, our POP RDI; RET gadget exists in the remote iconv as well, off by just a few bytes.

Now we have system and POP RDI; RET. We just need /bin/sh. Fortunately, this doesn't need to be executable, so we don't actually need to find it in libc; we can just stick it right after our ROP chain.

Putting It All Together, Second Attempt

Updated libc offsets in hand, we can re-run our exploit, and finally, remote 🐚! cat flag.txt and you're golden. Oh, and snag the libc for your personal database. 😉