Last orders at the House of Force

I recently reported several vulnerabilities in SANE, an open-source library for interfacing with document scanners. It is used by the Simple Scan application, which ships by default with Ubuntu Desktop. One of the vulnerabilities, CVE-2020-12861 (aka GHSL-2020-080), is a remotely triggerable heap buffer overflow. I thought it would be a great prank to play on my colleagues at the GitHub UK office in Oxford if I could use it to pop a calculator on their desktops. Like me, many of my colleagues in Oxford run Ubuntu. I doubt that many of them use Simple Scan on a regular basis, but that’s easily solved with a little bit of social engineering. I planned to post a message like this on our internal #oxford channel:

Have any of you managed to get Ubuntu’s scanning application to work with the printer on the 2nd floor?

My colleagues are a helpful bunch, so I bet several of them would have immediately opened Simple Scan to try it out. Mwahaha! Sadly, my plan was thwarted by COVID-19, because we are all working from home now. But it was an interesting exploitation challenge, regardless. Ultimately, I was successful, as you can see in this video. Just starting Simple Scan is sufficient to trigger the vulnerability, because SANE Backends automatically searches the local network for scanners. All I need to do is connect my laptop to the office network and run a small server that pretends to be a network-attached scanner, then wait for my colleagues to start Simple Scan.

ASLR tries to ruin the prank

Address space layout randomization (ASLR) greatly increases the difficulty of exploiting a memory corruption vulnerability. To achieve code execution, you usually need to be able to forge code and data pointers, which means that you need to know the ASLR offsets. One way to deduce the ASLR offsets is with an infoleak vulnerability. I looked hard for a remote infoleak in SANE, but the only one that I found, CVE-2020-12863 (aka GHSL-2020-083), is useless in practice because it only works when the next byte on the stack is a valid ASCII digit. And even if it did work, it would leak very little useful information. For my prank to be viable, any amount of brute force is also out of the question. If my exploit doesn’t guess the ASLR offsets correctly on the first try, then Simple Scan will crash, which is “game over” for the prank.

My only hope is a local infoleak: CVE-2020-12862 (aka GHSL-2020-082). The function decode_binary has an out-of-bounds read:

/* h000 */
static char *decode_binary(char *buf)
{
  char tmp[6];
  int hl;

  memcpy(tmp, buf, 4);
  tmp[4] = '\0';

  if (buf[0] != 'h')
    return NULL;

  hl = strtol(tmp + 1, NULL, 16);
  if (hl) {

    char *v = malloc(hl + 1);
    memcpy(v, buf + 4, hl);
    v[hl] = '\0';

    return v;
  }

  return NULL;
}

I control the contents of buf, so I also control the value of hl, which can be any value between 0 and 0xFFF (4095). So the memcpy can read beyond the end of buf. Since buf is a stack buffer (rbuf in esci2_cmd), I can use this to copy up to 4095 bytes of stack data into a newly allocated heap buffer. In other words, I can use this vulnerability to copy all kinds of juicy information, such as stack pointers, heap pointers, and code pointers, into a heap buffer. The only problem is that all of that data is on the wrong computer.
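To make the length handling concrete, here is a minimal sketch of how the three hex digits after the leading 'h' become hl, alongside a bounds-checked variant. The names parse_binary_len and decode_binary_checked, and the extra buf_len parameter, are hypothetical and exist only for illustration; this is not the upstream SANE fix.

```c
#include <stdlib.h>
#include <string.h>

/* Parse the three hex digits that follow the leading 'h' (e.g. "h00A" -> 10).
 * Returns -1 if the header is too short or does not start with 'h'. */
long parse_binary_len(const char *buf, size_t buf_len)
{
    char tmp[4];

    if (buf_len < 4 || buf[0] != 'h')
        return -1;
    memcpy(tmp, buf + 1, 3);
    tmp[3] = '\0';
    return strtol(tmp, NULL, 16);   /* three hex digits: at most 0xFFF */
}

/* Hypothetical bounds-checked variant of decode_binary: it refuses to copy
 * more payload bytes than the caller's buffer actually holds. */
char *decode_binary_checked(const char *buf, size_t buf_len)
{
    long hl = parse_binary_len(buf, buf_len);
    char *v;

    if (hl <= 0 || (size_t)hl > buf_len - 4)
        return NULL;
    v = malloc((size_t)hl + 1);
    if (v == NULL)
        return NULL;
    memcpy(v, buf + 4, (size_t)hl);
    v[hl] = '\0';
    return v;
}
```

With an 8-byte input of "hFFFabcd", the unchecked function would copy 4095 bytes starting at the payload, while the checked variant returns NULL because only four payload bytes are available.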

A local infoleak can be just as good as a remote infoleak if the victim application contains a scripting engine. For example, browser exploits usually involve a malicious JavaScript program that runs in the victim’s browser. There’s no need to transmit the leaked pointers back to the attacker’s computer if you can process them on-site. SANE does not contain a scripting engine though.

Luckily, I don’t need a full scripting engine. All I really need is a single subtraction operation. I’ll explain why in more detail later, but the short reason is that the whole exploit hinges on being able to overwrite a specific stack address with a large number. The local infoleak gives me a stack pointer, which is always exactly 0x218 higher in memory than the stack address which I want to overwrite. So I need to subtract 0x218 from the pointer. If I can find a way to do that then the rest of the exploit will fall into place.

To compute the subtraction, I’ll use a variation on a classic heap exploitation technique known as the “House of Force.”

The Malloc Maleficarum

The Malloc Maleficarum is a legendary document from 2005 about glibc heap exploitation. It introduced a number of new exploitation techniques with names like the “House of Lore” and the “House of Spirit.” Almost 15 years later, improved sanity checks in glibc’s malloc implementation have closed the door on several of the houses. For example, the House of Lore is closed since glibc version 2.26. Soon, it will also be time to say farewell to the House of Force, which is shut down by improved sanity checking in glibc 2.29. But the House of Force isn’t out of business yet: Ubuntu 18.04 LTS is one of the most widely used Linux distributions, and it ships with glibc 2.27.

A fantastic resource for learning about glibc heap exploitation is the how2heap repository, by the Shellphish team. It covers a large selection of techniques, including the Malloc Maleficarum techniques that still work. It also includes a convenient test harness for running the examples on different glibc versions. I particularly like how each technique is documented as a runnable C program, so that you can step through the code in gdb and see exactly what is happening.

I have published the complete source code for my exploit. Warning: it’s ugly, and to fully understand what it’s doing you need to simultaneously debug the exploit source code and the SANE source code. Instead, I thought it would be more useful to present the important parts of the exploit as standalone how2heap-style C programs. You can download the demos and run them yourself:

git clone https://github.com/github/securitylab
cd securitylab/SecurityExploits/SANE/epsonds_CVE-2020-12861/glibc_heap_exploit_demos/home
make
./01_chunk_layout  # run the first example

The demos are designed to work with glibc 2.27, so if your system has a different glibc version then I recommend running the demos in Docker. I have included a Dockerfile based on an Ubuntu 18.04 LTS image for this purpose:

git clone https://github.com/github/securitylab
cd securitylab/SecurityExploits/SANE/epsonds_CVE-2020-12861/glibc_heap_exploit_demos/
sudo docker build . -t glibc-heap-exploit-demos --build-arg UID=`id -u`
sudo docker run --rm -i -t glibc-heap-exploit-demos
# We're in a container now.
make
./01_chunk_layout  # run the first example

I will present these demos first, and at the end of the blog post, will explain how I combined the techniques to create a working exploit for the SANE vulnerabilities.

The demos only focus on the parts of the glibc allocator that are relevant to the exploit. If you would like to read a more general overview of how the glibc allocator works, I recommend reading Azeria’s two-part series.

Demo 1: glibc memory chunk layout

01_chunk_layout.c is a very simple demo to illustrate the “chunk” data structure used by the glibc malloc implementation. Here’s a diagram of what chunks look like in memory:¹

Notice that the blocks of user data are separated by metadata blocks of size 0x10 bytes. The demo allocates four chunks, frees one of them, and then prints the metadata, which looks like this:

Chunk 0:
0x55d899c2c250:  0000000000000000 0000000000000811  (metadata)
0x55d899c2c260:  0000000000000000 0000000000000000  (user data)
0x55d899c2c270:  0000000000000000 0000000000000000  (user data)

Chunk 1:
0x55d899c2ca60:  0000000000000000 0000000000000811  (metadata)
0x55d899c2ca70:  0000000000000000 0000000000000000  (user data)
0x55d899c2ca80:  0000000000000000 0000000000000000  (user data)

Chunk 2 (free):
0x55d899c2d270:  0000000000000000 0000000000000811  (metadata)
0x55d899c2d280:  00007f4021262ca0 00007f4021262ca0  (user data)
0x55d899c2d290:  0000000000000000 0000000000000000  (user data)

Chunk 3:
0x55d899c2da80:  0000000000000810 0000000000000810  (metadata)
0x55d899c2da90:  0000000000000000 0000000000000000  (user data)
0x55d899c2daa0:  0000000000000000 0000000000000000  (user data)
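The size fields in this dump can be reproduced with a little arithmetic. The sketch below models how 64-bit glibc rounds a request up to a chunk size and how the low bits of the size field act as flags. The constant and function names are my own, and the example request of 0x800 bytes is an assumption rather than something taken from the demo source.

```c
#include <stdint.h>

#define SIZE_SZ          8u     /* sizeof(size_t) on x86-64 (assumed platform) */
#define MALLOC_ALIGNMENT 16u    /* chunk sizes are multiples of 0x10 */
#define MIN_CHUNK_SIZE   0x20u  /* smallest chunk the allocator hands out */
#define PREV_INUSE       0x1u   /* lowest flag bit: previous chunk is in use */
#define FLAG_BITS        0x7u   /* the three low bits of the size field are flags */

/* Chunk size occupied by a request of `req` bytes, metadata included. */
uint64_t chunk_size_for_request(uint64_t req)
{
    uint64_t size = (req + SIZE_SZ + MALLOC_ALIGNMENT - 1)
                    & ~(uint64_t)(MALLOC_ALIGNMENT - 1);
    return size < MIN_CHUNK_SIZE ? MIN_CHUNK_SIZE : size;
}

/* Strip the flag bits to recover the real chunk size. */
uint64_t size_without_flags(uint64_t size_field)
{
    return size_field & ~(uint64_t)FLAG_BITS;
}

/* Non-zero if the chunk below this one is marked as in use. */
int prev_in_use(uint64_t size_field)
{
    return (size_field & PREV_INUSE) != 0;
}
```

Under these assumptions, a request of 0x800 bytes occupies a 0x810-byte chunk, which appears as 0x811 while the chunk below is in use. Chunk 3 shows 0x810 instead because chunk 2 has been freed, which clears the PREV_INUSE bit and stores chunk 2's size in the first metadata word.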

The vulnerability in SANE is a heap buffer overflow, so it enables me to overwrite the next chunk in memory. There are two particularly important observations to be made about that:

1. The chunk metadata contains only size information. Since it does not contain any pointers, I can exploit the metadata before I have obtained the ASLR offsets.
2. The user data section is used to store metadata after the chunk has been freed. Therefore, I can only safely overwrite the user data section of the next chunk if:
   - It is currently allocated, or
   - I know the ASLR offsets.

In other words, a heap buffer overflow gives me a lot of power to manipulate the heap without needing to know the ASLR offsets. And once I have deduced the ASLR offsets, I can start overwriting the pointers in freed chunks and thereby gain access to other areas of memory, such as the stack.

Demo 2: arithmetic with top

The top chunk is the sentinel chunk at the top of the heap. It has metadata, but its user data section is unused. When the allocator has run out of free chunks, it allocates memory from the top chunk. This means that the top chunk shrinks and moves to a higher memory location:

The top chunk is usually moderately sized (0x10000 bytes or so). If there isn’t enough memory left in the top chunk to service an allocation then the allocator uses a system call to request more memory from the OS. The concept of the House of Force is to overwrite the size of the top chunk with a very large number, so that the allocator thinks that it can allocate a huge chunk without needing to request more memory from the OS. The how2heap repository has a nice demo of the House of Force.

I already mentioned that my exploit uses a variation on the House of Force to compute a subtraction. The idea is to overwrite the size of the top chunk with a pointer. Then, by allocating another chunk, I can subtract an offset from the pointer. 02_arithmetic_with_top.c is a simple demo of this idea. As mentioned in the demo, there is a slight issue because the least significant three bits of the chunk size contain important metadata, which causes the bottom byte of the pointer to get modified in an inconvenient way. In the exploit, I avoid this by writing the stack address one byte higher in memory, which has the effect of multiplying the number by 0x100 (because the machine is little-endian). The offset that I need to subtract from the pointer for the exploit to work is 0x218, so I actually need to allocate 0x21800 bytes to get the correct result.
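The byte-shift trick is easier to follow as plain integer arithmetic. The following sketch is only a model: it works on numbers rather than a real heap, it assumes a little-endian 64-bit machine with a user-space pointer whose top byte is zero, and it ignores the small difference between the requested size and the resulting chunk size. All function names are hypothetical.

```c
#include <stdint.h>

/* Model of the top chunk's size field after a chunk is carved out of it. */
uint64_t top_size_after_alloc(uint64_t top_size, uint64_t chunk_size)
{
    return top_size - chunk_size;
}

/* Writing an 8-byte pointer one byte above the size field leaves the field's
 * lowest byte (which holds the flag bits) untouched and places the pointer's
 * low seven bytes above it, which is the same as shifting left by 8 bits. */
uint64_t size_field_with_shifted_pointer(uint64_t pointer, uint8_t old_low_byte)
{
    return (pointer << 8) | old_low_byte;
}

/* Subtract `offset` from `pointer` using nothing but an allocation from top. */
uint64_t subtract_via_top(uint64_t pointer, uint8_t old_low_byte, uint64_t offset)
{
    uint64_t size = size_field_with_shifted_pointer(pointer, old_low_byte);

    /* The allocation must be 0x100 times the offset, e.g. 0x21800 for 0x218. */
    size = top_size_after_alloc(size, offset << 8);

    /* Reading eight bytes from one byte above the field drops the flag byte. */
    return size >> 8;
}
```

For a made-up stack pointer of 0x7ffd12345678, the model yields 0x7ffd12345460, which is exactly 0x218 lower. The flag byte never influences the result, because the allocation size has a zero low byte.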

Demo 3: overlapping a chunk with top

Did I just create a chicken-and-egg problem? If I am able to overwrite the size of the top chunk with a pointer, doesn’t that imply that I have already defeated ASLR? If so, what’s the point of the elaborate exercise to subtract an offset from the pointer? The answer is that I don’t use the buffer overflow directly. Instead, I use it indirectly to create a chunk that overlaps with the top chunk. Later, that chunk is allocated and a pointer is written into it, overwriting the size of the top chunk. Here’s a diagram illustrating the idea:

In the first part of the diagram, two chunks have been allocated adjacent to the top chunk. A heap buffer overflow on the first chunk overwrites the metadata of the second, making it look bigger. When the second chunk is freed, it is returned with a larger size than when it was allocated. Later, an allocation with the larger size returns a pointer to this chunk which overlaps with the top chunk. In the SANE exploit, I have arranged things so that the call to malloc in decode_binary (see above) returns a chunk that overlaps with the top chunk, which is how I am able to overwrite the size of the top chunk with a stack pointer.

The demo is 03_overlap_top_chunk.c.

Demo 4: fastbin reverse into tcache

Using the variation on the House of Force just described, I can calculate a stack address that I want to overwrite. The next step is to overwrite that address. The how2heap repository contains a technique called unsorted bin attack, which does exactly that. A call to malloc causes a large value (a pointer) to be written to a stack address. Unfortunately, I discovered that the unsorted bin attack isn’t suitable for my exploit, because I need to trigger the write with an 0x40 byte allocation. The unsorted bin attack only works with an 0x410 byte allocation, or larger. Luckily, I was able to find a different technique that works for allocations up to 0x78 bytes. I named this technique “fastbin reverse into tcache” and submitted a pull request, which has been merged into the how2heap repository. I also asked in the pull request whether it was already a known technique. Kyle Zeng, one of the maintainers of the how2heap repository, says it was first made public in the 2019 HITCON CTF.

The tcache and fastbins are both caches for small-sized chunks, indexed by chunk size, so that a request for a specific chunk size can be serviced very quickly. I am not entirely sure why the allocator needs two caches. The tcache was added to the codebase more recently than the fastbins and appears, in my opinion, to have made the fastbins mostly redundant. The only limitation that the tcache has, which the fastbins don’t, is that the tcache has a maximum capacity of seven chunks per size. So if you have allocated a large number of chunks, all of the same size, and then free them all in rapid succession, then the first seven chunks go into the tcache and the rest go into the fastbins.

If the tcache is empty, then the allocator tries to allocate from the fastbin instead. But, as the comment in the source code says, it doesn’t just allocate one chunk from the fastbin: it also refills the tcache. Since the tcache has a maximum capacity of seven chunks, this means that seven chunks are taken from the fastbin and moved to the tcache. Since the chunks are stored as a singly linked list, a side effect of this is that the list is reversed. The “fastbin reverse into tcache” technique uses this list reversal behavior to overwrite a value on the stack, as depicted in this diagram:

The diagram starts with an empty tcache and seven chunks in the fastbin. A memory corruption vulnerability has been used to overwrite the forward pointer of the seventh chunk with a pointer to the stack. The stack is not a valid chunk, so its forward pointer is invalid. The second part of the diagram shows the situation after a chunk has been allocated. malloc has returned the head of the fastbin list. The rest of the list has been reversed and moved to the tcache. Unfortunately, the fastbin now points to the garbage pointer that was previously on the stack.
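The list reversal can be modelled with two singly linked lists. The sketch below is a toy, not glibc code: the struct and function names are invented, chunk headers and sizes are ignored, and a plain struct node stands in for the stack location. It only shows why the corrupted tail of the fastbin ends up at the head of the tcache with a heap pointer written into it.

```c
#include <stddef.h>

#define TCACHE_CAPACITY 7

struct node { struct node *next; };

struct bins {
    struct node *fastbin;   /* head of the fastbin list */
    struct node *tcache;    /* head of the tcache list */
    int tcache_count;
};

/* Link chunks[0..n-1] (n >= 1) into a fastbin list and hang `tail` off the
 * last one, modelling a forward pointer that was overwritten by an overflow. */
void setup_fastbin(struct bins *b, struct node *chunks, size_t n, struct node *tail)
{
    size_t i;

    for (i = 0; i + 1 < n; i++)
        chunks[i].next = &chunks[i + 1];
    chunks[n - 1].next = tail;
    b->fastbin = &chunks[0];
    b->tcache = NULL;
    b->tcache_count = 0;
}

/* Serve from the tcache if possible; otherwise take the fastbin head and
 * move the remaining fastbin entries into the tcache, which reverses them. */
struct node *toy_malloc(struct bins *b)
{
    struct node *victim;

    if (b->tcache != NULL) {
        victim = b->tcache;
        b->tcache = victim->next;
        b->tcache_count--;
        return victim;
    }
    victim = b->fastbin;
    if (victim == NULL)
        return NULL;
    b->fastbin = victim->next;
    while (b->tcache_count < TCACHE_CAPACITY && b->fastbin != NULL) {
        struct node *n = b->fastbin;
        b->fastbin = n->next;   /* follows whatever value the entry held */
        n->next = b->tcache;    /* this store is the write to the target */
        b->tcache = n;
        b->tcache_count++;
    }
    return victim;
}
```

With seven chunks followed by a zero-initialized stand-in for the stack slot, the first toy_malloc returns the first chunk, the stack slot's next field now holds the address of the seventh chunk, and the following toy_malloc returns the stack slot itself. If the slot had held garbage instead of NULL, that garbage would have become the new fastbin head.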

There are two versions of the demo. The first version, 04_A_fastbin_reverse_into_tcache.c, demonstrates the simpler scenario in which the value on the stack is zero. Because it looks like a NULL pointer, there is no risk of accidentally dereferencing an invalid pointer, so there is no need for the fastbin to contain exactly seven elements. It also means that the technique doesn’t leave the fastbin in an invalid state. The second version of the demo, 04_B_fastbin_reverse_into_tcache.c, demonstrates the scenario in which there is garbage on the stack. In this scenario, the technique leaves the fastbin in an invalid state. Allocations that can be served from the tcache still work fine, but any allocation that triggers a call to malloc_consolidate will cause a crash. In other words: no more large allocations. I found myself in this situation in the SANE exploit and it almost scuppered the whole thing. But I found a workaround, at the expense of having to trigger a codepath containing a call to sleep. It’s the reason why the exploit takes approximately 10 seconds to run.

This technique has another limitation: malloc_consolidate empties the fastbins. That means that no large allocations can happen while the technique is in progress. Both versions of the demo contain a commented out malloc(0x410) to show this problem: 04_A_fastbin_reverse_into_tcache.c, line 65 and 04_B_fastbin_reverse_into_tcache.c, line 67. Uncommenting those allocations breaks the demos. This is an obstacle that I need to overcome in the SANE exploit, because it means that I have to find a way to fill the fastbin during the final step of the exploit, rather than being able to set it up in advance.

Demo 5: shrinking a chunk while it’s in the tcache

The final demo is quite a simple technique, but I use it extensively in the SANE exploit to work around the limitation of the “fastbin reverse into tcache” technique which I just mentioned. I need a way to quickly free seven chunks so that they will be added to the fastbin. But all I have to work with is this code, which allocates some memory and then immediately frees it:

char *v = decode_string(value);
DBG(1, " version: %s\n", v);
free(v);

The good news is that I can send a command which triggers this code seven times in quick succession. The bad news is that it will keep reusing the same chunk and not add any new chunks to the fastbin. The trick is to corrupt some chunks in the tcache so that I can allocate them with one size, and then free them with a smaller size. This is simple to do with a buffer overflow that overwrites the metadata below the chunk:

I just need to make sure that the user data section of the chunk contains some fake metadata, so that it won’t trigger any run-time checks when it is freed.

The demo is 05_shrink_tcache_chunk.c.
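The reason the trick works is that free chooses a bin from the size field stored in the chunk, not from the size that was originally requested. Here is a small sketch of that bin selection, assuming the 64-bit glibc layout. The function name tcache_index is my own, and the example sizes are illustrative rather than the ones used in the exploit.

```c
#include <stddef.h>
#include <stdint.h>

#define MINSIZE          0x20u  /* smallest chunk size on 64-bit glibc */
#define MALLOC_ALIGNMENT 0x10u  /* spacing between consecutive bin sizes */
#define FLAG_BITS        0x7u   /* low three bits of the size field are flags */

/* Index of the tcache bin that a chunk with this size field is filed under. */
size_t tcache_index(uint64_t size_field)
{
    uint64_t chunk_size = size_field & ~(uint64_t)FLAG_BITS;

    return (size_t)((chunk_size - MINSIZE + MALLOC_ALIGNMENT - 1) / MALLOC_ALIGNMENT);
}
```

In this model, a chunk with size field 0x401 belongs to bin 62. If an overflow rewrites the field to 0x41 while the chunk sits in the tcache, a later allocation still takes it out of bin 62, but the subsequent free files it under bin 2, or in the matching fastbin once that tcache bin is full.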

SANE vulnerability overview and exploit plan

On startup, SANE Backends automatically searches for scanners. It seems that every manufacturer has designed their own unique protocol for communicating with their devices, so SANE Backends has to iterate through all the different brands (HP, Canon, etc.), trying each of their protocols in turn. The vulnerability that I am going to exploit is in the implementation of the EPSON protocol. The protocol starts with SANE Backends sending out a broadcast UDP message and listening for responses, in e2_network_discovery:

sanei_udp_write_broadcast(fd, 3289, (unsigned char *) query, 15);

DBG(5, "%s, sent discovery packet\n", __func__);

to.tv_sec = 1;
to.tv_usec = 0;

FD_ZERO(&rfds);
FD_SET(fd, &rfds);

sanei_udp_set_nonblock(fd, SANE_TRUE);
while (select(fd + 1, &rfds, NULL, NULL, &to) > 0) {
  if ((len = sanei_udp_recvfrom(fd, buf, 76, &ip)) == 76) {
    DBG(5, " response from %s\n", ip);

    /* minimal check, protocol unknown */
    if (strncmp((char *) buf, "EPSON", 5) == 0)
      attach_one_net(ip);
  }
}

So the first step of the exploit is to run a server, listening on UDP port 3289, which will respond to any incoming messages. Notice that the code above is designed to handle multiple replies. That’s crucial for the exploit, because it enables me to trigger the heap overflow an unlimited number of times. When my server receives the broadcast message, it fires off 128 replies to keep this loop running for a while.

Although it is extremely convenient that I can trigger the loop in e2_network_discovery an unlimited number of times, it also creates two hurdles. The first hurdle is just a minor inconvenience: an object of type struct epsonds_device (size 0xf8 bytes) is leaked on every iteration. If the top chunk is ever used to service that allocation then it will throw the exploit into chaos, so I need to make sure that I create lots of small gaps in the heap to absorb the leaks during the heap grooming phase of the exploit. The second hurdle is more serious: on every iteration, an object of type struct epsonds_scanner (size 0x820 bytes) is allocated (and also freed, thankfully). Since that’s a large allocation, it triggers a call to malloc_consolidate. Therefore, if I want to use the “fastbin reverse into tcache” technique, it has to happen within a single iteration.

After the initial UDP exchange, the protocol switches to TCP communication, which is where the bugs are. The TCP communication is handled by a pair of functions: epsonds_net_write and epsonds_net_read. First, epsonds_net_write sends a short message containing a command and an expected reply size. It also allocates a heap buffer for the reply. Then epsonds_net_read receives the reply:

/* receive net header */
size = epsonds_net_read_raw(s, header, 12, status);
if (size != 12) {
  return 0;
}

if (header[0] != 'I' || header[1] != 'S') {
  DBG(1, "header mismatch: %02X %02x\n", header[0], header[1]);
  *status = SANE_STATUS_IO_ERROR;
  return 0;
}

// incoming payload size
size = be32atoh(&header[6]);

DBG(23, "%s: wanted = %lu, available = %lu\n", __func__,
  (u_long) wanted, (u_long) size);

*status = SANE_STATUS_GOOD;

if (size == wanted) {

  DBG(15, "%s: full read\n", __func__);

  if (size) {
    read = epsonds_net_read_raw(s, buf, size, status);
  }

  if (s->netbuf) {
    free(s->netbuf);
    s->netbuf = NULL;
    s->netlen = 0;
  }

  if (read < 0) {
    return 0;
  }

} else if (wanted < size) {

  DBG(23, "%s: long tail\n", __func__);

  read = epsonds_net_read_raw(s, s->netbuf, size, status);  <===== no bounds check
  if (read != size) {
    return 0;
  }

  memcpy(buf, s->netbuf, wanted);
  read = wanted;

  free(s->netbuf);
  s->netbuf = NULL;
  s->netlen = 0;

} else {

  DBG(23, "%s: partial read\n", __func__);

  read = epsonds_net_read_raw(s, s->netbuf, size, status);
  if (read != size) {
    return 0;
  }

  s->netlen = size - wanted;
  s->netptr += wanted;
  read = wanted;

  DBG(23, "0,4 %02x %02x\n", s->netbuf[0], s->netbuf[4]);
  DBG(23, "storing %lu to buffer at %p, next read at %p, %lu bytes left\n",
    (u_long) size, s->netbuf, s->netptr, (u_long) s->netlen);

  memcpy(buf, s->netbuf, wanted);
}

return read;

I highlighted the line of code where the heap buffer overflow is. s->netbuf is the heap buffer (of size wanted) that was allocated during epsonds_net_write. I control the value of size, so I can overflow the buffer.
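For comparison, here is a minimal sketch of the kind of check that is missing: decode the size announced in the 12-byte header and reject it if it exceeds the buffer's capacity. The names read_be32 and checked_payload_size are hypothetical (read_be32 is a stand-in for SANE's be32atoh), and this is not the upstream patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a big-endian 32-bit integer from four bytes. */
uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns the payload size announced by a 12-byte reply header, or -1 if the
 * magic bytes are wrong or the payload would not fit in `capacity` bytes. */
int64_t checked_payload_size(const unsigned char header[12], size_t capacity)
{
    uint32_t size;

    if (header[0] != 'I' || header[1] != 'S')
        return -1;
    size = read_be32(&header[6]);
    if (size > capacity)
        return -1;          /* the bounds check missing from the long-tail branch */
    return (int64_t)size;
}
```

A caller would pass the size of the buffer allocated in epsonds_net_write as capacity, so a reply that announces 0x100 bytes for a 0x40-byte buffer is rejected before anything is read into it.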

One of the things that I find fascinating about exploitation is that minor bugs, which wouldn’t normally be classified as security bugs, can sometimes be just as important for the exploit to work as the security bug. In this code, the minor bug that’s almost as important as the heap overflow is the memory leak that happens in the final else branch. Notice that there is no call to free(s->netbuf). My exploit uses this extensively for heap grooming.

Exploit plan

As I mentioned earlier, I couldn’t find a usable remote infoleak in SANE. Instead, I will use the heap buffer overflow to manufacture one. The function epsonds_net_write, which I mentioned above, sends commands to the scanner:

int
epsonds_net_write(epsonds_scanner *s, unsigned int cmd, const unsigned char *buf,
                  size_t buf_size, size_t reply_len, SANE_Status *status)
{
  unsigned char *h1, *h2;
  unsigned char *packet = malloc(12 + 8);

  /* XXX check allocation failure */

  h1 = packet;    // packet header
  h2 = packet + 12;  // data header

  if (reply_len) {
    s->netbuf = s->netptr = malloc(reply_len);
    s->netlen = reply_len;
    DBG(24, "allocated %lu bytes at %p\n",
      (u_long) reply_len, s->netbuf);
  }

  DBG(24, "%s: cmd = %04x, buf = %p, buf_size = %lu, reply_len = %lu\n",
    __func__, cmd, buf, (u_long) buf_size, (u_long) reply_len);

  memset(h1, 0x00, 12);
  memset(h2, 0x00, 8);

  h1[0] = 'I';
  h1[1] = 'S';

  h1[2] = cmd >> 8;  // packet type
  h1[3] = cmd;    // data type

  h1[4] = 0x00;
  h1[5] = 0x0C; // data offset

  DBG(24, "H1[0]: %02x %02x %02x %02x\n", h1[0], h1[1], h1[2], h1[3]);

  // 0x20 passthru
  // 0x21 job control

  if (buf_size) {
    htobe32a(&h1[6], buf_size);
  }

  if((cmd >> 8) == 0x20) {

    htobe32a(&h1[6], buf_size + 8);  // data size (data header + payload)

    htobe32a(&h2[0], buf_size);  // payload size
    htobe32a(&h2[4], reply_len);  // expected answer size

    DBG(24, "H1[6]: %02x %02x %02x %02x (%lu)\n", h1[6], h1[7], h1[8], h1[9], (u_long) (buf_size + 8));
    DBG(24, "H2[0]: %02x %02x %02x %02x (%lu)\n", h2[0], h2[1], h2[2], h2[3], (u_long) buf_size);
    DBG(24, "H2[4]: %02x %02x %02x %02x (%lu)\n", h2[4], h2[5], h2[6], h2[7], (u_long) reply_len);
  }

  if ((cmd >> 8) == 0x20 && (buf_size || reply_len)) {

    // send header + data header
    sanei_tcp_write(s->fd, packet, 12 + 8);

  } else {
    sanei_tcp_write(s->fd, packet, 12);
  }

  // send payload
  if (buf_size)
    sanei_tcp_write(s->fd, buf, buf_size);

  free(packet);

  *status = SANE_STATUS_GOOD;
  return buf_size;
}

I will overwrite the value of buf_size so that the final call to sanei_tcp_write sends far more bytes than it should. buf is a stack pointer so this will result in a remote infoleak containing all the information that I need to complete the exploit. How can I overwrite buf_size though? It is stored in a register, so the only opportunity that I have to alter its value is during one of the calls to malloc at the beginning of the function, when it is temporarily saved to the stack. That is why I need to use the “fastbin reverse into tcache” technique: it enables me to overwrite buf_size during that malloc. I use the variant of the House of Force to calculate the stack address where buf_size will be saved.

These are the main steps of the exploit:

1. Groom the heap by deliberately leaking memory. There are three goals:
   - Fill any large gaps so that any subsequent large allocations will come from the top chunk.
   - Leave plenty of smaller gaps to absorb smaller memory leaks. In particular, the code is going to leak an object of type struct epsonds_device (size 0xf8 bytes) on every iteration.
   - Empty the tcache for allocations of size 0x3d0, 0x3e0, 0x3f0, and 0x400. That’s because I want to allocate blocks of those sizes from the top chunk and then store them in the tcache to use later.
2. Prepare a magazine of chunks in the tcache using the “shrink a chunk while it’s in the tcache” technique. I use these chunks later in the “fastbin reverse into tcache” technique.
3. Allocate and free a large chunk from the top chunk. The purpose of this is to mmap enough memory to ensure that the subsequent top chunk shenanigans don’t accidentally hit unmapped memory and trigger a SIGSEGV.
4. Create a chunk that overlaps with the top chunk.
5. Use the variant of the House of Force to calculate the stack address where buf_size will be saved.
6. Trigger “fastbin reverse into tcache”.
7. Receive the stack dump and reply with a simple ROP chain.

The reason why I am able to immediately reply with a ROP chain in the final step is that the next heap allocation returns a stack pointer, due to the “fastbin reverse into tcache” technique. Therefore, I am able to send back a reply which overwrites the stack. I can do this reliably, because the stack dump has given me stack pointers, heap pointers, code pointers, and the value of the stack canary.

Farewell to the House of Force

The House of Force will be out of business soon. For example, Ubuntu 20.04 LTS ships with glibc 2.31, which includes a run-time check to block it. Does it mean that exploits like this will no longer be possible? I don’t think so. In hindsight, I could have designed the exploit to only overwrite the size of the top chunk with the bottom two bytes of the stack address, rather than the entire address, which would be a small enough number to avoid triggering the new run-time check. It would add a few more steps to the exploit because I would need to reconstruct the new address by concatenating bytes, but I think it would work.

I personally think it’s a little bit sad to see the glibc allocator — one of the most fundamental building blocks of all our software — getting gunked up with ever more run-time checks. Even if the run-time cost is very small, it’s still a price that we all pay. And it’s often futile. If an application has a bad heap corruption vulnerability, then run-time checks in the allocator are often just speed bumps on the road to exploitation. Ultimately, the only way to solve these issues is by eliminating the memory corruption vulnerabilities. Mitigations in the allocator won’t save us.

¹ It’s slightly ambiguous where heap chunks begin and end, because the blocks of metadata include information about the chunks above and below. In my diagram, I have started each chunk from the beginning of the metadata, which is consistent with the alignment of the malloc_chunk datatype. However, it glosses over the fact that the bottom half of the metadata block really belongs with the chunk below rather than the chunk above, particularly because it is used to store user data while the chunk is allocated. Azeria, in her overview of the glibc heap implementation, has instead depicted the chunks as starting and ending in the middle of the metadata blocks.