CVE-2018-4259: macOS NFS vulnerabilities lead to kernel RCE

This post is about several stack and heap buffer overflows that I found in Apple’s macOS operating system kernel. Apple classified these vulnerabilities as remote code execution vulnerabilities in the kernel, so they were extremely serious. An attacker could potentially have used them to remotely hack into a Mac. Or, with physical access to the computer, an attacker could have logged in as the guest user (no password required) and then used the vulnerabilities to elevate their privileges and take control of the computer.

The vulnerabilities were in the client-side NFS implementation, which is used for mounting a network drive, like a NAS, into the Mac’s file system.

Severity and Mitigation

The vulnerabilities were fixed in macOS version 10.13.6, which Apple released on July 9, 2018. Apple asked me not to disclose the vulnerabilities until further notice. This is what they said:

While these issues were addressed with macOS High Sierra 10.13.6, we are investigating addressing these issues on additional platforms. We ask that you please refrain from disclosing these issues until we have concluded our investigation. We will let you know once we have concluded our investigation and updated the security advisories to include your information.

NFS is very widely used, particularly in larger corporations, where it is used for shared drives and networked home directories. But it is also used in home NAS devices, which are often used as media servers. In macOS, no special permissions are required to mount an NFS share, so the vulnerabilities can be triggered by any user, even the guest account (which doesn’t require a password). Furthermore, many computers, particularly in a corporate environment, are configured to automatically mount an NFS share when they start up. This means that there were at least two attack vectors for these vulnerabilities:

The vulnerabilities could have been used to quickly spread a virus throughout a corporation that uses NFS filers. To do that, the attacker would have needed to either install a malicious version of the NFS server software on the filer, or find a way to put a rogue filer on the network and start intercepting some of the NFS traffic.
The vulnerabilities could be used to gain escalated privileges. For example, someone could have logged in as the guest user, then issued a one-line command to connect to a rogue NFS server (which could be anywhere on the internet), and get kernel-level access on the machine.

Apple assigned five CVEs because the code contained multiple similar bugs: CVE-2018-4259, CVE-2018-4286, CVE-2018-4287, CVE-2018-4288, and CVE-2018-4291. In the vulnerability report that I sent to Apple on May 21, 2018, I listed 14 separate locations in the source code that I considered to be bugs. I only sent working exploits for two of them, so I am not completely sure which lines of code those five CVEs correspond to. In fact, Apple only very recently published the source code for macOS version 10.13.6, so I have not yet finished auditing all of the source code changes. (The source code for macOS versions 10.13.4, 10.13.5, and 10.13.6 was released on Oct 3, 2018.) Therefore, to avoid accidentally disclosing any bugs that might not have been fixed, in this post I will only talk about the two bugs that I sent Apple working proof-of-concept exploits for.

Proof-of-concept exploit

I wrote a proof-of-concept exploit, which overwrote 4096 bytes of heap memory with zeros and caused the kernel to crash. I made a short video to demonstrate this. 4096 was an arbitrary choice: I could have changed the exploit to send as much or as little data as I liked. Any length greater than 128 bytes would trigger a heap buffer overflow. I also had complete control over the values of the bytes that were written. So, although my PoC only crashed the kernel, it is reasonable to assume that these buffer overflows could be used to achieve remote code execution and local privilege escalation.

When I first found the vulnerabilities, the idea that I would have to write my own NFS server to create a PoC seemed rather daunting. But after I had learned a bit about the NFS protocol, and how to use rpcgen, it turned out to be surprisingly simple. My exploit PoC consisted of just 46 lines of C and 63 lines of RPC language. I will not release the exploit PoC immediately, because I want to give Apple users a chance to upgrade their devices first. However, in the relatively near future I will publish the source code for the exploit PoC in our SecurityExploits repository.

The vulnerabilities

The two vulnerabilities that I wrote PoCs for were in this innocuous looking line of code (nfs_vfsops.c:4151):

nfsm_chain_get_fh(error, &nmrep, nfsvers, fh);

The purpose of this line of code is to read a file handle (fh) from a reply message (nmrep) that was sent back to the Mac by the NFS server. A file handle is an opaque identifier for a file or directory on the NFS share. File handles are at most 64 bytes in NFSv3 or 128 bytes in NFSv4 (search for FHSIZE). The fhandle_t type in XNU has enough space for a 128-byte file handle, but the nfsm_chain_get_fh macro never checks that the length it reads from the reply actually fits:

/* get the size of and data for a file handle in an mbuf chain */
#define nfsm_chain_get_fh(E, NMC, VERS, FHP) \
  do { \
    if ((VERS) != NFS_VER2) \
      nfsm_chain_get_32((E), (NMC), (FHP)->fh_len); \
    else \
      (FHP)->fh_len = NFSX_V2FH;\
    nfsm_chain_get_opaque((E), (NMC), (uint32_t)(FHP)->fh_len, (FHP)->fh_data);\
    if (E) \
      (FHP)->fh_len = 0;\
  } while (0)

This code is rather hard to follow due to the heavy use of macros, but what it does is actually very simple: it reads a 32-bit unsigned integer from the message into (FHP)->fh_len, and then reads that number of bytes from the message directly into (FHP)->fh_data. There is no bounds check, so an attacker could overwrite an arbitrary amount of kernel heap with any sequence of bytes that they choose. The memory for the file handle that gets overwritten is allocated at nfs_socket.c:1401.
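For comparison, here is a minimal user-space sketch of what a bounds-checked version of this read could look like. It is not Apple's fix. It assumes a flat reply buffer instead of XNU's mbuf chain, and the names reply_buf, file_handle, get_u32, get_fh_checked and MAX_FH_SIZE are hypothetical. The limit of 128 bytes matches the NFSv4 maximum mentioned above, and the length is rounded up to a multiple of 4 because XDR pads opaque data that way.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical limit: the NFSv4 maximum file handle size (NFSv3 allows 64). */
#define MAX_FH_SIZE 128

/* Hypothetical flat stand-in for an mbuf chain: a cursor over a byte array. */
struct reply_buf {
    const uint8_t *ptr;   /* next unread byte */
    size_t left;          /* number of unread bytes */
};

/* Hypothetical stand-in for XNU's fhandle_t. */
struct file_handle {
    uint32_t fh_len;
    uint8_t fh_data[MAX_FH_SIZE];
};

/* Read a big-endian (XDR) 32-bit unsigned integer. */
static int get_u32(struct reply_buf *rb, uint32_t *out) {
    if (rb->left < 4)
        return EBADMSG;
    *out = ((uint32_t)rb->ptr[0] << 24) | ((uint32_t)rb->ptr[1] << 16) |
           ((uint32_t)rb->ptr[2] << 8) | (uint32_t)rb->ptr[3];
    rb->ptr += 4;
    rb->left -= 4;
    return 0;
}

/* Read a length-prefixed file handle, rejecting any length that would not
 * fit in fh_data. On error, fh_len is left at 0, like the original macro. */
static int get_fh_checked(struct reply_buf *rb, struct file_handle *fh) {
    uint32_t len;
    int error;

    fh->fh_len = 0;
    error = get_u32(rb, &len);
    if (error)
        return error;
    if (len > MAX_FH_SIZE)   /* the check that nfsm_chain_get_fh lacks */
        return EBADMSG;

    /* len <= 128 here, so rounding up to a multiple of 4 cannot overflow. */
    size_t padded = ((size_t)len + 3) & ~(size_t)3;
    if (rb->left < padded)
        return EBADMSG;

    memcpy(fh->fh_data, rb->ptr, len);
    fh->fh_len = len;
    rb->ptr += padded;
    rb->left -= padded;
    return 0;
}
```

With this check in place, a reply that claims a 0xFFFFFFFF-byte file handle is rejected before any bytes are copied, instead of overflowing fh_data.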

The second bug that I wrote a PoC for is an integer overflow in the nfsm_chain_get_opaque macro:

/* copy the next consecutive bytes of opaque data from an mbuf chain */
#define nfsm_chain_get_opaque(E, NMC, LEN, PTR) \
  do { \
    uint32_t rndlen; \
    if (E) break; \
    rndlen = nfsm_rndup(LEN); \
    if ((NMC)->nmc_left >= rndlen) { \
      u_char *__tmpptr = (u_char*)(NMC)->nmc_ptr; \
      (NMC)->nmc_left -= rndlen; \
      (NMC)->nmc_ptr += rndlen; \
      bcopy(__tmpptr, (PTR), (LEN)); \
    } else { \
      (E) = nfsm_chain_get_opaque_f((NMC), (LEN), (u_char*)(PTR)); \
    } \
  } while (0)

This code uses the nfsm_rndup macro to round LEN up to the next multiple of 4. But it uses the original value of LEN in the call to bcopy. If the value of LEN is 0xFFFFFFFF then the addition in nfsm_rndup will overflow and the value of rndlen will be 0. This means that the comparison with (NMC)->nmc_left will succeed and bcopy will be called with 0xFFFFFFFF as the size argument. This will of course cause an immediate kernel crash, so it could only be used as a denial of service attack.
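The wraparound is easy to reproduce in user space. The sketch below assumes that nfsm_rndup computes (len + 3) & ~3 in 32-bit unsigned arithmetic, as described above; rndup32 and rndup32_checked are hypothetical helpers, and the checked variant is just one possible way to refuse lengths that would wrap.

```c
#include <stdint.h>

/* Round up to a multiple of 4 using 32-bit unsigned arithmetic, as the
 * description of nfsm_rndup above implies. The addition can wrap around. */
static uint32_t rndup32(uint32_t len) {
    return (len + 3) & ~(uint32_t)3;
}

/* Hypothetical overflow-safe variant: fails instead of wrapping. */
static int rndup32_checked(uint32_t len, uint32_t *out) {
    if (len > UINT32_MAX - 3)
        return -1;            /* len + 3 would not fit in 32 bits */
    *out = (len + 3) & ~(uint32_t)3;
    return 0;
}
```

Here rndup32(0xFFFFFFFF) evaluates to 0, so a guard of the form nmc_left >= rndlen is satisfied no matter how little data is left, while the original 0xFFFFFFFF is still passed to the copy.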

Finding the bugs with CodeQL

One of CodeQL’s great strengths is its ability to find variants of known bugs. Earlier this year, my colleague Jonas Jensen found two vulnerabilities, CVE-2018-4136 and CVE-2018-4160, in Apple’s NFS Diskless Boot implementation. I published a blog post about those vulnerabilities and the query that found them. That query was designed to look for calls to bcopy with a user-controlled size argument that could be negative. A simple variation is to look for calls to bcopy where the source buffer is user controlled. Such calls are potentially interesting, because they copy user-controlled data into kernel space.

/**
 * @name bcopy of network data
 * @description Copying a variable-sized network buffer into kernel memory
 * @kind path-problem
 * @problem.severity warning
 * @id apple-xnu/cpp/bcopy-negative-size
 */

import cpp
import semmle.code.cpp.dataflow.DataFlow
import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis
import DataFlow::PathGraph

class MyCfg extends DataFlow::Configuration {
  MyCfg() {
    this = "MyCfg"
  }

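  override predicate isSink(DataFlow::Node sink) {
    // In the XNU build, bcopy(src, dst, len) is a macro that expands to
    // __builtin___memmove_chk(dst, src, len, ...), so argument 1 is the
    // source buffer and argument 2 is the size.
    exists (FunctionCall call
    | sink.asExpr() = call.getArgument(1) and
      call.getTarget().getName() = "__builtin___memmove_chk" and
      not call.getArgument(2).isConstant())
  }

  override predicate isSource(DataFlow::Node source) {
    // mbuf_data returns a pointer to the data stored in an mbuf (network buffer).
    source.asExpr().(FunctionCall).getTarget().getName() = "mbuf_data"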
}

from DataFlow::PathNode sink, DataFlow::PathNode source, MyCfg dataFlow
where dataFlow.hasFlowPath(source, sink)
select sink, source, sink, "bcopy of network data"

The above query is rather simplistic, because it looks for any call to bcopy that copies data from an mbuf into kernel space. There’s nothing wrong with such calls, as long as the size argument is properly bounds checked. However, it turns out that a significant proportion of the results are uses of the nfsm_chain_get_fh macro, which does no bounds checking at all. So, despite its simplicity, this query finds a number of important security vulnerabilities. In its current form, though, the query will continue to report the same results even after the bugs are fixed. It would be nice to improve it so that it does not report a result when a proper bounds check is in place.

The usual way to implement a bounds check is something like this:

if (n < limit) {
  bcopy(src, dst, n);
}

I wrote this predicate to detect the above pattern:

/**
 * Holds if `guard` is a bounds check which ensures that `size` is less than
 * `limit`. For example:
 * 
 *   if (size < limit) {
 *     ... size ...
 *   }
 */
predicate guardedSize(GuardCondition guard, Expr size, Expr limit,
                      RelationStrictness strict) {
  exists (boolean branch, Expr sz, BasicBlock block
  | guard.controls(block, branch) and
    block.contains(size) and
    globalValueNumber(size) = globalValueNumber(sz) and
    relOpWithSwapAndNegate(guard, sz, limit, Lesser(), strict, branch))
}

It uses the Guards library to find size expressions that occur in a basic block controlled by guard. It then uses the GlobalValueNumbering library to check that the same size expression also occurs in the condition itself. Global value numbering enables the predicate to detect the equality of non-trivial size expressions like this:

if (packet.data.size < limit) {
  ... packet.data.size ...
}

Finally, it uses a utility named relOpWithSwapAndNegate to check that the size expression is less than the limit. It enables the predicate to also handle scenarios like this:

if (packet.data.size >= limit) {
  return -1;
} else {
  ... packet.data.size ...
}

Another common way to implement a bounds check is to call min, which this predicate detects:

/**
 * Holds if `size` is bounds checked with a call to `min`:
 * 
 *    size = min(n, limit);
 *
 *    ... size ...
 */
predicate minSize(Expr size) {
  exists (DataFlow::Node source, DataFlow::Node sink
  | DataFlow::localFlow(source, sink) and
    source.asExpr().(FunctionCall).getTarget().getName() = "min" and
    size = sink.asExpr())
}
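On the C side, the shape this predicate is meant to match looks roughly like the sketch below. It is a hypothetical user-space example: min is defined locally (so that its name matches the one the predicate looks for), copy_clamped is an illustrative helper, and memcpy stands in for the kernel's bcopy.

```c
#include <stddef.h>
#include <string.h>

/* Local stand-in for the kernel's min(). It is unsigned, so the clamped
 * size can never be negative. */
static size_t min(size_t a, size_t b) {
    return a < b ? a : b;
}

/* Hypothetical helper: copy at most dst_cap bytes of a network-supplied
 * payload of src_len bytes, and return the number of bytes copied. */
static size_t copy_clamped(void *dst, size_t dst_cap,
                           const void *src, size_t src_len) {
    size_t size = min(src_len, dst_cap);   /* size = min(n, limit); */
    memcpy(dst, src, size);                /* ... size ... */
    return size;
}
```

Because the value of size flows locally from the call to min into the copy, minSize would hold for the size argument at the copy.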

I combined these two predicates as follows:

/**
 * Holds if `size` has been bounds checked.
 */
predicate checkedSize(Expr size) {
  lowerBound(size) >= 0 and
  (guardedSize(_, size, _, _) or minSize(size))
}

Note that I have also used lowerBound to make sure that the size cannot be negative, for example as the result of an integer overflow, because an upper-bound check such as size < limit does not rule that out. The only remaining thing to do is to use checkedSize in the isSink predicate, to reduce the number of false positives. This is the finished query:
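A small C example shows why the upper bound alone is not enough when the size is a signed integer. The helper names below are hypothetical; the point is that -1 passes the check n < limit, yet becomes SIZE_MAX when it is converted to the size_t parameter of a copy routine.

```c
#include <stddef.h>
#include <stdint.h>

/* The kind of guard guardedSize recognises: an upper bound only. */
static int upper_bound_only(int n, int limit) {
    return n < limit;
}

/* Adds the non-negativity requirement expressed by lowerBound(size) >= 0. */
static int fully_checked(int n, int limit) {
    return n >= 0 && n < limit;
}

/* The size a routine such as memcpy or bcopy actually receives. */
static size_t as_copy_size(int n) {
    return (size_t)n;   /* negative values wrap to very large sizes */
}
```

For n = -1 and limit = 128, upper_bound_only returns true, fully_checked returns false, and as_copy_size returns SIZE_MAX.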

/**
 * @name bcopy of network data
 * @description Copying a variable-sized network buffer into kernel memory
 * @kind path-problem
 * @problem.severity warning
 * @id apple-xnu/cpp/bcopy-negative-size
 */

import cpp
import semmle.code.cpp.valuenumbering.GlobalValueNumbering
import semmle.code.cpp.controlflow.Guards
import semmle.code.cpp.dataflow.DataFlow
import semmle.code.cpp.dataflow.TaintTracking
import semmle.code.cpp.rangeanalysis.RangeAnalysisUtils
import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis
import DataFlow::PathGraph

/**
 * Holds if `guard` is a bounds check which ensures that `size` is less than
 * `limit`. For example:
 * 
 *   if (size < limit) {
 *     ... size ...
 *   }
 */
predicate guardedSize(GuardCondition guard, Expr size, Expr limit,
                      RelationStrictness strict) {
  exists (boolean branch, Expr sz, BasicBlock block
  | guard.controls(block, branch) and
    block.contains(size) and
    globalValueNumber(size) = globalValueNumber(sz) and
    relOpWithSwapAndNegate(guard, sz, limit, Lesser(), strict, branch))
}

/**
 * Holds if `size` is bounds checked with a call to `min`:
 * 
 *    size = min(n, limit);
 *
 *    ... size ...
 */
predicate minSize(Expr size) {
  exists (DataFlow::Node source, DataFlow::Node sink
  | DataFlow::localFlow(source, sink) and
    source.asExpr().(FunctionCall).getTarget().getName() = "min" and
    size = sink.asExpr())
}

/**
 * Holds if `size` has been bounds checked.
 */
predicate checkedSize(Expr size) {
  lowerBound(size) >= 0 and
  (guardedSize(_, size, _, _) or minSize(size))
}

class MyCfg extends DataFlow::Configuration {
  MyCfg() {
    this = "MyCfg"
  }

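  override predicate isSink(DataFlow::Node sink) {
    // In the XNU build, bcopy(src, dst, len) is a macro that expands to
    // __builtin___memmove_chk(dst, src, len, ...), so argument 1 is the
    // source buffer and argument 2 is the size.
    exists (FunctionCall call
    | sink.asExpr() = call.getArgument(1) and
      call.getTarget().getName() = "__builtin___memmove_chk" and
      not checkedSize(call.getArgument(2)))
  }

  override predicate isSource(DataFlow::Node source) {
    // mbuf_data returns a pointer to the data stored in an mbuf (network buffer).
    source.asExpr().(FunctionCall).getTarget().getName() = "mbuf_data"
  }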
}

from DataFlow::PathNode sink, DataFlow::PathNode source, MyCfg dataFlow
where dataFlow.hasFlowPath(source, sink)
select sink, source, sink, "bcopy of network data"

Try CodeQL on XNU

Unlike most other open source projects, XNU is not available to query on LGTM. This is because LGTM uses Linux workers to build projects, but XNU can only be built on a Mac. Even on a Mac, XNU is highly non-trivial to build. I would not have been able to do it if I had not found this incredibly useful blog post by Jeremy Andrus. Using Jeremy Andrus’s instructions and scripts, I manually built snapshots for the three most recent published versions of XNU, corresponding to macOS 10.13.4, 10.13.5, and 10.13.6. Unfortunately, at the time of writing, Apple have not yet released the source code for 10.14 (Mojave / iOS 12). To run queries on those versions yourself, you would first need to create the corresponding CodeQL databases with the CodeQL CLI, and then install the CodeQL extension for VSCode or Eclipse to run the queries.

[EDIT]: You can also use our free CodeQL extension for Visual Studio Code. See installation instructions at https://securitylab.github.com/tools/codeql/.

Timeline

2018-05-21: Privately disclosed to Apple. Proof-of-concept exploit included.
2018-05-22: Report acknowledged by product-security@apple.com.
2018-07-09: Notified by Apple that they needed to investigate addressing these issues on additional platforms. They asked me not to disclose the vulnerabilities until further notice.
2018-07-09: macOS version 10.13.6 released by Apple. The vulnerabilities were fixed.
2018-09-13: Contacted product-security@apple.com to ask if the vulnerabilities would be disclosed when macOS Mojave was released.
2018-09-13: Notified by Apple that the vulnerabilities would not be disclosed until November 2018.
2018-10-30: Vulnerabilities disclosed.


Credits

“Sanitary Sewer Overflow Reduction Services”. EEC Environmental.
“What does this handle do?” By Edward Backhouse.
“KEVFS - macOS Edition”. By Jemima Backhouse.

Note: Post originally published on LGTM.com on October 30, 2018