Apple XNU exploits: ICMP proof of concept

It is now several weeks since Apple disclosed six vulnerabilities that I found in their XNU operating system kernel. One vulnerability was in their ICMP packet-handling code, and the other five were in their client-side NFS implementation. I have now published the source code for the proof-of-concept exploits on GitHub. In this blog post, I will explain how the exploits work. I will also show how a query helped me to find a code path that would trigger the ICMP vulnerability.

ICMP packet-handling code (CVE-2018-4407)

The video of my exploit PoC for this vulnerability got quite a lot of views on the internet, so it’s no surprise that it didn’t take long before the first exploit was published. I was particularly impressed by this spectacular “PoC in a tweet” by Zuk. My PoC is embarrassingly verbose by comparison, but I hope that some people will nonetheless find it educational, because it shows how to use the raw socket interface on Linux from a C program.

As I explained in my earlier blog post, the vulnerability is in the function icmp_error. It turns out that the simplest way to trigger the vulnerability is by sending a malicious TCP packet to the device. A TCP packet consists of an IP header, followed by a TCP header, followed by the body of the packet. Both the IP header and the TCP header are usually 20 bytes long, but can be up to 60 bytes long if extra options are added. To trigger the vulnerability, the combined size of the IP and TCP headers must be at least 84 bytes.
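
The header arithmetic above can be checked with a few lines of Python. This is only a sketch of the size calculation, not part of the PoC: the function names are made up for illustration, and the 84-byte threshold is simply the figure quoted in this post. Both IPv4 and TCP store their header length in a 4-bit field that counts 32-bit words, which is where the 20-byte minimum and 60-byte maximum come from.

```python
# IPv4 and TCP both encode their header length as a 4-bit count of 32-bit words.
MIN_HEADER_WORDS = 5    # 5 * 4 = 20 bytes: a header with no options
MAX_HEADER_WORDS = 15   # 15 * 4 = 60 bytes: the largest value 4 bits can hold
TRIGGER_THRESHOLD = 84  # combined header size quoted in this post


def header_bytes(option_bytes: int) -> int:
    """Size of an IPv4 or TCP header carrying `option_bytes` of options."""
    if option_bytes < 0 or option_bytes % 4 != 0:
        raise ValueError("options must be padded to a multiple of 4 bytes")
    words = MIN_HEADER_WORDS + option_bytes // 4
    if words > MAX_HEADER_WORDS:
        raise ValueError("a header cannot exceed 60 bytes")
    return words * 4


def combined_header_bytes(ip_option_bytes: int, tcp_option_bytes: int) -> int:
    """Total size of the IP header plus the TCP header."""
    return header_bytes(ip_option_bytes) + header_bytes(tcp_option_bytes)


def reaches_threshold(ip_option_bytes: int, tcp_option_bytes: int) -> bool:
    """True if the combined headers are at least TRIGGER_THRESHOLD bytes."""
    return combined_header_bytes(ip_option_bytes, tcp_option_bytes) >= TRIGGER_THRESHOLD
```

With no options the combined size is 40 bytes, so the two headers need at least 44 bytes of options between them to reach the threshold. Neither header can get there alone, because each one can carry at most 40 bytes of options.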

In the blog post, I deliberately didn’t mention TCP, because I didn’t want to give away too many clues about how to trigger the vulnerability. This led to many people mistakenly thinking that the vulnerability was a new version of the Ping of Death. But knowledgeable people such as Brandon Enright quickly pointed out that the vulnerability is triggered during the creation of an ICMP reply message, which means that it isn’t necessarily triggered by sending a malicious ICMP message. This excellent article by Jonathan Bennett explained that ICMP messages are used to communicate network status, so ICMP has many applications besides ping.

If I had known more about ICMP and its applications, then maybe it would have been obvious to me how to implement the PoC. But I was not familiar with the function icmp_error, so I had to figure out what it was for and how to trigger it by reading and analyzing the code. I found that it was helpful to use a simple CodeQL data-flow query to search for paths that lead to icmp_error. I started with this query:

/**
 * @name Paths to icmp_error
 * @description Find data-flow paths that lead to the first parameter of icmp_error.
 * @kind path-problem
 * @problem.severity warning
 */

import cpp
import semmle.code.cpp.dataflow.DataFlow
import DataFlow::PathGraph

class Config extends DataFlow::Configuration {
  Config() { this = "tcphdr_flow" }

  override predicate isSource(DataFlow::Node source) {
    exists (source.asExpr())
  }

  override predicate isSink(DataFlow::Node sink) {
    // The sink is the zero'th parameter of `icmp_error`: `struct mbuf *n`.
    exists (Parameter p
    | p = sink.asParameter() and
      p.getFunction().getName() = "icmp_error" and
      p.getIndex() = 0)
  }
}

from Config cfg, DataFlow::PathNode source, DataFlow::PathNode sink
where cfg.hasFlowPath(source, sink)
select source, source, sink, "Expression flows to icmp_error."

A quick reminder: if you would like to try running this query yourself, then you just need to download CodeQL for Eclipse and a snapshot for XNU.

[EDIT]: You can also use our free CodeQL extension for Visual Studio Code. See installation instructions at https://securitylab.github.com/tools/codeql/.

My hypothesis was that the zero’th parameter of icmp_error, named n, was an mbuf containing an incoming network packet. So this query looks for any expression with a data-flow path to that parameter. It does not place any restrictions on the source, other than that it should be an expression. This query had 84 results. But looking through the results, I noticed that some of the sources were in the function ip_input. Since ip_input is the main function that handles incoming IP packets, this seemed like an interesting place to look. So I added this extra condition to the isSource method:

source.getFunction().getName() = "ip_input"

This reduced the number of results to eight. However, several of those results had more than one possible data-flow path from source to sink, so there were still quite a lot of paths to look through. I noticed that many of those paths went via a function named ip_forward. Although those paths looked plausible enough, they also looked a little complicated, so I decided to first see if I could find something simpler. I added a barrier to exclude any data-flow path that goes through ip_forward:

override predicate isBarrier(DataFlow::Node node) {
  node.getFunction().getName() = "ip_forward"
}

This reduced the number of results to five. Here is the final query:

/**
 * @name Paths from ip_input to icmp_error
 * @description Find data-flow paths that lead from ip_input to the first parameter of icmp_error.
 * @kind path-problem
 * @problem.severity warning
 */

import cpp
import semmle.code.cpp.dataflow.DataFlow
import DataFlow::PathGraph

class Config extends DataFlow::Configuration {
  Config() { this = "tcphdr_flow" }

  override predicate isSource(DataFlow::Node source) {
    exists (source.asExpr()) and
    source.getFunction().getName() = "ip_input"
  }

  override predicate isSink(DataFlow::Node sink) {
    // The sink is the zero'th parameter of `icmp_error`: `struct mbuf *n`.
    exists (Parameter p
    | p = sink.asParameter() and
      p.getFunction().getName() = "icmp_error" and
      p.getIndex() = 0)
  }

  override predicate isBarrier(DataFlow::Node node) {
    node.getFunction().getName() = "ip_forward"
  }
}

from Config cfg, DataFlow::PathNode source, DataFlow::PathNode sink
where cfg.hasFlowPath(source, sink)
select source, source, sink, "Expression flows to icmp_error."
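
As a loose analogy for what the barrier does, here is a toy path search in Python. It is not how CodeQL works internally: CodeQL tracks values through expressions and parameters, whereas this sketch only walks a hand-written graph of function names. The edge list is a deliberately tiny, hypothetical simplification that I made up for illustration.

```python
from typing import Dict, FrozenSet, List

# Hypothetical, heavily simplified edges: "a value in X can reach Y".
EDGES: Dict[str, List[str]] = {
    "ip_input": ["ip_dooptions", "ip_forward"],
    "ip_dooptions": ["icmp_error"],
    "ip_forward": ["icmp_error"],
}


def find_paths(
    edges: Dict[str, List[str]],
    source: str,
    sink: str,
    barriers: FrozenSet[str] = frozenset(),
) -> List[List[str]]:
    """Return every acyclic path from source to sink that avoids the barriers."""
    paths: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        # A barrier node blocks every path through it; `node in path` stops cycles.
        if node in barriers or node in path:
            return
        path = path + [node]
        if node == sink:
            paths.append(path)
            return
        for successor in edges.get(node, []):
            walk(successor, path)

    walk(source, [])
    return paths
```

Calling `find_paths(EDGES, "ip_input", "icmp_error")` finds two paths in this toy graph. Passing `barriers=frozenset({"ip_forward"})` leaves only the path through `ip_dooptions`, which mirrors how adding `isBarrier` narrowed the real query's results.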

The five remaining results are all variations on the same call path: ip_input calls ip_dooptions, which in turn calls icmp_error.

So the way to trigger the vulnerability is to send a TCP packet with extra IP and TCP options (so that the combined header size is at least 84 bytes), and to deliberately make some of the IP options invalid so that ip_dooptions calls icmp_error. There are many ways to make the IP options invalid. The simplest approach, which Zuk used for his PoC in a tweet, is to fill the IP options with garbage. I tried to be a bit more subtle, to reduce the chance that a suspicious router might notice the bogus options and drop the packet. In the end, though, I am not sure that it made any difference. I was trying to find a way to send a malicious packet across the internet, but it seems that most internet routers will drop any packet with non-zero IP options, so I wasn’t able to get it to work. I found that my PoC was very reliable on home and office networks though.

The exact call path that I decided to use for the PoC is as follows:

ip_input() (bsd/netinet/ip_input.c:1835)
call to ip_dooptions (bsd/netinet/ip_input.c:2185)
ip_dooptions() (bsd/netinet/ip_input.c:3222)
goto bad (bsd/netinet/ip_input.c:3281)
call to icmp_error (bsd/netinet/ip_input.c:3495)
icmp_error() (bsd/netinet/ip_icmp.c:203)

The key step is the fourth: goto bad in ip_dooptions. That’s how we trigger the call to icmp_error. But there are many such gotos to choose from. If you just fill the IP options with garbage then you will most likely hit the one on line 3256. But that seemed too crude to me. I have noticed that most IP options handling implementations check that the lengths of the options are valid. But they often skip options that aren’t of interest. For example, that’s what happens on line 3262. So I thought it would be more subtle to insert an error into an obscure-sounding option (IPOPT_LSRR), in the hope that most routers would not notice it.
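
To make the idea of "checking the lengths but skipping uninteresting options" concrete, here is a small Python sketch of an IP options walker. It is modelled on the classic BSD-style loop rather than copied from XNU, so treat the exact checks as assumptions. The function name and error strings are made up for illustration. The LSRR pointer check is just one example of an option-specific check, and is not necessarily the error that my PoC uses.

```python
from typing import Optional

IPOPT_EOL = 0      # end of option list
IPOPT_NOP = 1      # single-byte padding option
IPOPT_LSRR = 131   # loose source and record route
IPOPT_MINOFF = 4   # smallest legal value of the LSRR pointer byte


def find_option_error(options: bytes) -> Optional[str]:
    """Return a description of the first malformed option, or None if all pass."""
    i = 0
    while i < len(options):
        opt = options[i]
        if opt == IPOPT_EOL:
            return None          # nothing after EOL is examined
        if opt == IPOPT_NOP:
            i += 1
            continue
        remaining = len(options) - i
        if remaining < 2:
            return "missing length byte"
        optlen = options[i + 1]  # the length counts the type and length bytes too
        if optlen < 2 or optlen > remaining:
            return "bad option length"
        if opt == IPOPT_LSRR:
            # Option-specific checks: only a parser that understands LSRR does these.
            if optlen < 3:
                return "LSRR too short for pointer"
            if options[i + 2] < IPOPT_MINOFF:
                return "LSRR pointer below minimum"
        # Any other option is skipped once its length has been validated.
        i += optlen
    return None
```

A parser that only performs the generic length check would accept `bytes([131, 7, 1, 10, 0, 0, 1, 0])`, because the length byte is consistent. A parser that also understands LSRR rejects it because of the pointer value. That difference is the kind of gap I was hoping a router would fall into.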

Client-side NFS vulnerabilities (CVE-2018-4259, CVE-2018-4286, CVE-2018-4287, CVE-2018-4288, CVE-2018-4291)

As I wrote in my previous blog post about these vulnerabilities, to trigger them I needed to implement my own NFS server. It turns out that this is actually a lot easier than it sounds. NFS uses RPC for communication between client and server. rpcgen is a tool which automatically generates this communication code from a specification written in RPC Language. And the complete RPC specification for NFSv3 is embedded in RFC 1813. So I just needed to copy the bits of the specification that I needed and run rpcgen on it.

To trigger the first vulnerability, I only needed to implement a very small subset of the specification. The vulnerability is a buffer overflow in this use of the nfsm_chain_get_fh macro. It is triggered by the server returning a file handle that is too big, in response to the MOUNTPROC3_MNT message. MOUNTPROC3_MNT is one of the first messages exchanged between the client and the server: it happens as soon as the client tries to mount the NFS share. So my NFS server only needs to support a very small number of NFS operations to be able to trigger the vulnerability. You can see the full list of operations that I implemented here and here. If you compare this to the list of server operations in RFC 1813 (sections 3 and 5.2), then you will see how many operations I didn’t implement.

The key to the PoC is that I modified the definition of fhandle3. The correct definition can be found in section 5.1.5 of RFC 1813. File handles are supposed to be at most 64 bytes long, but I changed the definition so that I can return an arbitrarily long handle. The malicious file handle is created here.
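
For readers who have not seen XDR before, here is a Python sketch of how a variable-length opaque value such as fhandle3 is laid out on the wire, together with the kind of bounds check that a receiver is expected to perform. This is not the code generated by rpcgen and it is not the XNU client code. The function names are hypothetical, and the layout (a 4-byte big-endian length, the data, then zero padding to a multiple of 4 bytes) follows the XDR standard.

```python
import struct

FHSIZE3 = 64  # maximum size of an fhandle3, per RFC 1813


def xdr_encode_opaque(data: bytes) -> bytes:
    """Encode variable-length opaque data: length, bytes, zero padding."""
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding


def xdr_decode_fhandle3(buf: bytes) -> bytes:
    """Decode a file handle, rejecting any length above FHSIZE3."""
    if len(buf) < 4:
        raise ValueError("truncated length field")
    (length,) = struct.unpack_from(">I", buf, 0)
    # The sender chooses this length, so it must be checked before copying.
    if length > FHSIZE3:
        raise ValueError("file handle longer than FHSIZE3")
    if len(buf) < 4 + length:
        raise ValueError("truncated file handle")
    return buf[4:4 + length]
```

Note that `xdr_encode_opaque` will happily encode a handle of any size. The 64-byte limit only exists if the definition of the type enforces it, which is why relaxing the definition of fhandle3 on the server side is enough to put an oversized handle on the wire.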

Note: Post originally published on LGTM.com on November 24, 2018