Apple NFS Diskless Boot: Negative integer overflow vulnerabilities (CVE-2018-4136 & CVE-2018-4160)

This post is about a pair of identical vulnerabilities, CVE-2018-4136 and CVE-2018-4160, found by my colleague Jonas Jensen. Both vulnerabilities are calls to bcopy with a user-controlled size argument which could be negative. In this post, I will explain how Jonas wrote the query which uncovered these vulnerabilities.

Severity and mitigation

Apple released a patch on March 29, 2018 for macOS versions High Sierra 10.13.3, Sierra 10.12.6, and El Capitan 10.11.6.

The vulnerabilities are in the NFS Diskless Boot implementation. As far as I know, NFS Diskless Boot is rarely used for Macs. It is certainly not a standard configuration, so I believe that the vast majority of Mac users are not at risk from these vulnerabilities.

I believe it would be a quite a lot of work to create an exploit PoC for this vulnerability, so I only sent Apple a description of why Jonas and I think the code is wrong. We are very grateful to Apple for accepting the report without requiring us to send a working exploit.

Darwin XNU and bcopy

Apple’s open source XNU kernel often uses bcopy to copy blocks of data. bcopy is implemented as a macro, expanding to __builtin___memmove_chk. (bcopy is equivalent to memmove, except its src and dst arguments are in the opposite order.) There are numerous ways to call bcopy incorrectly and accidentally introduce a security vulnerability. The most obvious is to call bcopy with a size argument that is bigger than the destination buffer. However Jonas had the idea to look for the opposite: what if there is a call to bcopy where the size argument could be negative? The type of the size parameter of bcopy is size_t, which is unsigned, so there would be a negative integer overflow and bcopy would be called with an enormous size argument. This would crash the kernel.

Using CodeQL to find a vulnerability

It is very easy to use CodeQL to find calls to bcopy that might have a negative size argument:

import cpp
import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis

from FunctionCall call
where
  call.getTarget().getName() = "__builtin___memmove_chk" and
  lowerBound(call.getArgument(2)) < 0
select call

This query searches for all calls to __builtin___memmove_chk (which is what bcopy and memmove expand to), where the size argument might be negative. The query uses the SimpleRangeAnalysis CodeQL library to determine whether the size argument could be negative. SimpleRangeAnalysis is conservative, so a negative lower bound does not necessarily mean that the size will ever be negative. But it does help to rule out cases like this, where the size is clearly never negative:

if (len < 0) {
  printf("error: negative size");
  return -1;
}
bcopy(src, dst, len);

This query finds 138 results on the XNU codebase. Unfortunately, most of the results are false positives, because SimpleRangeAnalysis is not an inter-procedural analysis, and is often unable to compute a tight lower bound for the size argument of calls to __builtin___memmove_chk given its localized nature. For many of these results, SimpleRangeAnalysis does not know anything about the value of the size argument other than that its type is int, so the analysis reports that the lower bound is -2147483648. This call to bcopy in audit_arg_argv is such an example. This example also illustrates why it is often very difficult to accurately analyze the range of an expression. audit_arg_argv is called here. The value of the length argument is imgp->ip_endargv - imgp->ip_startargv. It seems unlikely that imgp->ip_endargv could be less than imgp->ip_startargv, or that it could exceed it by enough to trigger an integer overflow, but it would be difficult for a static analyzer to prove. So a more intricate range analysis may be able to go some way to producing tighter bounds, but a large number of false positives would likely still remain.

Since our goal is to find security vulnerabilities, there is a much better way to focus the search: look for results where the value of length might be controllable by an attacker. We can use the DataFlow CodeQL library to do this. The idea is to define a set of sources that untrusted data could flow from. We then use the DataFlow library to track the flow of the untrusted data to see if any of it reaches any of the calls to bcopy which we identified earlier.

/**
 * @name bcopy with negative size
 * @description Calling bcopy with a negative size argument will crash the
 *              kernel due to a negative integer overflow.
 * @kind path-problem
 * @problem.severity warning
 * @id apple-xnu/cpp/bcopy-negative-size
 */

import cpp
import semmle.code.cpp.dataflow.DataFlow
import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis
import DataFlow::PathGraph

class MyCfg extends DataFlow::Configuration {
  MyCfg() {
    this = "MyCfg"
  }

  override predicate isSink(DataFlow::Node sink) {
    exists (FunctionCall call
    | sink.asExpr() = call.getArgument(2) and
      call.getTarget().getName() = "__builtin___memmove_chk" and
      lowerBound(sink.asExpr()) < 0)
  }

  override predicate isSource(DataFlow::Node source) {
    source.asExpr().(FunctionCall).getTarget().getName() = "mbuf_data"
  }

  override predicate isAdditionalFlowStep(DataFlow::Node node1, DataFlow::Node node2) {
    node2.asExpr() = node1.asExpr().getParent()
  }
}

from DataFlow::PathNode sink, DataFlow::PathNode source, MyCfg dataFlow
where dataFlow.hasFlowPath(source, sink)
select sink, source, sink, "The size argument of bcopy might be negative."

The DataFlow library is designed to be configurable so that you can define your own sources and sinks. To use it, you have to define a new class which extends DataFlow::Configuration and implements the isSink and isSource methods. My isSink method uses the same logic as before: the size argument of a call to bcopy is a sink if it could be negative. For the sources, I have only included one category of untrusted data for now: any data returned by mbuf_data has potentially been read from an incoming message. Jonas’s original query included numerous other potential sources, such as the parameters of ioctl callbacks and data read using copyin, but I have omitted them here for simplicity.

isAdditionalFlowStep is an optional method, which you can use to add extra dataflow edges to the dataflow graph. I have had to use it here because the DataFlow library is quite conservative by default and only tracks data that is not modified. For example, if x contains untrusted data then the DataFlow library does not consider x+1 to be untrusted by default. In the near future, we will be publishing a TaintTracking library, which includes these additional edges and is therefore more suitable for writing security queries. I am using the DataFlow library for this blog post so that the query will work if you run by CodeQL.

The new query has 31 results on the XNU codebase, which is a small enough number to check manually. Although there are other conditions that I could add to the query to further reduce the number of results, I find that it is often interesting to look at the last few results by hand. For example, this is one of the results. After reviewing this code, I don’t think it’s possible for n->m_len to be negative, but the code is sufficiently complicated that I had to read it quite carefully to see why.

The bugs

Two of the results reported by the query are this one and this one. Both results are almost identical to each other, so I will just use the first to explain the problem. Here is a copy of the relevant code:

 p = mbuf_data(m);
 msg_len = mbuf_len(m);

 /* server name */
 if (msg_len < (int)sizeof(*str))
   goto bad;
 str = (struct rpc_string *)p;
 sn_len = ntohl(str->len);
 if (msg_len < sn_len)
   goto bad;
 if (sn_len >= MAXHOSTNAMELEN)
   goto bad;
 bcopy(str->data, serv_name, sn_len);

Notice that mbuf_data is used to read untrusted data on line 627. This is then cast to an rpc_string struct on line 633 and the untrusted length is read from the struct on line 634. The checks on line 635 and line 637 ensure that the length isn’t too big, but there is no check to prevent it from being negative. Therefore, a negative length will cause the kernel to crash on line 639.

At the time of writing, Apple have not yet released the source code for macOS version High Sierra 10.13.4, so I have not seen how they have changed the code to fix the bugs.

Conclusion

My colleague Jonas Jensen wrote a query which found two vulnerabilities in Apple’s open source XNU kernel. The query uses a combination of range analysis and dataflow analysis to find calls to bcopy where the size argument might be negative. The risk of exploitation is likely minimal, but Apple has released a patch, and it is recommended you update as soon as possible.

Bug timeline

Jan 16, 2018: privately reported these vulnerabilities to product-security@apple.com
Jan 17, 2018: initial response from an Apple employee to acknowledge receipt of the report
Mar 08, 2018: confirmation from Apple that the bug would be fixed in an upcoming update
Mar 29, 2018: patch released by Apple and vulnerability publicly announced

Note: Post originally published on LGTM.com on April 26, 2018