Facebook Fizz integer overflow vulnerability (CVE-2019-3560)

This post is about a denial of service vulnerability which I found in Facebook’s Fizz project, using a CodeQL query. The vulnerability is an infinite loop which can be triggered by an unauthenticated remote attacker. Fizz is Facebook’s TLS implementation, which means that it is used for the “https:” part of https://facebook.com. In a blog post about Fizz, published on 2018-08-06, Facebook engineers Kyle Nekritz, Subodh Iyengar, and Alex Guzman said the following about how widely Fizz is deployed at Facebook:

We have deployed Fizz and TLS 1.3 globally in our mobile apps, Proxygen, our load balancers, our internal services, and even our QUIC library, mvfst. More than 50 percent of our internet traffic is now secured with TLS 1.3.

Fizz is an open source project, so it is likely that other projects and organizations are also using it.

Severity and Mitigation

The impact of the vulnerability is that an attacker can send a malicious message via TCP to any server that uses Fizz and trigger an infinite loop on that server. This could make the server unresponsive to other clients. The vulnerability is classified as a denial of service (DoS), because it enables an attacker to disrupt the service, but not to gain unauthorized access. The size of the message is just over 64KB, so this attack is extremely cheap for the attacker, but crippling for the server. To illustrate this, a single computer with an unexceptional domestic-grade internet connection (1Mbps upload speed) could send two of these messages per second. Since each message knocks out one CPU core, it would only take a small botnet to quickly debilitate an entire datacentre.
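
The "two messages per second" figure can be checked with a rough back-of-the-envelope calculation. The sketch below assumes that the payload is exactly 64 KiB (the real payload is slightly larger) and that 1Mbps means 10^6 bits per second; the function name is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumptions (not from the original post): the payload is rounded to
// exactly 64 KiB and "1Mbps" is read as 1,000,000 bits per second.
constexpr std::uint64_t kPayloadBytes = 64 * 1024;
constexpr std::uint64_t kUploadBitsPerSecond = 1000000;

// Hypothetical helper: how many payloads of `payloadBytes` a link with the
// given upload speed can send per second, ignoring protocol overhead.
inline double payloadsPerSecond(std::uint64_t bitsPerSecond,
                                std::uint64_t payloadBytes) {
  assert(payloadBytes > 0);
  const double bytesPerSecond = static_cast<double>(bitsPerSecond) / 8.0;
  return bytesPerSecond / static_cast<double>(payloadBytes);
}
```

Under these assumptions, payloadsPerSecond(kUploadBitsPerSecond, kPayloadBytes) comes out a little above 1.9, which is consistent with roughly two messages per second.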

Facebook fixed the vulnerability very quickly. I reported it to them on 2019-02-20 and they pushed the fix to GitHub on 2019-02-25. Facebook have since informed me that they patched all of their own servers within hours on 2019-02-20.

I am not aware of any mitigations against this vulnerability, other than upgrading Fizz, so I recommend that all Fizz users upgrade as soon as possible. The vulnerability was fixed in version v2019.02.25.00.

Proof-of-concept exploit

I have written a proof-of-concept which triggers the vulnerability. It is a simple C program that opens a TCP socket to the server and sends a malicious payload just over 64KB in size. The program closes the socket as soon as it has sent the payload, but the server does not notice this because it is already stuck in an infinite loop. I did not test the payload on any real websites, only on the demo server application that is included with the Fizz source code. However, the vulnerability is in the core of the Fizz library, not in the demo application, so I believe that https://facebook.com was at risk until I reported this vulnerability.

Facebook have already patched their systems and are no longer vulnerable, but I will wait a couple of weeks before I release the source code for the exploit PoC, so that other Fizz users have time to upgrade.

The vulnerability

The vulnerability is due to an integer overflow in the += on line 42 of PlaintextRecordLayer.cpp:

auto length = cursor.readBE<uint16_t>();
if (buf.chainLength() < (cursor - buf.front()) + length) {
  return folly::none;
}
length +=
    sizeof(ContentType) + sizeof(ProtocolVersion) + sizeof(uint16_t);
buf.trimStart(length);
continue;

This code reads a uint16_t from the incoming network packet and assigns it to length. In other words, the value of length is attacker-controlled. The if statement on line 39 looks a bit like a bounds check, but it is actually just checking that enough data has been received to continue parsing. This is the reason why the exploit needs to send 64KB of data: the code will not hit the integer overflow on line 42 until it has received at least length bytes. The exploit works by setting length = 0xFFFB. This means that after the +=, the value of length is 0. This, in turn, means that the call to trimStart on line 43 does not consume any data, so no progress is made before the next iteration of the loop. The fix for the vulnerability is simple: use a larger type than uint16_t to compute the addition, so that an integer overflow is impossible.
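
To make the arithmetic concrete, here is a minimal standalone sketch of the wraparound and of one possible fix. It is not the Fizz code: the function names are hypothetical, and the header size of 5 bytes is an assumption inferred from the fact that 0xFFFB + 5 = 0x10000. The last helper simulates a parsing loop with an iteration cap, so that the "no progress" case can be observed without actually hanging.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for sizeof(ContentType) + sizeof(ProtocolVersion) + sizeof(uint16_t).
// The value 5 is an assumption, inferred from 0xFFFB + 5 == 0x10000.
constexpr std::size_t kHeaderSize = 5;

// Mirrors the vulnerable pattern: the sum is computed in a wider type, but
// it is stored back into a uint16_t, so it wraps around modulo 2^16.
inline std::uint16_t bytesToTrimVulnerable(std::uint16_t length) {
  length += kHeaderSize;  // implicit narrowing back to uint16_t
  return length;
}

// One possible fix: keep the result in a wider type so that it cannot wrap.
inline std::size_t bytesToTrimFixed(std::uint16_t length) {
  return static_cast<std::size_t>(length) + kHeaderSize;
}

// Simulates a loop that trims `step` bytes per iteration from a buffer of
// `available` bytes. Returns the number of iterations performed, giving up
// after `maxIterations` so that a step of 0 does not loop forever here.
inline int iterationsToDrain(std::size_t available, std::size_t step,
                             int maxIterations) {
  assert(maxIterations >= 0);
  int iterations = 0;
  while (available > 0 && iterations < maxIterations) {
    available -= (step < available ? step : available);
    ++iterations;
  }
  return iterations;
}
```

With length = 0xFFFB, bytesToTrimVulnerable returns 0, so the simulated loop never shrinks the buffer and only stops because of the iteration cap. bytesToTrimFixed returns 0x10000, and a 0x10000-byte buffer is drained in a single iteration.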

I have not told the full story of how the exploit works. Setting length = 0xFFFB is the easy bit! I found it slightly trickier to figure out how to construct a message that actually triggers this line of code. To give other Fizz users time to upgrade, I will wait a couple of weeks before publishing the full details of the exploit.

Finding the vulnerability with CodeQL

In their blog post, Facebook’s engineers describe Fizz as “secure from the ground up” and list some of the C++ programming techniques they have used to avoid common pitfalls such as incorrect state machine transitions. I would agree that the general code quality of Fizz looks very good. It uses a modern C++ style, so it is much less likely to suffer from some of the bugs that plague older C projects. In particular, it doesn’t do any manual memory management, so it is far less likely to suffer from something like a buffer overflow, which is so common in other projects. Facebook have also told me that they use fuzzing on Fizz, and that they have done a security review with an external consultancy. In other words, this is a high quality project and the team is doing everything according to best practice. So how did this bug slip through the net? And how did CodeQL find it?

When you look at line 42 of PlaintextRecordLayer.cpp, it is obvious that it contains an integer overflow. The difficulty, of course, is knowing which line of code to look at. Fuzzing can be very effective for automatically finding bugs, but it is based on randomly generated inputs so it doesn’t work so well when the probability of hitting certain code paths by chance is very low. (As I mentioned above, the hardest part of writing the exploit was figuring out how to construct an input that would reach the vulnerable line of code.) But CodeQL is not constrained in the same way. Using CodeQL, I can instead search for any potential integer overflows, regardless of how difficult they might be to trigger. Some of the results might not be triggerable in practice, but it’s better to be safe than sorry. However, I cannot expect developers to fix thousands of “bugs” that are only hypothetical in nature, so the query needs to be precise enough that it will only return results where it is at least plausible that an attacker could trigger an overflow.

I originally found this vulnerability with a slightly different query, but my colleague Jonas Jensen came up with the improved version which I will use here. It is more accurate than my original query and it also showcases our new C++ intermediate representation, which is a new feature currently under development. IR helps to simplify the query when there are multiple source syntaxes for the same operation. For example, the following three lines of code all do exactly the same thing:

x = x+1;
x += 1;
x++;

Without IR, a query for finding additions in the code would need separate clauses for each of these syntaxes. But with IR, the query only needs one clause for AddInstruction.

Using IR, let’s first write a query which finds all the conversions from a larger type to a smaller type. These are conversions that might overflow.

import cpp
import semmle.code.cpp.ir.IR

from ConvertInstruction conv
where conv.getResultSize() < conv.getUnary().getResultSize()
select conv

If you would like to try running this query yourself, then you just need to download CodeQL for Eclipse and a snapshot for Fizz.

[EDIT]: You can also use our free CodeQL extension for Visual Studio Code. See installation instructions at https://securitylab.github.com/tools/codeql/.
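
For readers who want a concrete picture of what a "narrowing conversion" looks like in C++ source, here is a small illustrative sketch. The function names are hypothetical, and which IR instructions the extractor actually generates for them is an assumption on my part, but both functions convert a wider integer to a uint16_t, which is the kind of pattern that the narrowing-conversion query is looking for.

```cpp
#include <cassert>  // only needed for the example checks mentioned below
#include <cstdint>

// Explicit narrowing: a 32-bit value is truncated to its low 16 bits.
inline std::uint16_t narrowExplicit(std::uint32_t wide) {
  return static_cast<std::uint16_t>(wide);
}

// Implicit narrowing hidden inside a compound assignment: `value + delta`
// is computed in a wider type (the usual arithmetic conversions apply),
// and the result is then converted back to uint16_t without any cast
// appearing in the source.
inline std::uint16_t narrowImplicit(std::uint16_t value, std::uint32_t delta) {
  value += delta;
  return value;
}
```

For example, narrowExplicit(0x12345678) yields 0x5678 and narrowImplicit(0xFFFF, 1) yields 0. The second form is the easier one to overlook in code review, because the conversion is invisible in the source.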

As you might expect, the query above finds a lot of narrowing conversions in the code. It’s a useful starting point for finding potential overflows, but we need to narrow the number of results down. An integer overflow is only a security vulnerability if it can be deliberately triggered by an attacker. So we need to enhance the query with contextual information about which of these conversions might potentially depend on an untrusted input value. Jonas’s query does this by using the new IR taint tracking library to find expressions that depend on untrusted input.

But where does untrusted input come from? This often depends on the application, and so it often helps to build a model of the attack surface of the application. In the case of Fizz, it turns out that the untrusted input arrives via another Facebook library, called Folly. Folly puts the data in an IOBuf, which is subsequently read by Fizz. So one way to model the sources of untrusted data would be to find all the uses of IOBufs in Fizz.

But we found a different solution that is both simpler and less specific to the Fizz project: when data is sent over a socket, it is usually sent in network byte order. So network data usually needs to be converted to host byte order, typically using ntohs or ntohl. This means that ntohs and ntohl are often excellent proxies for “untrusted input”. The only hitch is that Fizz doesn’t use ntohs and ntohl! Instead, it uses the Endian class. The following QL class identifies the method of Endian which is used for converting network byte order to host:

class EndianConvert extends Function {
  EndianConvert() {
    this.getName() = "big" and
    this.getDeclaringType().getName().matches("Endian")
  }
}
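
To illustrate why a byte-order conversion is such a telltale sign of network data, here is a sketch of what decoding a 16-bit big-endian (network byte order) field from raw bytes involves. The helper below is hypothetical and is not part of Fizz or Folly. It only shows the kind of operation that ntohs, or a big-endian read, performs on bytes that came straight off the wire.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a Fizz or Folly API): decodes a 16-bit
// big-endian integer from two raw bytes. The most significant byte comes
// first, regardless of the byte order of the host machine.
inline std::uint16_t readBigEndian16(const unsigned char* bytes) {
  assert(bytes != nullptr);
  return static_cast<std::uint16_t>((bytes[0] << 8) | bytes[1]);
}
```

If the two bytes on the wire are 0xFF followed by 0xFB, this returns 0xFFFB. Whoever controls those two bytes controls the resulting value completely.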

Putting it all together, here is Jonas’s query, which uses taint tracking to find potentially unsafe narrowing conversions of expressions that might depend on untrusted input:

/**
 * @name Fizz Overflow
 * @description Narrowing conversions on untrusted data could enable
 *              an attacker to trigger an integer overflow.
 * @kind path-problem
 * @problem.severity warning
 */

import cpp
import semmle.code.cpp.ir.dataflow.TaintTracking
import semmle.code.cpp.ir.IR
import DataFlow::PathGraph

/**
 * The endianness conversion function `Endian::big()`.
 * It is Folly's replacement for `ntohs` and `ntohl`.
 */
class EndianConvert extends Function {
  EndianConvert() {
    this.getName() = "big" and
    this.getDeclaringType().getName().matches("Endian")
  }
}

/**
 * Holds if `i` is an endianness conversion.
 * (A telltale sign of network data.)
 */
predicate isNetworkData(Instruction i) {
  i.(CallInstruction).getCallTarget().(FunctionInstruction).getFunctionSymbol() instanceof
    EndianConvert
}

/** Holds if `i` is a narrowing conversion. */
predicate isNarrowingConversion(ConvertInstruction i) {
  i.getResultSize() < i.getUnary().getResultSize()
}

class Cfg extends TaintTracking::Configuration {
  Cfg() { this = "FizzOverflowIR" }

  /**
   * Holds if `source` is network data.
   */
  override predicate isSource(DataFlow::Node source) { isNetworkData(source.asInstruction()) }

  /** Holds if `sink` is a narrowing conversion. */
  override predicate isSink(DataFlow::Node sink) { isNarrowingConversion(sink.asInstruction()) }
}

from
  Cfg cfg, DataFlow::PathNode source, DataFlow::PathNode sink, ConvertInstruction conv,
  Type inputType, Type outputType
where
  cfg.hasFlowPath(source, sink) and
  conv = sink.getNode().asInstruction() and
  inputType = conv.getUnary().getResultType() and
  outputType = conv.getResultType()
select sink, source, sink,
  "Conversion of untrusted data from " + inputType + " to " + outputType + "."

This query has exactly one result, which is the vulnerability described above. The query has zero results on newer revisions of the code which include the fix.

Bug Bounty

On 2019-03-13, I received this nice message from Facebook:

Hi Kevin Backhouse,

After reviewing this issue, we have decided to award you a bounty of $10000. Below is an explanation of the bounty amount. Facebook fulfills its bounty awards through Bugcrowd.

This vulnerability could have allowed a malicious user to cause a denial of service against Facebook infrastructure.

While denial of service issues are typically not considered as part of our bug bounty program, this submission discussed scenarios which could have had significant risk.

Thank you again for your report. We look forward to receiving more reports from you in the future!

At Semmle, we have a policy of donating all bug bounties to charity. We have asked Facebook to donate the bounty to Techtonica. This means, thanks to Facebook’s policy of doubling the bounty when it is donated to charity, that Techtonica will receive $20,000 from Facebook. Techtonica partners with tech companies to provide free tech training, living stipends, and job placement to women and non-binary adults in need in the Bay Area. So Techtonica’s mission fits nicely with Semmle’s commitment to improve the sharing of expertise in the software industry. In addition, Semmle is matching the original bounty amount of $10,000 with a donation to my own chosen charity, Community Servings. Community Servings is a not-for-profit food and nutrition program providing services in Massachusetts to individuals and families living with critical and chronic illnesses. I used to participate in one of their fund-raising events, Pie in the Sky, when I lived in Boston, so I am very pleased that we are able to send them such a big donation.

Timeline

2019-02-20: Privately disclosed to Facebook’s white hat program.
2019-02-20: Report acknowledged by Facebook and forwarded to their product team.
2019-02-20: Facebook patched all of their servers.
2019-02-25: Facebook pushed the fix to GitHub.
2019-03-13: Bug bounty confirmed by Facebook.
2019-03-19: CVE-2019-3560 disclosed by Semmle.

Note: Post originally published on LGTM.com on March 19, 2019