In this post I’ll explain how we can detect a type of vulnerability known as reflected file download (RFD) using CodeQL. As an example, I’ll use a vulnerability that I recently reported in the popular collaborative online editor Etherpad.
Details of the vulnerability in Etherpad (CVE-2018-6835)
This vulnerability affects versions of Etherpad Lite prior to 1.6.3 and is caused by the lack of sanitization in the name of the JSONP callback function used in the HTTP API. This allows a URL to be crafted in a way that, when visited, triggers an executable file to be downloaded that appears to originate from the host of the Etherpad application. The vulnerability has been assigned CVE-2018-6835.
Reflected file download (RFD)
Reflected file download (RFD) is a relatively new attack vector discovered in 2014 by Oren Hafif. It has since affected multiple organizations and applications, including Google. This attack works very similarly to the more familiar cross-site scripting (XSS), which relies on the victim clicking on a URL that comes from a trusted domain. In the case of RFD, instead of running JavaScript in the victim’s browser, an executable file will be offered to the user as a download. As the file appears to originate from a trusted domain, the victim may then think that it is safe to run the file. The main ingredients in an RFD vulnerability are:
- User input in the URL being reflected back in the response.
- The vulnerable URL is permissive and allows the file name to be manipulated.
If the vulnerable application serves its output using a Content-Disposition: attachment HTTP header, a user’s browser will automatically offer the injected content as a downloadable file. If the filename attribute in this header is not set, then the file name and extension can be manipulated by the attacker to make the file executable on some platforms. If the Content-Disposition: attachment header is not provided by the application, an attacker could craft the link using the download attribute in order to force the user’s browser to treat the injected content as a download, and set the filename. For example:
<a href="http://example.com/specially-crafted-URL" download="RunMe.bat">You can trust this link</a>
Using CodeQL to find the RFD vulnerability in Etherpad
Let’s take a look at the vulnerability in Etherpad.
Using CodeQL, we can look for user input being reflected back in the server’s response. To begin with, we’ll search for functions that are likely to be handlers of HTTP requests. In JavaScript, it is fairly common to name the variable that represents a request req and a variable that represents a response res. A function that handles a request will normally take these as parameters. The CodeQL library for JavaScript analysis provides a Function class to model functions. To start our exploration we can extend that Function class in order to find functions that contain two variables representing the request and response:
import javascript

// A function whose first two parameters are named `req` and `res`
class LikelyRouteHandler extends Function {
  Variable req;
  Variable res;

  LikelyRouteHandler() {
    req = getParameter(0).(SimpleParameter).getVariable() and req.getName() = "req" and
    res = getParameter(1).(SimpleParameter).getVariable() and res.getName() = "res"
  }
}
Running a query that looks for such functions gives me 37 results, so let’s try to make the query more specific. If the variable res in these functions is indeed an HTTP response, then its send method will be called to send the response to the browser. Using CodeQL we can track the flow of req and find where it is used as an argument of this send method. As is the case with Java and C++ queries, this can be achieved through CodeQL’s DataFlow library. Before we move on to that part, let’s add some methods to the LikelyRouteHandler definition to enable us to look for accesses to the req parameter, as well as calls to the send method of res.
  // Gets the variable that holds the request object (used by the source definition later on).
  Variable getRequestVariable() { result = req }

  // Gets the name of a method of `res` that sends an HTTP response.
  string getASendMethodName() {
    // res.send
    result = "send"
    or
    // or a method `m` such that there is an assignment `res.m = res.n` where `n`
    // is already known to be a send method
    exists (DataFlow::PropWrite pwn |
      pwn.getBase().asExpr() = res.getAnAccess() and
      pwn.getPropertyName() = result and
      pwn.getRhs().asExpr() = getASendMethodReference()
    )
  }

  // Gets a reference to `res.send` or some other known send method
  PropAccess getASendMethodReference() {
    result.getBase() = res.getAnAccess() and
    result.getPropertyName() = getASendMethodName()
  }
In the above code snippet, getASendMethodName looks for methods that are either simply called send, or methods that are assigned res.send (further down, I’ll describe why this is important!). For example:
res.send2 = res.send // flag up `send2` as well
res.send3 = res.send2 // flag up `send3` as well
So, if we have res.someMethod = referenceToSend then res.someMethod is also a send method. In the QL language, the left hand side is identified as:
// Base of the property write is an access to the variable `res`
pwn.getBase().asExpr() = res.getAnAccess() and
// The name of the property should also be identified
pwn.getPropertyName() = result
The right hand side is given by getASendMethodReference. A send method reference is something that is already known to refer to a send method, so its name must already be one of the names returned by getASendMethodName. We therefore define it as:
  // Gets a reference to `res.send` or some other known send method
  PropAccess getASendMethodReference() {
    result.getBase() = res.getAnAccess() and
    // Its name should already be a send method name
    result.getPropertyName() = getASendMethodName()
  }
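The two predicates refer to each other, and QL evaluates them as a least fixed point: it starts from send and keeps adding aliases until nothing new turns up. As a rough illustration of that idea in plain JavaScript (the assignments array and the sendMethodNames function are hypothetical stand-ins for what the query extracts, not part of CodeQL or Etherpad):

```javascript
// Hypothetical model of property assignments seen on `res`:
// each entry stands for `res.<lhs> = res.<rhs>`.
const assignments = [
  { lhs: "send2", rhs: "send" },
  { lhs: "send3", rhs: "send2" },
  { lhs: "render", rhs: "template" }, // unrelated, must not be flagged
];

// Collects every property name that ends up referring to `res.send`,
// looping until a full pass adds no new alias.
function sendMethodNames(assignments) {
  const names = new Set(["send"]);
  let changed = true;
  while (changed) {
    changed = false;
    for (const { lhs, rhs } of assignments) {
      if (names.has(rhs) && !names.has(lhs)) {
        names.add(lhs);
        changed = true;
      }
    }
  }
  return names;
}

console.log([...sendMethodNames(assignments)].sort());
```

Like the QL definition, this sketch ignores the order in which the assignments happen, so it may over-approximate the set of aliases.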
With this definition it becomes easy to identify calls to the send methods:
  // Gets a call to the send method
  CallExpr getASendMethodCall() {
    result.getCallee() = getASendMethodReference()
  }
Running a query to look for calls to the send method results in some interesting function names in Etherpad. It was worth taking the effort to look for assignments that use the send method!
We are now in a position to use CodeQL’s DataFlow library to track the flow between the request and the response in JavaScript code. Apart from the general DataFlow::Configuration, the JavaScript library also comes with some specialized TaintTracking::Configuration classes. For tracking reflected input, the ReflectedXss::Configuration, ReflectedXss::Source and ReflectedXss::Sink classes in semmle.javascript.security.dataflow.ReflectedXss are exactly what we need.
Before looking at the details, let’s take a step back and summarize what we’re trying to find. We’re looking for:
- a method that handles requests and responses (using LikelyRouteHandler), that
- has a parameter req containing potentially tainted data provided by the user, which
- flows back into the response sent back to the user using a send method (or any alias)
In DataFlow terminology: we need to define the req parameter provided to a LikelyRouteHandler as a Data Flow source, and any argument to the send method as a Data Flow sink. We can extend the ReflectedXss::Source and ReflectedXss::Sink QL classes to achieve that:
// An argument passed to `res.send`, marked as an XSS sink
class LikelySendArgument extends ReflectedXss::Sink {
  LikelySendArgument() {
    asExpr() = any(LikelyRouteHandler rh).getASendMethodCall().getAnArgument()
  }
}

// An access to a request parameter, marked as an XSS source
class LikelyRequestParameter extends ReflectedXss::Source {
  LikelyRequestParameter() {
    exists (Expr base | base = asExpr().(PropAccess).getBase() |
      // either a property access on `req` itself
      base = any(LikelyRouteHandler rh).getRequestVariable().getAnAccess()
      or
      // or a more deeply nested property access
      base = any(LikelyRequestParameter p).asExpr()
    )
  }
}
The last step is to combine these definitions with a standard Data Flow query:
from ReflectedXss::Configuration xss, DataFlow::Node source, DataFlow::Node sink
where xss.hasFlow(source, sink)
select sink, "Cross-site scripting vulnerability due to $@.",
  source, "user-provided value"
Running the query on Etherpad using the LGTM query console returns three results. Closer inspection reveals that the results in apicalls.js are of particular interest: the JSONP callback name provided by the user is part of the response.
Testing on Etherpad Lite 1.6.2 reveals that indeed no input validation is performed on the jsonp parameter. For example:
(Note that the above query results are actually for a patched version of Etherpad. The vulnerability was patched by introducing a check: isValidJSONPName, which can actually be seen in the query results.)
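To make the reflection concrete, here is a minimal sketch of a JSONP response builder that copies the callback name into the body without any check. The function buildJsonpBody is hypothetical and much simpler than Etherpad’s real handler; it only shows why an unvalidated jsonp value ends up at the very start of the response.

```javascript
// Hypothetical, simplified JSONP response builder (not Etherpad's actual code).
function buildJsonpBody(callbackName, payload) {
  // The callback name is copied into the body verbatim: this is the reflection.
  return callbackName + "(" + JSON.stringify(payload) + ")";
}

const payload = { code: 3, message: "no such function", data: null };

// A legitimate callback name produces ordinary JSONP...
console.log(buildJsonpBody("handleResult", payload));
// ...but any other string is reflected just as faithfully.
console.log(buildJsonpBody("echo pwned||", payload));
```

Because the attacker-controlled string comes first, whatever interprets the downloaded file sees the attacker’s text before the JSON payload.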
Crafting an RFD exploit
Now that we have identified the unsanitized reflection, let’s try to create a malicious download link like those described by David Vassallo.
<a href="http://localhost:9001/api/1/test.sh?jsonp=echo cHJpbnQgJ0hlbGxvIFdvcmxkJwpwcmludCAncGF3bmQgOignCg==| base64 --decode | python||" download>Etherpad patch</a>
Note that test.sh in the above is non-existent, but Etherpad will present an unsuspecting user with a downloadable file test.sh upon visiting that URL:
The download appears to originate from the host of our Etherpad instance (in our case: localhost:9001). The user may therefore think that the file can be trusted. The file contains the following Bash script:
echo cHJpbnQgJ0hlbGxvIFdvcmxkJwpwcmludCAncGF3bmQgOignCg==| base64 --decode | python||({"code":3,"message":"no such function","data":null})
The last bit is a JSONP artifact that is not used when Bash executes test.sh. The base64-encoded string is a Python script that does the following:
print 'Hello World'
print 'pawnd :('
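If you want to check what the payload contains without running it, the string can be decoded safely. A small sketch using Node.js’s built-in Buffer (the variable names are my own):

```javascript
// The base64 string taken from the crafted link above.
const encoded = "cHJpbnQgJ0hlbGxvIFdvcmxkJwpwcmludCAncGF3bmQgOignCg==";

// Decode to text only; nothing is executed here.
const decoded = Buffer.from(encoded, "base64").toString("utf8");

console.log(decoded);
```

This prints the two Python print statements shown above, each followed by a newline.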
It is interesting to note that this fully works on Chrome. On Internet Explorer and Edge, the file that gets downloaded is called test.json; the browser changed the file extension based on the JSON response type. Firefox, on the other hand, does not offer any file to download at all.
Patch
As I mentioned before, Etherpad patched the vulnerability. Initially, they added a check using the isVarName function from the is-var-name module, but later on they switched to their own sanitizer, isValidJSONPName.
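To give an idea of what such a check can look like, here is a minimal allow-list sketch. The function looksLikeSafeCallbackName is hypothetical and is not Etherpad’s isValidJSONPName; it assumes that a callback name only ever needs to be a dot-separated chain of ASCII identifiers, and it does not reject reserved words.

```javascript
// Hypothetical allow-list check; an illustration, not Etherpad's isValidJSONPName.
function looksLikeSafeCallbackName(name) {
  if (typeof name !== "string") return false;
  // Dot-separated ASCII identifiers only, e.g. `myApp.handlers.onData`.
  return /^[A-Za-z_$][A-Za-z0-9_$]*(\.[A-Za-z_$][A-Za-z0-9_$]*)*$/.test(name);
}

console.log(looksLikeSafeCallbackName("myApp.onData")); // accepted
console.log(looksLikeSafeCallbackName("echo pwned||")); // rejected: spaces and pipes
```

An allow-list like this rejects spaces, pipes and parentheses, which are exactly the characters the shell payload above depends on.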
The DataFlow libraries allow for sanitizers like these to be taken into account: instances of tainted data flowing from a source to a sink via a sanitizer are filtered out and will not appear in the query results. To do so, we simply extend the QL class TaintTracking::AdditionalSanitizerGuardNode and override the sanitizes predicate:
class IsValidJSONPSanitizer extends TaintTracking::AdditionalSanitizerGuardNode, DataFlow::CallNode {
  IsValidJSONPSanitizer() {
    this = DataFlow::moduleMember("./isValidJSONPName", "check").getACall()
  }

  override predicate appliesTo(TaintTracking::Configuration cfg) {
    cfg instanceof ReflectedXss::Configuration
  }

  override predicate sanitizes(boolean outcome, Expr e) {
    outcome = true and
    e = getArgument(0).asExpr()
  }
}
Running the new query produces the expected result: the vulnerability is no longer flagged up.
Mitigation for RFD in general
To protect yourself (and your users) from reflected file downloads, make sure to:
- Sanitize user input that is reflected into the response.
- Use exact URL mapping and remove support for path parameters when possible.
- Set a fixed file name when using the Content-Disposition header.
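As a sketch of the last point, the response headers for a JSON API could be built along these lines. The helper safeDownloadHeaders and the file name response.json are hypothetical. The X-Content-Type-Options: nosniff header is my own addition rather than something from the list above; it is a standard header that asks browsers not to second-guess the declared content type.

```javascript
// Hypothetical helper: headers for a JSON API response that should never be
// saved under an attacker-chosen name. `fixedName` must be a constant chosen
// by the server, never a value taken from the request.
function safeDownloadHeaders(fixedName) {
  return {
    "Content-Type": "application/json; charset=utf-8",
    "Content-Disposition": `attachment; filename="${fixedName}"`,
    "X-Content-Type-Options": "nosniff",
  };
}

console.log(safeDownloadHeaders("response.json"));
```

With a fixed filename attribute in place, the path of the request URL should no longer determine the name or extension of the saved file.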
Disclosure timeline of CVE-2018-6835
- 29 Jan 2018: Initial private disclosure to John McLear of Etherpad.
- 3 Feb: Patched version (1.6.3) released.
Note: Post originally published on LGTM.com on March 29, 2018