LiveQL Episode II

On March 17, we streamed our second episode of LiveQL. For those of you who missed it or prefer to read instead, this post provides a run-through of the details, including lessons learned.

And as always, you can watch this and previous episodes on GitHub’s YouTube channel.

LiveQL

In our LiveQL sessions, we pair a security researcher with a CodeQL expert so that they can discuss and help model a vulnerability class or specific CVE. LiveQL episode 1 featured Aditya Sharad, senior Software Engineering manager at GitHub, and Nico Waisman, head of Security and Privacy at Lyft, talking about non-intuitive string manipulation vulnerabilities in C code.

In this second episode, we’re joined by Pavel Avgustinov, senior director of Code Intelligence at GitHub. Pavel is also one of the original CodeQL creators (from Semmle) and is without a doubt one of the most knowledgeable people when it comes to wielding CodeQL effectively. In the security researcher corner we feature yours truly. Some background on me: I’m a security researcher with a predilection for Java-based vulnerabilities, so naturally I picked an interesting Java CVE to analyze and model with Pavel’s help: CVE-2021-25646.

CVE-2021-25646: A Rhino in a nutshell

Pavel and I begin by reviewing the National Vulnerability Database (NVD) advisory, which reads:

Apache Druid includes the ability to execute user-provided JavaScript code embedded in various types of requests. This functionality is intended for use in high-trust environments and is disabled by default. However, in Druid 0.20.0 and earlier, it is possible for an authenticated user to send a specially-crafted request that forces Druid to run user-provided JavaScript code for that request, regardless of server configuration. This can be leveraged to execute code on the target machine with the privileges of the Druid server process.

With this information, we know that there is probably a Rhino/Nashorn script injection. Also, it seems like the evaluation of Javascript scripts should be disabled by default and only enabled as an opt-in in “high-trust” environments. We can learn more about what Javascript is used for in the context of Apache Druid in its Javascript programming guide. Here we find an important security warning:

security warning

It seems like the vulnerability is related to being able to bypass this control and make Druid evaluate user-controlled scripts even when this evaluation is globally disabled.

The next step in our analysis is to look for information about this CVE, which leads us to a nice write-up (in Chinese) describing how to reproduce the vulnerability. But first, we need to set up our testing environment. Luckily for us, Apache Druid comes with a Docker image that lets us spin up our own environment quickly and safely. The only changes we make to the default docker-compose configuration are to add an environment variable that starts the Java virtual machine (JVM) in debug mode and to expose the Java Debug Wire Protocol (JDWP) port to the host:

coordinator:
  image: apache/druid:0.19.0
  container_name: coordinator
  environment:
    - JAVA_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5555
  volumes:
    - ./storage:/opt/data
    - coordinator_var:/opt/druid/var
  depends_on:
    - zookeeper
    - postgres
  ports:
    - "8081:8081"
    - "5559:5555"
  command:
    - coordinator
  env_file:
    - environment

Now we can start up the IDE, load the Apache Druid project, and connect the debugger to port 5559:

debug configuration

We also need to analyze the traffic between the frontend and the backend, so we’ll fire up a request interception proxy. In our case, we use Burp and start up a Chromium browser which is preconfigured to send all HTTP traffic through the proxy.

With everything set up for our analysis, we can now focus on reproducing the vulnerability. Following the instructions on the aforementioned write-up, we load some sample data from the local disk, parse it, and then add a filter column that will trigger a request to the backend:

request

You can see that the filter being used by default is of the selector type. Let’s change it to a Javascript one:

response

As expected, we get an error saying that Javascript evaluation is disabled, plus a useful stack trace that we’ll use to set breakpoints in the running application. Specifically, two frames are interesting:

first frame

last frame

So how can we disable the Javascript protection? Taking a closer look at the proof of concept (PoC), you can see we missed an important piece and need to add that bit to our request:

request payload

We get a successful response and can look at the container to verify our command was executed:

pwned

So how does that extra piece of the payload disable the protection? To answer that question, we set breakpoints at the two frames mentioned above:

The first one helps us verify we’re hitting the right endpoint and is also very useful for defining the dataflow sources for our later analysis.

The second one gives us the context we need to understand what’s going on.

checkState control

In the above screenshot, you can see that the object we sent under an empty-string property name was deserialized into a JavaScriptConfig object, which was later checked to decide whether or not Javascript evaluation should be allowed.

Jumping to the config definition, we see the following @JsonCreator annotated constructor:

JavaScriptDimFilter

For those of you not familiar with the Jackson library, this constructor is used to map a JSON string into an instance of JavaScriptDimFilter. Our filter JSON string was being bound to this class. Something is off here! On the one hand, the @JsonProperty annotated parameters should be mapped to the JSON properties with the name specified in the annotation, but what about @JacksonInject? Intuition tells us that the value should be injected by the dependency injection framework, especially considering that config was not annotated with @JsonProperty. So which JSON property, if any, would be mapped to the config parameter?

Checking the @JacksonInject annotation documentation we finally start to make some sense out of this:

Jackson-specific annotation used for indicating that value of annotated property will be “injected”, i.e. set based on value configured by ObjectMapper (usually on a per-call basis). Usually property is not deserialized from JSON, although it is possible to have injected value as default and still allow optional override from JSON.

Well, it turns out that despite the claim that “…usually property is not deserialized from JSON,” this was actually the default behaviour:

Default is OptBoolean.DEFAULT, which translates to OptBoolean.TRUE: this is for backwards compatibility (2.8 and earlier always allow binding input value).

Therefore, the JSON input string should actually be able to override the injected value! The only remaining mystery is why a non @JsonProperty annotated parameter defaulted to an empty string. This seems to be a bug in Jackson. Otherwise, what would happen if more than one parameter missed the @JsonProperty annotation?
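
To make the override rule concrete, here is a minimal sketch in plain Java that models only the binding decision. It does not use Jackson: bindConfig, JsConfig, and the boolean useInput flag are hypothetical stand-ins for the constructor parameter binding, Druid's JavaScriptConfig, and the annotation's useInput setting. The empty-string key mirrors the property name that the unannotated parameter ended up with in our debugging session.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class InjectOverrideSketch {
    /** Hypothetical stand-in for the injected configuration object. */
    static final class JsConfig {
        final boolean enabled;

        JsConfig(boolean enabled) {
            this.enabled = enabled;
        }
    }

    /**
     * Toy binding rule: keep the server-side injected value unless input
     * overrides are allowed and the request carries a value under the
     * parameter's property name (assumed here to be the empty string).
     */
    static JsConfig bindConfig(Map<String, Object> json, JsConfig injected, boolean useInput) {
        Object fromRequest = json.get("");
        if (useInput && fromRequest instanceof Map) {
            Object enabled = ((Map<?, ?>) fromRequest).get("enabled");
            return new JsConfig(Boolean.parseBoolean(String.valueOf(enabled)));
        }
        return injected;
    }

    public static void main(String[] args) {
        JsConfig serverDefault = new JsConfig(false); // evaluation disabled globally

        Map<String, Object> filter = new HashMap<>();
        filter.put("type", "javascript");
        filter.put("", Collections.singletonMap("enabled", "true")); // attacker-supplied

        // Permissive default: the request value wins over the injected one.
        System.out.println(bindConfig(filter, serverDefault, true).enabled);  // prints "true"
        // Overrides disallowed: the injected value is kept.
        System.out.println(bindConfig(filter, serverDefault, false).enabled); // prints "false"
    }
}
```

The second call shows the behaviour most developers probably expect from an injected value, which is why the permissive default is easy to miss.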

Now that we know how the protection was disabled, we only need to figure out one more thing: where was our script evaluated? Stepping through the code from the Javascript configuration check, we quickly found our sink within the JavaScriptPredicateFactory constructor:

sink

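So far so good. The open questions we want to answer with CodeQL are: how can we model the dataflow to Rhino API sinks, are there any variants that are not controlled by the checkState control, and are there other interesting @JacksonInject annotated types we can override?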

Variant analysis with CodeQL

At this point, we move to VS Code and the CodeQL extension to start answering our questions. If you want to follow along, we recommend that you pause here, install the extension, and either download a pre-existing Apache Druid CodeQL database or create one by using the CodeQL CLI. Then try the following queries yourself.

How can we model the dataflow to Rhino API sinks?

Pavel first demonstrates the basics of CodeQL queries by finding all calls (including method and constructor calls) where the declaring type belongs to a package that contains the javascript keyword (since this was the case for Rhino classes):

import java

from Call call
where call.getCallee().getDeclaringType().getPackage().getName().matches("%javascript%")
select call

If you’re completely new to CodeQL, this query may give you an idea of what CodeQL is and how it enables you to treat code as data in a way that lets you ask questions about the codebase.

This simple query returns 70 results and gives us an idea of the extent of the usage of the Rhino library.

Below, you can see how we rewrite the same query using an object-oriented approach so that we can easily reuse it in our queries:

import java

class JavascriptApi extends Call {
  JavascriptApi() {
    this.getCallee().getDeclaringType().getPackage().getName().matches("%javascript%")
  }
}

from JavascriptApi call
select call

Another great trick of Pavel’s is listing all the places in the application where user-controllable data is being read (remote flow sources, in CodeQL terminology). That is, the attack surface of the application:

import java
import semmle.code.java.dataflow.FlowSources

from RemoteFlowSource source
select source

That simple query returns 440 results, including things like HttpServletRequest query parameters, headers, cookies, REST endpoints, and data transfer objects.

We then check whether that attack surface contains the entry point we saw before in the error stack trace, the one leading to the vulnerability (SamplerResource.post()):

import java
import semmle.code.java.dataflow.FlowSources

from RemoteFlowSource source
where source.getLocation().getFile().getBaseName() = "SamplerResource.java"
select source

The query returns one result, which means the standard CodeQL libraries already model the JAXRS framework and we don’t need to model it ourselves.

Now we have all the pieces we need to model the dataflow: the remote flow sources and the Rhino API calls that will act as sinks.

Through several query iterations in which Pavel shows many interesting debugging tricks, such as using any() as a catch-all sink to figure out why we’re not getting the expected flow, we come up with the following query that returns not just the issue described in the write-up, but also many other paths leading to arbitrary script evaluation:

/**
 * @kind path-problem
 */
import java
import semmle.code.java.dataflow.FlowSources
import semmle.code.java.dataflow.FlowSteps
import semmle.code.java.dataflow.TaintTracking
// Provides the edges relation that path-problem queries need
import DataFlow::PathGraph

class JavascriptApi extends Call {
  JavascriptApi() {
    this.getCallee().getDeclaringType().getPackage().getName().matches("%javascript%")
  }
}

// [4]
RefType jacksonDeserialisedType() {
  result = any(RemoteFlowSource s).getType() or
  result = jacksonDeserialisedType().getASubtype() or
  result = jacksonDeserialisedType().getAField().getType()
}

// [5]
class JacksonFieldStep extends TaintTracking::AdditionalTaintStep {
  override predicate step(DataFlow::Node a, DataFlow::Node b) {
    // a.b
    a = DataFlow::getFieldQualifier(b.asExpr().(FieldRead)) and
    a.getType() = jacksonDeserialisedType()
  }
}

// [1]
class JsConfig extends TaintTracking::Configuration {
  JsConfig() {
    this = "JsConfig"
  }


  // [2]
  override predicate isSource(DataFlow::Node source) {
    source instanceof RemoteFlowSource and
    source.getLocation().getFile().getBaseName() = "SamplerResource.java"
  }


  // [3]
  override predicate isSink(DataFlow::Node sink) {
    sink.asExpr() = any(JavascriptApi api).getAnArgument() and
    sink.getType() instanceof TypeString
  }
}

// [6]
from DataFlow::PathNode source, DataFlow::PathNode sink
where any(JsConfig config).hasFlowPath(source, sink)
select sink, source, sink, "tainted by $@", source, source.toString()

That may seem complicated at first glance, so let’s step through this query.

Since the issue we’re trying to find is an injection issue, we need to track the attacker-controllable tainted data throughout the application to check whether it reaches a potentially vulnerable sink. For that purpose, we create a custom JsConfig TaintTracking configuration [1] specifying that our sources are any RemoteFlowSource located in the SamplerResource.java file [2] and that our sink is any String argument to a Rhino API [3].

This configuration alone returns no results. During the video, we show how to debug this situation and find out that the reason for the false negative is that, by default, CodeQL does not track taint from a globally tainted object to the results of reading any of its fields. That is, if a.foo gets tainted (for example, on an assignment), then reading a.foo will pass the taint around. However, if the whole object a is tainted, reading the a.foo field will not pass the taint, since this would create a lot of false positives.

However, in the case of a JAXRS endpoint parameter, we’re tainting the whole object, and for that particular type and all the types reachable from its object graph, we should allow this additional taint step.
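
The difference between the two cases can be illustrated with a toy model in plain Java. This is only a sketch of the policy described above, not how CodeQL is implemented: FieldTaintSketch, its access-path strings, and the objectToFieldStep flag are all hypothetical names introduced for illustration.

```java
import java.util.HashSet;
import java.util.Set;

final class FieldTaintSketch {
    private final Set<String> tainted = new HashSet<>();
    private final boolean objectToFieldStep;

    /** @param objectToFieldStep whether whole-object taint flows to field reads */
    FieldTaintSketch(boolean objectToFieldStep) {
        this.objectToFieldStep = objectToFieldStep;
    }

    /** Marks an access path such as "a" or "a.foo" as tainted. */
    void taint(String accessPath) {
        tainted.add(accessPath);
    }

    /** Is the result of reading object.field considered tainted? */
    boolean readIsTainted(String object, String field) {
        if (tainted.contains(object + "." + field)) {
            return true; // the field itself was tainted, e.g. by an assignment
        }
        // Whole-object taint reaches the field read only with the extra step.
        return objectToFieldStep && tainted.contains(object);
    }

    public static void main(String[] args) {
        FieldTaintSketch byDefault = new FieldTaintSketch(false);
        byDefault.taint("a.foo");
        byDefault.taint("b");
        System.out.println(byDefault.readIsTainted("a", "foo")); // prints "true"
        System.out.println(byDefault.readIsTainted("b", "foo")); // prints "false"

        FieldTaintSketch withStep = new FieldTaintSketch(true);
        withStep.taint("b");
        System.out.println(withStep.readIsTainted("b", "foo"));  // prints "true"
    }
}
```

In the real query the extra step is enabled only for types that can be deserialized from the request, which keeps the false positives in check.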

Pavel solves this blocker in two steps. First, he writes a recursive predicate that returns any type reachable from any remote flow source type [4]. This predicate can be improved by limiting the root types to only those deserialized from the HTTP request, such as JAXRS endpoint parameters or Spring controller method parameters, and excluding types such as HttpServletRequest. Then he writes an additional taint step, which is how we tell CodeQL to connect otherwise disconnected nodes in the dataflow graph so that taint flows through them. There are two ways of doing this: he could have overridden the isAdditionalTaintStep predicate of the taint tracking configuration to enable the step only for that specific configuration, or, as he did in [5], he could extend TaintTracking::AdditionalTaintStep so that the step is used by every taint tracking configuration.
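
For readers new to recursive predicates, the computation in [4] amounts to a least fixed point over the type graph. The following plain Java sketch computes the same kind of closure with a worklist. The type names (Request, Filter, ScriptFilter, and so on) and the two maps are hypothetical and are not taken from the Druid codebase.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TypeReachabilitySketch {
    /**
     * Least fixed point of: the root types, plus subtypes of reachable
     * types, plus field types of reachable types.
     */
    static Set<String> reachableTypes(Set<String> roots,
                                      Map<String, List<String>> subtypes,
                                      Map<String, List<String>> fieldTypes) {
        Set<String> reached = new LinkedHashSet<>(roots);
        Deque<String> worklist = new ArrayDeque<>(roots);
        while (!worklist.isEmpty()) {
            String type = worklist.pop();
            expand(type, subtypes, reached, worklist);
            expand(type, fieldTypes, reached, worklist);
        }
        return reached;
    }

    /** Adds the not-yet-seen neighbours of type under one relation. */
    private static void expand(String type, Map<String, List<String>> relation,
                               Set<String> reached, Deque<String> worklist) {
        for (String next : relation.getOrDefault(type, Collections.<String>emptyList())) {
            if (reached.add(next)) {
                worklist.push(next);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> subtypes = new HashMap<>();
        subtypes.put("Filter", Arrays.asList("ScriptFilter", "SelectorFilter"));

        Map<String, List<String>> fieldTypes = new HashMap<>();
        fieldTypes.put("Request", Arrays.asList("Filter"));
        fieldTypes.put("ScriptFilter", Arrays.asList("ScriptConfig"));
        fieldTypes.put("Unrelated", Arrays.asList("Secret"));

        Set<String> reached =
            reachableTypes(Collections.singleton("Request"), subtypes, fieldTypes);
        System.out.println(reached.contains("ScriptConfig")); // prints "true"
        System.out.println(reached.contains("Secret"));       // prints "false"
    }
}
```

Narrowing the roots, as suggested above, corresponds to passing a smaller roots set, which shrinks everything computed from it.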

As a final step, Pavel selects all the connected sources, sinks, and path nodes in between [6], which returns six paths from our SamplerResource to a Rhino API:

dataflow results

Are there any variants that are not controlled by the checkState control?

Are these real injection paths? Well, we forgot one important thing. If the Javascript evaluation takes place after the Javascript configuration is checked, it is not a problem, provided there is no way of bypassing this control. In other words, if the Javascript evaluation takes place in a control flow graph basic block that is dominated by the basic block containing the Preconditions.checkState() check, we shouldn’t report the issue. Our investigation now requires both data flow and control flow analysis. Can we support this with CodeQL? We sure can!

// Model `Preconditions.checkState(config.isEnabled(), "JavaScript is disabled")`.
// This predicate goes inside the JsConfig class.
override predicate isSanitizer(DataFlow::Node node) {
  exists(MethodAccess checkState |
    checkState.getMethod().getName() = "checkState" and
    checkState.getArgument(0).(MethodAccess).getMethod().getName() = "isEnabled" and
    checkState.getBasicBlock().bbDominates(node.asExpr().getBasicBlock())
  )
}

When we re-run the query with the new sanitizer and without the source constraint for SamplerResource, we don’t find any paths from untrusted data into a Rhino API that are not dominated by the Javascript configuration check. Therefore, if this control is not bypassable (as is the case in the latest Apache Druid version), there shouldn’t be any additional Rhino injection vulnerabilities.
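
If dominance is new to you, the following plain Java sketch computes dominator sets for a tiny control flow graph with the classic iterative algorithm. It only illustrates the concept and says nothing about how CodeQL implements it. The block names (entry, check, eval) are hypothetical, and the code assumes every block is reachable from the entry block.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DominatorSketch {
    /** For every block, computes the set of blocks that dominate it (reflexive). */
    static Map<String, Set<String>> dominators(String entry, Map<String, List<String>> successors) {
        Set<String> blocks = new LinkedHashSet<>();
        blocks.add(entry);
        Map<String, List<String>> preds = new HashMap<>();
        for (Map.Entry<String, List<String>> e : successors.entrySet()) {
            blocks.add(e.getKey());
            for (String succ : e.getValue()) {
                blocks.add(succ);
                preds.computeIfAbsent(succ, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        // Start from "everything dominates everything" and shrink to a fixed point.
        Map<String, Set<String>> dom = new HashMap<>();
        for (String b : blocks) {
            dom.put(b, b.equals(entry)
                ? new HashSet<>(Collections.singleton(entry))
                : new HashSet<>(blocks));
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String b : blocks) {
                if (b.equals(entry)) {
                    continue;
                }
                // dom(b) = {b} plus the intersection of dom(p) over all predecessors p
                Set<String> next = null;
                for (String p : preds.getOrDefault(b, Collections.<String>emptyList())) {
                    if (next == null) {
                        next = new HashSet<>(dom.get(p));
                    } else {
                        next.retainAll(dom.get(p));
                    }
                }
                if (next == null) {
                    next = new HashSet<>();
                }
                next.add(b);
                if (!next.equals(dom.get(b))) {
                    dom.put(b, next);
                    changed = true;
                }
            }
        }
        return dom;
    }

    /** Does every path from entry to block pass through dominator? */
    static boolean dominates(String dominator, String block, String entry,
                             Map<String, List<String>> successors) {
        return dominators(entry, successors)
            .getOrDefault(block, Collections.<String>emptySet())
            .contains(dominator);
    }

    public static void main(String[] args) {
        Map<String, List<String>> guarded = new HashMap<>();
        guarded.put("entry", Arrays.asList("check"));
        guarded.put("check", Arrays.asList("eval"));

        Map<String, List<String>> bypass = new HashMap<>();
        bypass.put("entry", Arrays.asList("check", "eval")); // a branch skips the check
        bypass.put("check", Arrays.asList("eval"));

        System.out.println(dominates("check", "eval", "entry", guarded)); // prints "true"
        System.out.println(dominates("check", "eval", "entry", bypass));  // prints "false"
    }
}
```

Only the first graph would be treated as sanitized: in the second, a path reaches the evaluation without passing through the check.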

Are there other interesting @JacksonInject annotated types we can override?

To answer this question we need to enumerate all the @JacksonInject annotated types which belong to an HTTP request-bound type object graph. For example, we saw that our entry point was binding the HTTP request body into an instance of any class implementing the Sampler interface. We need to iterate through all those types, and for each one we’ll need to apply a similar analysis for all their field types recursively.

We already defined a predicate which would return all these types:

RefType jacksonDeserialisedType() {
  result = any(RemoteFlowSource s).getType() or
  result = jacksonDeserialisedType().getASubtype() or
  result = jacksonDeserialisedType().getAField().getType()
}

All we need to do is constrain those types to those containing a field or constructor parameter annotated with @JacksonInject:

...
from Variable p
where
  p.getAnAnnotation().getType().getName() = "JacksonInject" and
  p.getType() = jacksonDeserialisedType()
select p.getType(), p

Voila! The query returns 263 variables (fields or parameters) that we could potentially override with user-controllable properties. However, it isn’t clear how we can reach these variables. Are they direct fields of the root object graph type, or nested properties located deep down the object graph? It would be really nice if we could represent the results as a graph where the start node is the root object graph type (the type of the JAXRS method parameter) and each edge leads us one level down the object graph until we reach our @JacksonInject annotated property.

CodeQL allows us to perform this analysis and represent the results by defining an explicit edges query predicate, in this case connecting a type with any of its subtypes or field types, and by shaping our select statement according to the graph convention (main node, start node, end node, message):

...
query predicate edges(RefType a, RefType b) {
  b = a.getASubtype() or
  b = a.getAField().getType()
} 

from Variable p
where
  p.getAnAnnotation().getType().getName() = "JacksonInject" and
  p.getType() = jacksonDeserialisedType()
select p, any(RemoteFlowSource s).getType(), p.getType(), "Jackson-injectable remote-reachable"

We can now see all the @JacksonInject annotated properties and how to reach them in the HTTP request JSON body!
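
Conceptually, each reported path is a walk through the edges relation from the root type to the annotated property's type. As a rough illustration, the plain Java sketch below recovers one such path with a breadth-first search. The type names and the edges map are hypothetical, and CodeQL's own path exploration is not claimed to work this way.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

final class TypePathSketch {
    /** Breadth-first search returning one shortest path, or an empty list if none exists. */
    static List<String> pathTo(String root, String target, Map<String, List<String>> edges) {
        Map<String, String> parent = new HashMap<>(); // visited type -> type it was reached from
        Deque<String> queue = new ArrayDeque<>();
        parent.put(root, null);
        queue.add(root);
        while (!queue.isEmpty()) {
            String type = queue.remove();
            if (type.equals(target)) {
                // Walk the parent links back to the root to rebuild the path.
                LinkedList<String> path = new LinkedList<>();
                for (String t = type; t != null; t = parent.get(t)) {
                    path.addFirst(t);
                }
                return path;
            }
            for (String next : edges.getOrDefault(type, Collections.<String>emptyList())) {
                if (!parent.containsKey(next)) {
                    parent.put(next, type);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        // An edge goes from a type to one of its subtypes or field types.
        Map<String, List<String>> edges = new HashMap<>();
        edges.put("Request", Arrays.asList("Filter"));
        edges.put("Filter", Arrays.asList("ScriptFilter", "SelectorFilter"));
        edges.put("ScriptFilter", Arrays.asList("ScriptConfig"));

        // prints "[Request, Filter, ScriptFilter, ScriptConfig]"
        System.out.println(pathTo("Request", "ScriptConfig", edges));
    }
}
```

Read from left to right, such a path tells you which nested JSON properties to set in the request body to reach the injectable value.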

Wrap up

This was a great opportunity to learn about CodeQL’s potential for supporting security research and variant analysis. If you want to learn more tricks from Pavel, such as printing the AST graph or quick-evaluating any part of a predicate, we recommend watching the entire episode on GitHub’s YouTube channel!

Stay tuned for Episode 3!