CodeQL and Chill - The Java Edition

Looking for a vulnerability hunting challenge? Then this Java CTF challenge is for you! You will hone your bug finding skills and also learn all about CodeQL's taint tracking features.

Your mission, should you choose to accept it, is to hunt for a recently identified vulnerability in a popular container management platform. This vulnerability enabled attackers to inject arbitrary Java EL expressions, leading to pre-auth Remote Code Execution (RCE).

Using CodeQL to track tainted data from a user-controlled bean property to a custom error message, you'll learn to fill in any gaps in the taint tracking to carve a full data flow path to the vulnerability.

Pre-requisite

To complete this challenge, participants must have some prior knowledge of CodeQL and its libraries for data flow analysis.

If you are new to CodeQL, first try the Capture the Flag challenge using CodeQL for JavaScript, or the CodeQL training examples for Java. Then come back here and take your chances with this CTF.

How to submit?

Create a secret GitHub gist or a private GitHub repository.
Submit your write-up, in your gist or in the README.md of your repo, or in another file of your repo.
You can add the responses either directly in the main write-up, or in separate files, that you reference in your main write-up.
When you are ready to submit, just email ctf@github.com with the link to your gist or to your repo. If you are using a private repo, first invite the user securitylab-ctf as a collaborator.

Need help?

Your first stop should be the documentation. If you need more help on some CodeQL concepts, visit our forum. Be careful, don't give away your solutions to other competitors ;-)
You can contact us at ctf@github.com
You can also join our Slack workspace.

Introduction

Many applications, including some container management platforms, use Java Bean Validation to validate that Java bean objects in the application satisfy certain constraints set by the developers, and it renders custom error messages if those constraints are not satisfied. However, in some cases the beans are unmarshalled from user-controlled data, such as HTTP requests. If the beans' properties are not properly sanitized before being used in the custom error message, an attacker can inject arbitrary Java EL expressions into the bean properties, which will lead to Remote Code Execution when the error message is rendered. Read more about the issue in our advisory.

In this challenge, you’ll use CodeQL to track the flow of tainted data from user-controlled bean properties to custom error messages, and identify the known injection vulnerabilities. You will also learn how to customize the CodeQL data flow analysis for Java, to help you explore source code better and find a wider range of related vulnerabilities.

Challenge problem

The platform uses Java Bean Validation (JSR 380) custom constraint validators such as com.netflix.titus.api.jobmanager.model.job.sanitizer.SchedulingConstraintSetValidator.

Read the Bean Validation 2.0 spec to learn about custom validators and what type of interpolations take place when custom constraint error messages are rendered. When building custom constraint violation error messages, it is important to understand that they support different types of interpolation, including Java EL expressions. Therefore if an attacker can inject arbitrary data in the error message template being passed to ConstraintValidatorContext.buildConstraintViolationWithTemplate()'s first argument, they will be able to run arbitrary Java code. These error message templates are sinks for injection vulnerabilities.

Unfortunately, validated bean properties commonly flow into the custom error message. These properties are validated against the constraints precisely because they are likely to come from user-controlled data, so they are also likely sources of tainted data.

As an example, consider this code in SchedulingConstraintSetValidator.java. Here container is an object that is being validated (and so is likely to be untrusted), but its properties end up in the set common, which is used to create an error message template without sanitization.

@Override
public boolean isValid(Container container, ConstraintValidatorContext context) {
    if (container == null) {
        return true;
    }
    Set<String> common = new HashSet<>(container.getSoftConstraints().keySet());
    common.retainAll(container.getHardConstraints().keySet());
    if (common.isEmpty()) {
        return true;
    }
    context.buildConstraintViolationWithTemplate(
            "Soft and hard constraints not unique. Shared constraints: " + common
    ).addConstraintViolation().disableDefaultConstraintViolation();
    return false;
}

In this challenge, we want to identify occurrences of this pattern where user-controlled objects, potentially containing Java EL expressions, will flow into error messages.
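
To make the pattern concrete, here is a minimal JDK-only sketch of the concatenation performed in isValid above. The class TemplateConcatDemo and its buildTemplate method are hypothetical stand-ins written for illustration and are not part of the platform's code. The sketch only builds the string. It does not call any Bean Validation API, so nothing is interpolated or executed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the validator logic; not part of the real code base.
final class TemplateConcatDemo {

    // Mirrors the concatenation in isValid(): returns null when the maps share no key.
    static String buildTemplate(Map<String, String> soft, Map<String, String> hard) {
        Set<String> common = new HashSet<>(soft.keySet());
        common.retainAll(hard.keySet());
        if (common.isEmpty()) {
            return null;
        }
        return "Soft and hard constraints not unique. Shared constraints: " + common;
    }

    public static void main(String[] args) {
        // Attacker-chosen key shaped like an EL expression (harmless arithmetic here).
        String key = "${1+1}";
        Map<String, String> soft = new HashMap<>();
        Map<String, String> hard = new HashMap<>();
        soft.put(key, "a");
        hard.put(key, "b");
        System.out.println(buildTemplate(soft, hard));
    }
}
```

The printed template ends with [${1+1}]: the key chosen by the caller reaches the template string verbatim, delimiters included. That string is what would be handed to buildConstraintViolationWithTemplate in the real validator.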

Setup instructions

Install the Visual Studio Code IDE.
Go to the CodeQL starter workspace repository, and follow the instructions in that repository's README. When you are done, you should have the CodeQL extension installed and the vscode-codeql-starter workspace open in Visual Studio Code.
Download and unzip this CodeQL database, which corresponds to unpatched revision 8a8bd4c.
Import the database into Visual Studio Code (see documentation).

Step 1: Data flow and taint tracking analysis

As explained in the introduction, reporting these issues will involve tracking the flow of tainted data through the application. In this step we will prepare the basic building blocks for a CodeQL taint tracking analysis: sources and sinks.

Step 1.1: Sources

The sources of tainted data are the bean properties that go through constraint validation. In the code, these can be found as the first parameter of ConstraintValidator.isValid(...).

Write a CodeQL predicate that identifies these parameters:

predicate isSource(DataFlow::Node source) { /* TODO describe source */ }

To test your predicate, use the Quick Evaluation command (Right-click > CodeQL: Quick Evaluation). You should get 6 results.

Hints:

Make sure you catch only the implementations of methods defined in the ConstraintValidator interface. For example this case should not be considered as a source.
There is a convenient class RemoteFlowSource that tells you when a particular data flow node is obtained from remote user input.
Pay attention to get only results that pertain to the project source code.

Bonus

This optional improvement will be taken into account only for distinguishing submissions that are very close in quality and completeness.

You will notice that this implementation will mark every validated bean property as a source of taint. But we want to get only the user-controlled sources. We are not interested in bean properties that an attacker cannot control, such as when the bean comes from unmarshaling an application configuration file.

Consider improving your predicate so we only consider cases where the bean type is bound to user-controllable data such as a JAX-RS endpoint.

Step 1.2: Sink

The injection sinks we are considering occur as the first argument of a call to ConstraintValidatorContext.buildConstraintViolationWithTemplate(...).

Write a CodeQL predicate to identify these sinks.

predicate isSink(DataFlow::Node sink) { /* TODO describe sink */ }

Quick evaluation of your predicate should give you 5 results.

Step 1.3: TaintTracking configuration

Before going any further, we recommend that you quick-eval your isSource and isSink predicates to make sure both are matching the issue described above in SchedulingConstraintSetValidator.java. This case will be our main target issue for the rest of this challenge.

All done? Ok, so now let's find this vulnerable path by tracking the tainted data!

You'll need to create a taint tracking configuration as explained in the CodeQL documentation. Fill in the template below with your definitions of isSource and isSink, and a nicer name. The predicate hasFlowPath will hold for any path through which data can flow from your sources to your sinks. As you checked that your predicates give you the correct sources and sinks, we'll get our vulnerability.

/** @kind path-problem */
import java
import semmle.code.java.dataflow.TaintTracking
import DataFlow::PathGraph

class MyTaintTrackingConfig extends TaintTracking::Configuration {
    MyTaintTrackingConfig() { this = "MyTaintTrackingConfig" }

    override predicate isSource(DataFlow::Node source) {
        // TODO 
    }

    override predicate isSink(DataFlow::Node sink) {
        // TODO 
    }
}

from MyTaintTrackingConfig cfg, DataFlow::PathNode source, DataFlow::PathNode sink
where cfg.hasFlowPath(source, sink)
select sink, source, sink, "Custom constraint error message contains unsanitized user data"

Run your query using the command CodeQL: Run Query (either in the Command Palette or the right-click menu). It should give you ... 0 results! Ok, this is disappointing! But don't give up just now.

Step 1.4: Partial Flow to the rescue

When developing taint tracking queries, we may find ourselves in this situation very often. Why aren't we getting a hit?

We identified the source and the sink, so this suggests that our analysis is missing a step along the path from the source to the sink.

CodeQL's Java libraries can help us find these gaps with the partial data flow debugging mechanism. This feature lets you look for flows from a given source to any possible sink: the sink is left unconstrained, and the search is limited to a fixed number of steps away from the source. You can use it to track the flow of tainted data from your source towards all possible sinks, and see where the flow stops being tracked.

Create a debugging query that uses the hasPartialFlow predicate. You can use the template below.

/** @kind path-problem */
import java
import semmle.code.java.dataflow.TaintTracking
import DataFlow::PartialPathGraph // this is different!

class MyTaintTrackingConfig extends TaintTracking::Configuration {
    MyTaintTrackingConfig() { ... } // same as before
    override predicate isSource(DataFlow::Node source) { ... } // same as before
    override predicate isSink(DataFlow::Node sink) { ... } // same as before
    override int explorationLimit() { result = 10 } // this is different!
}
from MyTaintTrackingConfig cfg, DataFlow::PartialPathNode source, DataFlow::PartialPathNode sink
where
  cfg.hasPartialFlow(source, sink, _) and
  source.getNode() = ... // TODO restrict to the one source we are interested in, for ease of debugging
select sink, source, sink, "Partial flow from unsanitized user data"

predicate partial_flow(DataFlow::PartialPathNode n, DataFlow::Node src, int dist) {
  exists(MyTaintTrackingConfig conf, DataFlow::PartialPathNode source |
    conf.hasPartialFlow(source, n, dist) and
    src = source.getNode() and
    source = // TODO - restrict to THE source we are interested in
  )
}

Run your modified query to explore the flow of data and detect where the path stops.

Tips:

Still don't get any results? Make sure to read the documentation of the hasPartialFlow predicate carefully and adapt your taint tracking configuration accordingly.
You can add other variables that you need for your analysis using the exists keyword, or as parameters of your from clause. For example, you could restrict to only source and sink nodes in a particular enclosing function or file location. This will help you quickly filter your results to only the ones you're interested in.

Step 1.5: Identifying a missing taint step

You should have found that CodeQL does not propagate taint through getters like container.getHardConstraints and container.getSoftConstraints. Can you guess why this default behaviour was implemented?

Step 1.6: Adding additional taint steps

Now you know that some taint steps are missing. This is because the analysis is careful by default, and tries not to give you extra flow that may lead to false positives. Now you need to tell your taint tracking configuration that tainted data can be propagated by certain code patterns.

CodeQL allows you to declare additional taint steps in a specific taint tracking configuration, as shown in this example.

However, we'll use an even more general approach, which allows us to add taint steps globally, so that they can be picked up by several taint tracking configurations (and potentially reused in many queries). For this you just have to extend the class TaintTracking::AdditionalTaintStep and implement the step predicate. The step predicate should hold true when tainted data flows from node1 to node2.

Run your original query again after adding a taint step. Did you get the expected results? Still no.

Re-run your partial flow query to find where you lost track of your tainted data this time.

Hint: In the step predicate you should indicate that the two nodes are two elements of a MethodAccess: one will be its qualifier and the other will be the return value found at the call site.

Step 1.7: Adding taint steps through a constructor

Repeat the process above with all the methods that interrupt the taint tracking, until your partial flow predicate finally takes you to the call to the HashSet constructor.

Now you observe that CodeQL does not propagate through the HashSet constructor. Write an additional taint step for this and re-run your query.
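
As a reminder of the runtime behaviour such a step would model, the HashSet(Collection) constructor copies every element of its argument into the new set. The JDK-only sketch below uses a hypothetical class HashSetCopyDemo with a hypothetical helper copyOf. It shows the Java semantics only and is not a CodeQL taint step.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical demo class; only JDK collections are used.
final class HashSetCopyDemo {

    // Same call shape as in isValid(): new HashSet<>(someCollection).
    static Set<String> copyOf(Collection<String> source) {
        return new HashSet<>(source);
    }

    public static void main(String[] args) {
        Map<String, String> constraints = new HashMap<>();
        constraints.put("${1+1}", "value"); // stands in for a user-controlled key
        Set<String> copy = copyOf(constraints.keySet());

        // The copy holds the same element, and keeps it after the original map is emptied.
        constraints.clear();
        System.out.println(copy.contains("${1+1}"));
    }
}
```

Because the elements of the argument become elements of the new object, data that is tainted in the constructor argument is equally tainted in the constructed set.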

Step 1.8: Finish line for our first issue

Repeat the process above, adding further taint steps as needed, until your tainted data flows to the argument of the call to buildConstraintViolationWithTemplate. Run your query.

Hurray! The issue should be reported now!

Step 2: Second Issue

There is a similar issue in SchedulingConstraintValidator.java. Following the same process as above, find out why it is not reported by your query, and write the necessary taint steps to report it.

Tip: We don't like duplicate code. ;-)

Step 3: Errors and Exceptions

Since this sink is associated with generating error messages, in many cases the template will be built from an exception message, in flows such as:

try {
    parse(tainted);
} catch (Exception e) {
    sink(e.getMessage());
}

Our current query does not cover this case. An accurate taint step would require analyzing the implementation of the throwing method to determine if the tainted input is actually reflected in the exception message.

Unfortunately, our CodeQL database identifies calls to library methods and their signatures, but does not have the source code of the implementations of those methods. So we need to model what they do. Write an additional taint step for these cases.
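
For a concrete JDK case where the input really is reflected in the exception message, consider Integer.parseInt. The sketch below uses a hypothetical class ExceptionMessageDemo with a hypothetical helper messageFor. It only illustrates the runtime behaviour that a taint step from the call argument to e.getMessage() would approximate.

```java
// Hypothetical demo class; only JDK APIs are used.
final class ExceptionMessageDemo {

    // Returns the exception message produced when parsing fails, or null on success.
    static String messageFor(String tainted) {
        try {
            Integer.parseInt(tainted);
            return null;
        } catch (NumberFormatException e) {
            // The JDK includes the rejected input in the message text.
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(messageFor("${1+1}"));
    }
}
```

The returned message contains the rejected input string, so passing it on to a message template would carry the attacker's text along. Not every library method behaves this way, which is why the step has to be a heuristic.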

Note: To test your additional taint step, use quick evaluation on its step predicate to check that it detects the above pattern as you expect. Our current codebase doesn't contain cases of user-controlled beans flowing into this exception message pattern, so you won't be able to test your new taint step by running the whole query. Your query should continue to find only the same 2 results.

Hints:

Read the documentation for the classes TryStmt and CatchClause in the CodeQL Java library. Use jump-to-definition or hovers in the IDE to see their definition.
You will have to restrict to CatchClauses that write an exception by calling specific methods.
Use a heuristic to decide which of those methods write error messages.

Step 4: Exploit and remediation

Step 4.1: PoC

Write a working PoC for the vulnerability. You can use the official Docker images.

Step 4.2: Remediation

Download a database of the patched code, import it into VS Code, and run your query to verify that it no longer reports the issue.

Our advisory contains other remediation techniques. Modify your query so it can be more precise or catch more variants of the vulnerability. For example, consider handling cases that disable the Java EL interpolation and only use ParameterMessageInterpolator.
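
When refining your query, it helps to picture what a sanitizer might look like in the code under analysis. The sketch below is one possible shape. It assumes the escaping rules for message interpolation in the Bean Validation specification, where a backslash before {, }, $ or \ makes that character a literal. The class TemplateEscapeDemo and the helper escapeForTemplate are hypothetical, and this is not claimed to be the fix applied in the patched revision.

```java
// Hypothetical sanitizer sketch; names are illustrative only.
final class TemplateEscapeDemo {

    // Prefixes the interpolation metacharacters \ { } $ with a backslash.
    static String escapeForTemplate(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\\' || c == '{' || c == '}' || c == '$') {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeForTemplate("${1+1}"));
    }
}
```

A query that wants to be more precise could treat calls to such a helper as a barrier to taint. Whether a given helper actually neutralizes every expression syntax has to be checked against the interpolator in use.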