Originally published on 5 September 2017, 15:30 BST. Updated on 6 September: added a warning that third parties have published multiple working exploits, and included details of Struts version 2.3.34.

In this post I’ll describe how I customized a standard LGTM query to find a remote code execution vulnerability in Apache Struts. It has been assigned CVE-2017-9805. A release announcement and security bulletin are available on the Apache Struts website. This vulnerability has been addressed in Struts versions 2.3.34 and 2.5.13. Due to the severe nature of this vulnerability, a couple of details (including a working exploit) have been omitted from this post; this information will be added in a few weeks’ time.

As of the early morning of 6 September 2017 (GMT), multiple working exploits have been observed in various places on the internet. We strongly advise users of Struts to upgrade to the latest version to mitigate this security risk.

The vulnerability I discovered is a result of unsafe deserialization in Java. Multiple similar vulnerabilities have come to light in recent years, after Chris Frohoff and Gabriel Lawrence discovered a deserialization flaw in Apache Commons Collections that can lead to arbitrary code execution. Many Java applications have since been affected by such vulnerabilities. If you’d like to know more about this type of vulnerability, the LGTM documentation page on this topic is a good place to start.

Detecting unsafe deserialization in Struts

LGTM finds potential problems in code by running queries against it. The underlying code query technology is called CodeQL, and the queries are written in a purpose-built language: QL. One of the many queries for Java detects potentially unsafe deserialization of user-controlled data. The query identifies situations in which unsanitized data is deserialized into a Java object. This includes data that comes from an HTTP request or from any other socket connection.

This query detects common ways through which user-controlled data flows to a deserialization method. However, some projects use a slightly different approach to receive remote user input. For example, Apache Struts uses the ContentTypeHandler interface. This converts data into Java objects. Since implementations of this interface usually deserialize the data passed to them, every class that implements this interface is potentially of interest. The standard query for detecting unsafe deserialization of user-controlled data can easily be adapted to recognize this additional method for processing user input. This is done by defining a custom data source.

In this case, we are interested in data flowing from the toObject method, which is defined in the ContentTypeHandler interface:

void toObject(Reader in, Object target);

The data contained in the first argument (named in) that is passed to toObject should be considered tainted: it is under the control of a remote user and should not be trusted. We want to find places where this tainted data (the source) flows into a deserialization method (a sink) without input validation or sanitization.
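To make the source and sink concrete, here is a minimal Java sketch. Everything in it is an assumption made for illustration: the interface is a simplified local stand-in that declares only toObject, and Holder, Base64ObjectHandler and HandlerDemo are hypothetical names that do not exist in Struts. It shows the general shape of the problem (a Reader under remote control feeding a deserializer), not the actual vulnerable code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Reader;
import java.io.Serializable;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Base64;

/** Simplified local stand-in for the Struts interface (assumption: only toObject). */
interface ContentTypeHandler {
    void toObject(Reader in, Object target);
}

/** Hypothetical holder used as the `target` in this sketch. */
class Holder {
    Object value;
}

/** Hypothetical handler, not part of Struts: Base64 text in, Java object out. */
class Base64ObjectHandler implements ContentTypeHandler {
    @Override
    public void toObject(Reader in, Object target) {
        try {
            // Source: `in` carries characters chosen by the remote user.
            StringBuilder text = new StringBuilder();
            char[] buf = new char[1024];
            for (int n = in.read(buf); n != -1; n = in.read(buf)) {
                text.append(buf, 0, n);
            }
            byte[] raw = Base64.getDecoder().decode(text.toString().trim());

            // Sink: readObject() rebuilds whatever object graph the bytes describe.
            // Nothing between the source and this call validates the input.
            try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(raw))) {
                ((Holder) target).value = ois.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}

class HandlerDemo {
    /** Serializes a harmless value and feeds it through the handler. */
    static Object roundTrip(Serializable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            String body = Base64.getEncoder().encodeToString(bytes.toByteArray());
            Holder holder = new Holder();
            new Base64ObjectHandler().toObject(new StringReader(body), holder);
            return holder.value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling HandlerDemo.roundTrip("hello") returns an equal string, which is the benign case. The concern is that a remote user, not the application, decides which bytes arrive through in.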

The CodeQL DataFlow library provides functionality for tracking tainted data through various steps in the source code. This is known as taint tracking. For example, data gets tracked through various method calls:

IOUtils.copy(remoteUserInput, output);   // output is now also tainted because the function copy preserves the data.
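One way to picture taint tracking is as reachability in a graph whose edges are data-preserving steps such as the copy above. The Java sketch below is a toy model under that assumption. TaintGraph and its method names are hypothetical, and this is not how the CodeQL library is implemented.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy flow graph: nodes are named values, edges are steps that preserve data. */
class TaintGraph {
    private final Map<String, List<String>> steps = new HashMap<>();

    /** Records that data held in `from` can reach `to`, for example via a copy. */
    void addStep(String from, String to) {
        steps.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    /** Returns every node reachable from `source`, including `source` itself. */
    Set<String> taintedFrom(String source) {
        Set<String> tainted = new HashSet<>();
        Deque<String> work = new ArrayDeque<>();
        tainted.add(source);
        work.add(source);
        while (!work.isEmpty()) {
            String node = work.remove();
            for (String next : steps.getOrDefault(node, Collections.emptyList())) {
                // Only enqueue nodes seen for the first time, so cycles terminate.
                if (tainted.add(next)) {
                    work.add(next);
                }
            }
        }
        return tainted;
    }

    /** True if data from `source` can reach `sink` through recorded steps. */
    boolean flowsTo(String source, String sink) {
        return taintedFrom(source).contains(sink);
    }
}
```

With steps remoteUserInput -> output and output -> parsed recorded, flowsTo("remoteUserInput", "parsed") holds, while a value that is not connected to remoteUserInput stays untainted.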

To make use of the taint tracking functionality in the DataFlow library, let’s define the in argument to ContentTypeHandler.toObject(...) as a tainted source. First, we define how the query should recognize the ContentTypeHandler interface and the method toObject.

/** The ContentTypeHandler Java interface in Struts */
class ContentTypeHandler extends Interface {
  ContentTypeHandler() {
    this.hasQualifiedName("org.apache.struts2.rest.handler", "ContentTypeHandler")
  }
}

/** The method `toObject` */
class ToObjectDeserializer extends Method {
  ToObjectDeserializer() {
    this.getDeclaringType().getASupertype*() instanceof ContentTypeHandler and
    this.getSignature() = "toObject(java.io.Reader,java.lang.Object)"
  }
}

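Here we use getASupertype*() to restrict the matching to any type that has ContentTypeHandler as a supertype. The * applies getASupertype zero or more times, so the match covers the interface itself as well as every class that implements it directly or indirectly.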
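The "zero or more steps" behaviour of getASupertype*() can be approximated in plain Java by walking superclass and interface links with reflection. This is only an illustrative sketch: Supertypes, DemoHandler, BaseDemoHandler, ConcreteDemoHandler and Unrelated are hypothetical names, and the real QL predicate works on the analysed code rather than on loaded classes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;

class Supertypes {
    /** Returns `type` plus everything reachable via superclass and interface links. */
    static Set<Class<?>> reflexiveTransitive(Class<?> type) {
        Set<Class<?>> seen = new LinkedHashSet<>();
        Deque<Class<?>> work = new ArrayDeque<>();
        work.add(type);
        while (!work.isEmpty()) {
            Class<?> current = work.remove();
            if (!seen.add(current)) {
                continue; // already visited through another path
            }
            if (current.getSuperclass() != null) {
                work.add(current.getSuperclass());
            }
            for (Class<?> iface : current.getInterfaces()) {
                work.add(iface);
            }
        }
        return seen;
    }
}

// Hypothetical types used only to exercise the helper.
interface DemoHandler {}
abstract class BaseDemoHandler implements DemoHandler {}
class ConcreteDemoHandler extends BaseDemoHandler {}
class Unrelated {}
```

ConcreteDemoHandler reaches DemoHandler in two steps, DemoHandler reaches itself in zero steps, and Unrelated never reaches it. The zero-step case is the one that lets the query match a toObject declared on the interface itself.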

Next we want to mark the first argument of the toObject method as an untrusted data source, and track that data as it flows through the code paths. To do that, we extend the FlowSource class from CodeQL’s dataflow library:

/** Mark the first argument of `toObject` as a dataflow source **/
class ContentTypeHandlerInput extends FlowSource {
  ContentTypeHandlerInput() {
    exists(ToObjectDeserializer des |
      des.getParameter(0).getAnAccess() = this
    )
  }
}

Intuitively, this definition says that any access to the first parameter of a toObject method, as captured by ToObjectDeserializer above, is a flow source. Note that for technical reasons, flow sources have to be expressions. Therefore, we identify all accesses of that parameter (which are expressions) as sources, rather than the parameter itself (which isn’t).

Now that we have the definition for a dataflow source, we can look for places where this tainted data is used in an unsafe deserialization method. We don’t have to define that method (the sink) ourselves, as it is already defined in the Deserialization of user-controlled data query as UnsafeDeserializationSink (line 64 of that query). We only need to copy its definition into the query console. Using this, our final query becomes:

from ContentTypeHandlerInput source, UnsafeDeserializationSink sink
where source.flowsTo(sink)
select source, sink

Here we use the .flowsTo predicate in FlowSource for tracking so that we only identify the cases when unsafe deserialization is performed on a ContentTypeHandlerInput source.

When I ran the customized query on Struts there was exactly one result (running it now yields no results, as the fix has been applied). I verified that it was a genuine remote code execution vulnerability before reporting it to the Struts security team. They have been very quick and responsive in working out a solution, even though it is a non-trivial task that requires API changes. Due to the severity of this finding I will not disclose more details at this stage. Rather, I will update this blog post in a couple of weeks’ time with more information.

Vendor Response

Note: Post originally published on LGTM.com on September 05, 2017