In this post I will look at two deserialization frameworks: Castor, which converts XML into Java objects and Java objects back into XML, and Hessian, which is a binary protocol. Recent research has shown that deserializing untrusted data with either of them can lead to arbitrary code execution. Hessian has provided type filtering since version 4.0.51 to stop arbitrary types from being deserialized, but Castor offers no obvious way to do the same. In this post, I will explain how users can mitigate these issues.

Apache Camel vulnerabilities CVE-2017-12633 and CVE-2017-12634

I reported these two vulnerabilities in the camel-castor and camel-hessian components of Apache Camel. They can lead to remote code execution when these components receive untrusted data. I recommend upgrading Camel and following the instructions in the component documentation to enable whitelisting.

Finding the vulnerabilities in Apache Camel

The idea here is very similar to what I did with Struts and Restlet. In both cases, there are interfaces that are responsible for processing and handling input data. In Struts, this is ContentTypeHandler and in Restlet, ConverterHelper. In Camel, the equivalent is the DataFormat interface, which has the unmarshal method to convert input data into Java objects:

Object unmarshal(Exchange exchange, InputStream stream) throws Exception;

As before, I model this method using a subclass of the Method class:

/** The method unmarshal of DataFormat. */
class Unmarshal extends Method {
  Unmarshal() {
    this.hasName("unmarshal") and this.getDeclaringType().hasQualifiedName("org.apache.camel.spi", "DataFormat")
  }
}

In the last few months, the dataflow library has undergone significant improvements, and it now has a new taint-tracking interface that allows much more flexibility. These enhancements mean that LGTM can now identify the Struts vulnerability with our standard query. In this post, I will make use of this new interface to construct the query.

CodeQL Taint Tracking Configuration

The new TaintTracking::Configuration class allows much more flexible configurations. To use it, extend TaintTracking::Configuration and give the subclass a unique name in its characteristic predicate (here I used “camel”):

import java
import semmle.code.java.dataflow.TaintTracking

class CamelCfg extends TaintTracking::Configuration {
  CamelCfg() {
    this = "camel"
  }
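  override predicate isSource(DataFlow::Node source) {
    none() // placeholder, defined below
  }

  override predicate isSink(DataFlow::Node sink) {
    none() // placeholder, defined below
  }

  override predicate isSanitizer(DataFlow::Node node) {
    none() // no sanitizers in this query
  }

  override predicate isAdditionalTaintStep(DataFlow::Node node1, DataFlow::Node node2) {
    none() // placeholder, defined below
  }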
}

By overriding these predicates, I can add new sources, sinks, flow steps and sanitizers to perform highly customized taint tracking. As we have already identified the untrusted data source, let me begin with the isSource predicate. I want to treat the second parameter of unmarshal, the InputStream, as an untrusted source in every method that overrides it (getParameter is zero-based, so this is getParameter(1)):

override predicate isSource(DataFlow::Node source) {
  exists(Method m, Unmarshal unmarshal | m.overrides*(unmarshal) and
    source.asParameter() = m.getParameter(1)
  )
}

This is similar to what I’ve done in one of the Restlet posts.
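To make the parameter indexing concrete, here is a JDK-only sketch. Exchange, DataFormat and PlainTextDataFormat below are simplified, hypothetical stand-ins, not Camel's real types. It shows that whatever a remote sender puts on the wire arrives through the InputStream, and that this parameter sits at index 1 when counting from zero, as CodeQL's getParameter does.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for org.apache.camel.Exchange; its contents are irrelevant here.
class Exchange { }

// Simplified stand-in for org.apache.camel.spi.DataFormat, keeping only unmarshal.
interface DataFormat {
    Object unmarshal(Exchange exchange, InputStream stream) throws Exception;
}

// Hypothetical data format: the raw message body arrives via `stream`,
// so its contents are fully controlled by whoever sent the message.
class PlainTextDataFormat implements DataFormat {
    @Override
    public Object unmarshal(Exchange exchange, InputStream stream) throws IOException {
        return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
    }
}

class SourceParameter {
    // Type of parameter `index` of unmarshal, counted from zero like getParameter.
    static Class<?> unmarshalParameterType(int index) throws NoSuchMethodException {
        Method m = DataFormat.class.getMethod("unmarshal", Exchange.class, InputStream.class);
        return m.getParameterTypes()[index];
    }
}
```

Index 0 is the Exchange and index 1 is the stream, which is why the source predicate uses getParameter(1).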

Let’s now move on to the sinks. For Castor, the deserialization is performed in the unmarshal method of the Unmarshaller class:

Reader reader = new InputStreamReader(inputStream);
unmarshaller.unmarshal(reader); //sink

This can be added as a sink by overriding the isSink predicate in TaintTrackingConfiguration:

override predicate isSink(DataFlow::Node sink) {
  exists(MethodAccess ma |
    ma.getMethod().hasName("unmarshal") and
    ma.getMethod().getDeclaringType().hasQualifiedName("org.exolab.castor.xml", "Unmarshaller") and
    sink.asExpr() = ma.getAnArgument()
  )
}

For Hessian, deserialization is done using the readObject method of the AbstractHessianInput class. The code pattern here is:

HessianInput input = new HessianInput(inputStream);
input.readObject(); //sink

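This can be added as a sink in a similar way. A configuration can override isSink only once, so in the full query the Castor and Hessian conditions are combined into a single isSink predicate using the or operator: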

override predicate isSink(DataFlow::Node sink) {
  exists(MethodAccess ma |
    ma.getMethod().hasName("readObject") and
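    // Also match calls resolved to subclasses such as HessianInput and Hessian2Input
    ma.getMethod().getDeclaringType().getASupertype*().hasQualifiedName("com.caucho.hessian.io", "AbstractHessianInput") and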
    sink.asExpr() = ma.getQualifier()
  )
}
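Why the qualifier rather than an argument? readObject() takes no arguments, so the untrusted bytes reach it through the object it is called on. The JDK's own serialization has the same shape, so the sketch below uses java.io.ObjectInputStream as an analogue of HessianInput. This is JDK serialization, not Hessian, and the demo only reads bytes it wrote itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

class QualifierSinkDemo {
    // Produces serialized bytes; in a real attack these would come from the network.
    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(InputStream untrusted) throws IOException, ClassNotFoundException {
        // The stream enters through the constructor, like new HessianInput(in)...
        try (ObjectInputStream input = new ObjectInputStream(untrusted)) {
            // ...and readObject() has no arguments: the data reaches the sink
            // through `input`, the qualifier of the call.
            return input.readObject();
        }
    }
}
```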

For Castor, the unmarshal method takes an InputStreamReader or InputSource as its argument, and the construction of these classes from an InputStream is already modeled in the standard CodeQL library. This means that the configuration so far can already find cases like this:

/** Override of the unmarshal method in `DataFormat` */
Object unmarshal(Exchange exchange, InputStream in) throws Exception {
  Unmarshaller unmarshaller;
  ...
  Reader reader = new InputStreamReader(in);
  return unmarshaller.unmarshal(reader); //sink
}

However, for Hessian, an InputStream flows into an AbstractHessianInput via a constructor (HessianFactory also has methods to create them, but for simplicity I will not include those here). As these constructors are specific to Hessian, they are not modeled in the standard CodeQL library. With TaintTracking::Configuration, however, they can be added easily by overriding the isAdditionalTaintStep predicate:

override predicate isAdditionalTaintStep(DataFlow::Node node1, DataFlow::Node node2) {
  exists(ClassInstanceExpr ctor | 
    ctor.getAnArgument() = node1.asExpr() and //Source of this step is the argument
    ctor.getNumArgument() = 1 and
    (
      ctor.getConstructedType().(RefType).hasQualifiedName("com.caucho.hessian.io", "HessianInput") or
      ctor.getConstructedType().(RefType).hasQualifiedName("com.caucho.hessian.io", "Hessian2Input")
    ) and
    node2.asExpr() = ctor
  )
}

This tells the configuration that if an untrusted InputStream is passed as the argument to the constructor of a HessianInput or Hessian2Input, then the resulting object is also untrusted. Running the query then identifies the unsafe deserializations using Castor and Hessian in Camel.
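To see why the extra step matters, consider a JDK-only sketch in which a hypothetical LengthPrefixedInput plays the role of HessianInput. Every value it returns is decoded from the stream passed to its constructor. If the analysis did not know about that constructor, the flow from the untrusted InputStream would stop at new LengthPrefixedInput(in), and a sink called on the wrapper would never be reached.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for HessianInput/Hessian2Input: it wraps the InputStream
// given to its one-argument constructor and decodes values from it. A standard
// library model cannot know about project-specific classes like this one.
class LengthPrefixedInput {
    private final DataInputStream in;

    // Taint step: the untrusted argument flows into the newly constructed object.
    LengthPrefixedInput(InputStream in) {
        this.in = new DataInputStream(in);
    }

    // Reads a 4-byte big-endian length followed by that many UTF-8 bytes.
    String readString() throws IOException {
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("Negative length: " + length);
        }
        byte[] data = new byte[length];
        in.readFully(data);
        return new String(data, StandardCharsets.UTF_8);
    }
}
```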

Mitigating the risk of Hessian and Castor

To mitigate the risk of remote code execution when using Hessian and Castor, type checking should be done before untrusted data is deserialized. For Hessian, a type-filtering mechanism was introduced in version 4.0.51: new methods in the HessianFactory class allow only whitelisted classes to be deserialized. To make use of this, create the AbstractHessianInput from a safely configured HessianFactory:

HessianFactory factory = new HessianFactory();
factory.setWhitelist(true); //only deserialize classes in the whitelist
factory.allow("java.io.*"); //only allow classes in java.io
HessianInput input = factory.createHessianInput(untrustedStream);
input.readObject(); //safe

Castor, on the other hand, does not provide any obvious way to sanitize input data. However, davsclaus has come up with a way to enable whitelisting in Castor by subclassing DefaultObjectFactory. This involves overriding the various createInstance methods of DefaultObjectFactory to insert type-checking logic:

@Override
public Object createInstance(Class type) throws IllegalAccessException, InstantiationException {
  // WHITELIST_PATTERN is assumed to be a java.util.regex.Pattern
  if (WHITELIST_PATTERN.matcher(type.getName()).matches()) {
    return super.createInstance(type);
  } else {
    throw new IllegalAccessException("Not allowed to create class of type: " + type);
  }
}

This DefaultObjectFactory subclass can then be used to configure Castor's Unmarshaller:

WhitelistObjectFactory factory = new WhitelistObjectFactory();
factory.setAllowClasses(allowedUnmarshallObjects);
factory.setDenyClasses(deniedUnmarshallObjects);
unmarshaller.setObjectFactory(factory);
unmarshaller.unmarshal(reader); //safe

Anyone who needs to use Castor on untrusted data should take a look at the implementation of WhitelistObjectFactory in Camel and use it in their own code.
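The same idea can be sketched without Castor on the classpath. Below is a hypothetical WhitelistingFactory, not Camel's WhitelistObjectFactory, and it does not extend Castor's DefaultObjectFactory. The comma-separated allow/deny format, the trailing * wildcard and the rule that a deny entry beats an allow entry are my assumptions for illustration; anything not explicitly allowed is rejected before it is instantiated.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, JDK-only analogue of a whitelisting object factory.
class WhitelistingFactory {
    private List<String> allow = List.of();  // nothing is allowed until configured
    private List<String> deny = List.of();

    // JavaBean-style setters taking comma-separated patterns (assumed format),
    // e.g. "com.example.*, java.util.Date".
    public void setAllowClasses(String patterns) { allow = split(patterns); }
    public void setDenyClasses(String patterns) { deny = split(patterns); }

    public boolean isAllowed(String className) {
        if (deny.stream().anyMatch(p -> matches(p, className))) {
            return false;  // an explicit deny always wins
        }
        return allow.stream().anyMatch(p -> matches(p, className));
    }

    // The type check runs before the class is instantiated.
    public Object createInstance(Class<?> type) throws IllegalAccessException, InstantiationException {
        if (!isAllowed(type.getName())) {
            throw new IllegalAccessException("Not allowed to create class of type: " + type);
        }
        try {
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            InstantiationException failure = new InstantiationException("Cannot instantiate " + type);
            failure.initCause(e);
            throw failure;
        }
    }

    private static List<String> split(String patterns) {
        if (patterns == null || patterns.isBlank()) {
            return List.of();
        }
        return Arrays.stream(patterns.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    // "prefix*" matches any name starting with prefix ("*" alone matches all);
    // any other pattern must equal the class name exactly.
    private static boolean matches(String pattern, String className) {
        if (pattern.endsWith("*")) {
            return className.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(className);
    }
}
```

Rejecting by default means a forgotten configuration fails closed rather than silently allowing every gadget class on the classpath.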

Castor deserialization in the Spring Framework

Let’s take a look at another example. Spring MVC also has an interface responsible for processing remote input, HttpMessageConverter. Its read method takes an HttpInputMessage, which represents input read from an HTTP request. I will use this parameter as the source in my configuration:

/** The method `read` of `HttpMessageConverter`. */
class Read extends Method {
  Read() {
    this.hasName("read") and
    this.getDeclaringType().getSourceDeclaration().hasQualifiedName("org.springframework.http.converter", "HttpMessageConverter")
  }
}

/** The class `HttpInputMessage` */
class HttpInputMessage extends RefType {
  HttpInputMessage() {
    this.hasQualifiedName("org.springframework.http", "HttpInputMessage")
  }
}

class SpringHttpMessageConverterConfig extends TaintTracking::Configuration {
  SpringHttpMessageConverterConfig() {
    this = "SpringHttpMessageConverterConfig"
  }
  
  override predicate isSource(DataFlow::Node source) {
    //The `HttpInputMessage` argument in a `HttpMessageConverter`.
    exists(Method m, Read r, Parameter p | m.overrides*(r) and
      p = m.getAParameter() and p.getType() instanceof HttpInputMessage and
      source.asParameter() = p
    )
  }
}

As I am tracking deserialization with Castor or Hessian, I will reuse the sink from the Camel configuration. The actual remote input is obtained from the HttpInputMessage by calling its getBody method, which returns an InputStream. This means that if an HttpInputMessage is tainted, then so is the result of its getBody method. I will add a taint step to my configuration to capture this:

override predicate isAdditionalTaintStep(DataFlow::Node node1, DataFlow::Node node2) {
  // If an `HttpInputMessage` is tainted, then the result of `getBody` is also tainted.
  exists(MethodAccess ma | 
    ma.getMethod().hasName("getBody") and
    ma.getMethod().getDeclaringType() instanceof HttpInputMessage and
    node1.asExpr() = ma.getQualifier() and
    node2.asExpr() = ma
  )
}
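The reason for this step is that a converter has no other way to see the payload: everything it decodes comes out of getBody(). A sketch with a simplified, hypothetical stand-in for Spring's HttpInputMessage (headers omitted) and a hypothetical converter, not Spring's real types:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Simplified, hypothetical stand-in for org.springframework.http.HttpInputMessage;
// the real interface also exposes the request headers.
interface SimpleHttpInputMessage {
    InputStream getBody() throws IOException;
}

// Hypothetical converter in the style of HttpMessageConverter.read: the request
// payload is only reachable through getBody(), so a tainted message implies a
// tainted getBody() result.
class TextConverter {
    String read(SimpleHttpInputMessage inputMessage) throws IOException {
        try (InputStream body = inputMessage.getBody()) {
            return new String(body.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```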

With only one additional taint step, I am ready to go! Putting these together, I ran the query and found a result in CastorMarshaller. It turns out that when MarshallingHttpMessageConverter is used with its unmarshaller set to CastorMarshaller, the application is vulnerable to remote code execution. This is in fact readily exploitable, as all the classes required to perform arbitrary code execution are on the classpath of Spring MVC. I reported this issue to Pivotal, and after some discussion with the developers, we decided that, as Castor itself has been dormant since mid-2016 and there did not seem to be many use cases for CastorMarshaller, it was best to deprecate the component and stop supporting it.

Mitigation for Spring

As CastorMarshaller is deprecated, I suggest that you stop using it if possible. However, if you need it to receive untrusted data, you can use a solution similar to the WhitelistObjectFactory that davsclaus came up with (either use that class or implement a similar one yourself) and set it as the object factory of CastorMarshaller. This can be done, for example, using the XML configuration:

<bean id="marshallingHttpMessageConverter"
      class="org.springframework.http.converter.xml.MarshallingHttpMessageConverter">
        <property name="marshaller" ref="castorMarshaller"/>
        <property name="unmarshaller" ref="castorMarshaller"/>
</bean>

<bean id="castorMarshaller" class="org.springframework.oxm.castor.CastorMarshaller">
        <property name="objectFactory" ref="whiteListObjectFactory"/>
</bean>
<bean id="whiteListObjectFactory" class="com.example.WhiteListObjectFactory">
        <property name="allowClasses" value="com.example.*"/>
</bean>

This would then only allow whitelisted classes to be deserialized.

Conclusion

In this post I have shown how to use some new features of the dataflow library to discover unsafe deserialization with the Castor and Hessian frameworks. These improvements not only make it easy to customize queries and carry out your own security analysis, they also allow LGTM to detect serious vulnerabilities, such as the Struts deserialization vulnerability I discovered, out of the box.

Note: Post originally published on LGTM.com on January 17, 2018