In this post I will look at two deserialization frameworks: Castor, which converts XML input into Java objects and Java objects into XML output, and Hessian, which is a binary protocol. A recent piece of research shows that deserializing untrusted data with either of these can lead to arbitrary code execution. While Hessian has provided type filtering since version 4.0.51 to stop arbitrary types being deserialized, there is no obvious way to do so in Castor. In this post, I will explain how users can mitigate these issues.
Apache Camel vulnerabilities CVE-2017-12633 and CVE-2017-12634
I reported these two vulnerabilities in the Camel-castor and Camel-hessian components in Apache Camel. These can result in remote code being executed when untrusted data is received using these components. I would advise upgrading and following the instructions in the documentation to enable whitelisting.
Finding the vulnerabilities in Apache Camel
The idea here is very similar to what I did with Struts and Restlet. In both cases, there are interfaces that are responsible for processing and handling input data. In Struts, this is ContentTypeHandler and in Restlet, ConverterHelper. In Camel, the equivalent is the DataFormat interface, which has the unmarshal method to convert input data into Java objects:
Object unmarshal(Exchange exchange, InputStream stream) throws Exception;
As before, I model this method using a subclass of the Method class:
/** The method unmarshal of DataFormat. */
class Unmarshal extends Method {
  Unmarshal() {
    this.hasName("unmarshal") and
    this.getDeclaringType().hasQualifiedName("org.apache.camel.spi", "DataFormat")
  }
}
In the last few months, the dataflow library has undergone significant improvements, and it now has a new taint-tracking interface that allows much more flexibility. These enhancements mean that LGTM can now identify the Struts vulnerability with our standard query. In this post, I will make use of this new interface to construct the query.
CodeQL Taint Tracking Configuration
The new TaintTrackingConfiguration class allows more flexible configurations. To use this new feature, first derive from the TaintTracking::Configuration class, and give it a unique new name to identify it (here I used “camel”):
import semmle.code.java.dataflow.TaintTracking

class CamelCfg extends TaintTracking::Configuration {
  CamelCfg() {
    this = "camel"
  }

  override predicate isSource(DataFlow::Node source) {
  }

  override predicate isSink(DataFlow::Node sink) {
  }

  override predicate isSanitizer(DataFlow::Node node) {
  }

  override predicate isAdditionalTaintStep(DataFlow::Node node1, DataFlow::Node node2) {
  }
}
By overriding the predicates, I can easily add new sources, sinks, flow steps and sanitization steps to perform highly customized taint-tracking. As we already identified the untrusted data source here, let me begin with the isSource predicate. I want to treat an access to the second argument of unmarshal as an untrusted source:
override predicate isSource(DataFlow::Node source) {
  exists(Method m, Unmarshal unmarshal |
    m.overrides*(unmarshal) and
    source.asParameter() = m.getParameter(1)
  )
}
This is similar to what I’ve done in one of the Restlet posts.
Let’s now move on to the sinks. For Castor, the deserialization is performed in the unmarshal method of the Unmarshaller class:
InputStreamReader reader;
unmarshaller.unmarshal(reader);
This can be added as a sink by overriding the isSink predicate in TaintTrackingConfiguration:
override predicate isSink(DataFlow::Node sink) {
  exists(MethodAccess ma |
    ma.getMethod().hasName("unmarshal") and
    ma.getMethod().getDeclaringType().hasQualifiedName("org.exolab.castor.xml", "Unmarshaller") and
    sink.asExpr() = ma.getAnArgument()
  )
}
For Hessian, deserialization is done using the readObject method of the AbstractHessianInput class. The code pattern here is:
HessianInput input = new HessianInput(inputStream);
input.readObject(); //sink
This can be added as a sink in a similar way:
override predicate isSink(DataFlow::Node sink) {
  exists(MethodAccess ma |
    ma.getMethod().hasName("readObject") and
    ma.getMethod().getDeclaringType().hasQualifiedName("com.caucho.hessian.io", "AbstractHessianInput") and
    sink.asExpr() = ma.getQualifier()
  )
}
For Castor, the unmarshal method takes InputStreamReader or InputSource as an argument and the construction of these classes from InputStream is already tracked in the standard CodeQL library. This means that the current TaintTrackingConfiguration will be able to find cases like this:
/** Override of the unmarshal method in `DataFormat` */
Object unmarshal(Exchange exchange, InputStream in) throws Exception {
  Unmarshaller unmarshaller;
  ...
  Reader reader = new InputStreamReader(in);
  unmarshaller.unmarshal(reader);
}
However, for Hessian, an InputStream flows into an AbstractHessianInput via a constructor (there are also methods to create them from the HessianFactory, but for simplicity, I will not include them here). As these constructor methods are specific to Hessian, they are not included in the standard CodeQL library. With TaintTrackingConfiguration, however, these can be added easily by overriding the isAdditionalTaintStep predicate:
override predicate isAdditionalTaintStep(DataFlow::Node node1, DataFlow::Node node2) {
  exists(ClassInstanceExpr ctor |
    ctor.getAnArgument() = node1.asExpr() and // Source of this step is the argument
    ctor.getNumArgument() = 1 and
    (
      // Same package as AbstractHessianInput in the sink above
      ctor.getConstructedType().(RefType).hasQualifiedName("com.caucho.hessian.io", "HessianInput") or
      ctor.getConstructedType().(RefType).hasQualifiedName("com.caucho.hessian.io", "Hessian2Input")
    ) and
    node2.asExpr() = ctor // Target of this step is the constructed object
  )
}
This tells TaintTrackingConfiguration that if I am passing an untrusted InputStream as an argument to the constructor of a HessianInput or Hessian2Input, then the resulting object is also untrusted. Running the query then identifies the unsafe deserializations using Castor and Hessian in Camel.
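To see why the extra step matters, here is a toy model in Java of taint propagating over a flow graph. It is only a sketch of the idea and not how CodeQL works internally; the class ToyTaint and the node labels are hypothetical names made up for this illustration. Without the edge from the stream to the constructor call, the sink is unreachable from the source; once that edge is added, the flow is found.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of taint propagation over a flow graph (hypothetical, not CodeQL's implementation). */
final class ToyTaint {
    // Each entry maps a node to the nodes that taint can flow to in one step.
    private final Map<String, Set<String>> steps = new HashMap<>();

    /** Records that taint can flow from one node to another. */
    void addStep(String from, String to) {
        steps.computeIfAbsent(from, k -> new HashSet<>()).add(to);
    }

    /** Breadth-first search: is the sink reachable from the source? */
    boolean flows(String source, String sink) {
        Deque<String> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        queue.add(source);
        seen.add(source);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            if (node.equals(sink)) {
                return true;
            }
            for (String next : steps.getOrDefault(node, Collections.<String>emptySet())) {
                if (seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ToyTaint graph = new ToyTaint();
        // A step the analysis is assumed to know already: the constructed
        // object is assigned to the variable used as the readObject qualifier.
        graph.addStep("new HessianInput(stream)", "input");
        System.out.println("before extra step: " + graph.flows("stream", "input"));

        // The additional taint step: constructor argument -> constructed object.
        graph.addStep("stream", "new HessianInput(stream)");
        System.out.println("after extra step: " + graph.flows("stream", "input"));
    }
}
```

The second call to addStep plays the role of isAdditionalTaintStep: it supplies the one edge that a general-purpose library has no reason to know about.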
Mitigating the risk of Hessian and Castor
To mitigate the risk of remote code execution when using Hessian and Castor, some type checking should be done before deserializing the untrusted data. For Hessian, a type-filtering mechanism was introduced in version 4.0.51. Some new methods were added to the HessianFactory class to only allow whitelisted classes to be deserialized. To make use of this, construct an AbstractHessianInput from a HessianFactory that is safely configured:
HessianFactory factory = new HessianFactory();
factory.setWhitelist(true); //only deserialize classes in the whitelist
factory.allow("java.io.*"); //only allow classes in java.io
HessianInput input = factory.createHessianInput(untrustedStream);
input.readObject(); //safe
Castor, on the other hand, does not provide any obvious way to sanitize input data. However, davclaus has come up with a way to enable whitelisting in Castor by subclassing the DefaultObjectFactory. This involves overriding the various createInstance methods in the DefaultObjectFactory to insert type checking logic:
@Override
public Object createInstance(Class type) throws IllegalAccessException, InstantiationException {
  // Only instantiate types whose fully qualified name matches the whitelist
  if (WHITELIST_PATTERN.matcher(type.getName()).matches()) {
    return super.createInstance(type);
  } else {
    throw new IllegalAccessException("Not allowed to create class of type: " + type);
  }
}
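The type check itself can be tried out without Castor on the classpath. The following is a minimal sketch, assuming the whitelist is a java.util.regex.Pattern over fully qualified class names; the class WhitelistGuard and the package com.example.model are hypothetical and only serve the illustration.

```java
import java.util.regex.Pattern;

/** Hypothetical helper that decides whether a class may be instantiated. */
final class WhitelistGuard {
    // Assumption: only classes directly inside com.example.model are expected.
    private static final Pattern WHITELIST_PATTERN =
            Pattern.compile("com\\.example\\.model\\.[A-Za-z0-9_$]+");

    /** True only if the whole class name matches the whitelist pattern. */
    static boolean isAllowed(String className) {
        return WHITELIST_PATTERN.matcher(className).matches();
    }

    /** Same shape as the createInstance override above, minus the Castor superclass. */
    static Object createInstance(Class<?> type) throws ReflectiveOperationException {
        if (!isAllowed(type.getName())) {
            throw new IllegalAccessException("Not allowed to create class of type: " + type);
        }
        return type.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("com.example.model.Order"));
        System.out.println(isAllowed("java.lang.ProcessBuilder"));
        try {
            createInstance(StringBuilder.class);
        } catch (ReflectiveOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Matching the whole name, rather than searching for a substring, matters here: a pattern that only has to occur somewhere in the name would be easier to slip past.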
This derived DefaultObjectFactory can then be used to configure Unmarshaller in Castor:
WhitelistObjectFactory factory = new WhitelistObjectFactory();
factory.setAllowClasses(allowedUnmarshallObjects);
factory.setDenyClasses(deniedUnmarshallObjects);
unmarshaller.setObjectFactory(factory);
unmarshaller.unmarshal(reader); //safe
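For readers who want to write a similar factory themselves, the allow/deny decision could be sketched as follows. This is not Camel's actual WhitelistObjectFactory; the class AllowDenyFilter, the comma-separated input format and the rule that deny wins over allow are all assumptions made for the sketch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical allow/deny filter; not Camel's actual WhitelistObjectFactory. */
final class AllowDenyFilter {
    private final Set<String> allowed = new HashSet<>();
    private final Set<String> denied = new HashSet<>();

    /** Assumed input format: comma-separated fully qualified class names. */
    void setAllowClasses(String csv) {
        fill(allowed, csv);
    }

    void setDenyClasses(String csv) {
        fill(denied, csv);
    }

    private static void fill(Set<String> target, String csv) {
        target.clear();
        if (csv == null) {
            return;
        }
        Arrays.stream(csv.split(","))
              .map(String::trim)
              .filter(name -> !name.isEmpty())
              .forEach(target::add);
    }

    /** Deny wins over allow; anything not explicitly allowed is rejected. */
    boolean isPermitted(String className) {
        return !denied.contains(className) && allowed.contains(className);
    }

    public static void main(String[] args) {
        AllowDenyFilter filter = new AllowDenyFilter();
        filter.setAllowClasses("com.example.Order, com.example.Customer");
        filter.setDenyClasses("com.example.Customer");
        System.out.println(filter.isPermitted("com.example.Order"));
        System.out.println(filter.isPermitted("com.example.Customer"));
        System.out.println(filter.isPermitted("java.lang.Runtime"));
    }
}
```

Rejecting by default means that a forgotten entry causes a failed request rather than an unexpected class being instantiated.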
Anyone who needs to use Castor on untrusted data should take a look at the implementation of WhitelistObjectFactory in Camel and use it in their own code.
Castor deserialization in SpringFramework
Let’s take a look at another example. In SpringMVC, there is also an interface responsible for processing remote input, HttpMessageConverter. It has a read method that takes an HttpInputMessage that represents an input read from an HTTP request. I will use this as the source in my TaintTrackingConfiguration:
/** The method `read` of `HttpMessageConverter`. */
class Read extends Method {
  Read() {
    this.hasName("read") and
    this.getDeclaringType().getSourceDeclaration().hasQualifiedName("org.springframework.http.converter", "HttpMessageConverter")
  }
}

/** The class `HttpInputMessage` */
class HttpInputMessage extends RefType {
  HttpInputMessage() {
    this.hasQualifiedName("org.springframework.http", "HttpInputMessage")
  }
}

class SpringHttpMessageConverterConfig extends TaintTracking::Configuration {
  SpringHttpMessageConverterConfig() {
    this = "SpringHttpMessageConverterConfig"
  }

  override predicate isSource(DataFlow::Node source) {
    // The `HttpInputMessage` argument in a `HttpMessageConverter`.
    exists(Method m, Read r, Parameter p |
      m.overrides*(r) and
      p = m.getAParameter() and
      p.getType() instanceof HttpInputMessage and
      source.asParameter() = p
    )
  }
}
As I am tracking deserialization with Castor or Hessian, I will reuse the previous sink from the Camel configuration. Now the actual remote input in the HttpInputMessage is retrieved from the method getBody as an InputStream. This means that if an HttpInputMessage is tainted, then so is the output of its getBody method. I will add a taint tracking step in my configuration to accommodate this:
override predicate isAdditionalTaintStep(DataFlow::Node node1, DataFlow::Node node2) {
  // If an `HttpInputMessage` is tainted, then the result of `getBody` is also tainted.
  exists(MethodAccess ma |
    ma.getMethod().hasName("getBody") and
    ma.getMethod().getDeclaringType() instanceof HttpInputMessage and
    node1.asExpr() = ma.getQualifier() and
    node2.asExpr() = ma
  )
}
Only one tracking step added, and I am ready to go! Putting these together, I ran a query and found a result in CastorMarshaller. It turns out that when MarshallingHttpMessageConverter is used with unmarshaller set to CastorMarshaller, the application is vulnerable to remote code execution. This is in fact readily exploitable, as all the classes that are required to perform arbitrary code execution are contained in the classpath of SpringMVC. I reported this issue to Pivotal, and after some discussions with the developers, we decided that, as Castor itself has been dormant since mid-2016 and there didn’t seem to be many use cases of CastorMarshaller, it is best to deprecate the component and stop supporting it.
Mitigation for Spring
As CastorMarshaller is deprecated, I suggest that you stop using it if possible. However, if you need to use it to receive untrusted data, then you can use a solution similar to the WhitelistObjectFactory that davclaus came up with (either use that class or implement a similar one yourself) and set it in CastorMarshaller. This can be done, for example, using the XML configuration:
<bean id="marshallingHttpMessageConverter"
      class="org.springframework.http.converter.xml.MarshallingHttpMessageConverter">
  <property name="marshaller" ref="castorMarshaller"/>
  <property name="unmarshaller" ref="castorMarshaller"/>
</bean>
<bean id="castorMarshaller" class="org.springframework.oxm.castor.CastorMarshaller">
  <property name="objectFactory" ref="whiteListObjectFactory"/>
</bean>
<bean id="whiteListObjectFactory" class="my.package.WhiteListObjectFactory">
  <property name="allowClasses" value="my.package.*"/>
</bean>
This would then only allow whitelisted classes to be deserialized.
Conclusion
In this post I have demonstrated how to configure some new features of the dataflow library to discover unsafe deserialization issues using the Castor and Hessian frameworks. In particular, these improvements not only make it very easy to customize queries and carry out your own security analysis, they also allow LGTM to pick up serious vulnerabilities like the deserialization vulnerability I discovered in Struts out of the box.
Note: Post originally published on LGTM.com on January 17, 2018