Deserialization of untrusted data can lead to vulnerabilities that allow an attacker to execute arbitrary code. The Java community has increasingly come to realize that deserialization methods need to be used with great care. See, for example, "What Do WebLogic, WebSphere, JBoss, Jenkins, OpenNMS, and Your Application Have in Common? This Vulnerability", or "OWASP SD: Deserialize My Shorts: Or How I Learned To Start Worrying and Hate Java Object Deserialization".
The readObject method on java.io.ObjectInputStream is one such vulnerable method. A typical use of readObject looks like this:
ObjectInputStream ois = new ObjectInputStream(input);
MyObject obj = (MyObject)ois.readObject();
The readObject call will construct any serializable object whose class can be found on the classpath before passing it back to the caller. If the constructed object does anything dangerous while it is being constructed or finalized, it is too late to stop it by the time the cast checks the type of the returned object.
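To make the point about the cast concrete, here is a minimal Java sketch. The classes MyObject and Gadget and the flag sideEffectRan are hypothetical names invented for this illustration; Gadget stands in for any serializable class on the classpath whose deserialization hook has a side effect. The program serializes a Gadget, deserializes it with the cast pattern shown above, and reports whether the side effect had already happened when the cast failed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class CastTooLateDemo {
    // Set by Gadget.readObject; stands in for a genuinely dangerous side effect.
    static boolean sideEffectRan = false;

    /** The type the caller expects to receive (hypothetical). */
    static class MyObject implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    /** Hypothetical unrelated class that happens to be on the classpath. */
    static class Gadget implements Serializable {
        private static final long serialVersionUID = 1L;

        // ObjectInputStream invokes this hook while it reconstructs the object.
        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            sideEffectRan = true;
        }
    }

    /** Serializes an object to a byte array, as an attacker could do offline. */
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    /** Returns true if the cast failed only after the side effect had run. */
    static boolean castFailsAfterSideEffect()
            throws IOException, ClassNotFoundException {
        sideEffectRan = false;
        byte[] attackerBytes = serialize(new Gadget());
        try (ObjectInputStream ois =
                new ObjectInputStream(new ByteArrayInputStream(attackerBytes))) {
            MyObject obj = (MyObject) ois.readObject(); // Gadget is built first
            return false; // not reached: a Gadget is not a MyObject
        } catch (ClassCastException e) {
            return sideEffectRan; // the type check came too late
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("side effect ran before the cast failed: "
                + castFailsAfterSideEffect());
    }
}
```

In this sketch the side effect only sets a flag, but the ordering is the important part: the deserialization hook runs inside readObject, and the ClassCastException is thrown only afterwards.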
Using CodeQL to find unsafe deserialization
We can use CodeQL, the query technology of LGTM, to find such deserialization vulnerabilities. In order to do this we must find the places where deserialization happens, and furthermore we need to check that untrusted data can actually reach the deserialization call.
First, we can write a query to find calls to readObject:
import java

from MethodAccess call, Method readobject
where
  call.getMethod() = readobject and
  readobject.hasName("readObject") and
  readobject.getDeclaringType().hasQualifiedName("java.io", "ObjectInputStream")
select call
The query should be understood as follows. We are looking for a MethodAccess, that is, a call to a method, where the called method has the name readObject and is declared on the type java.io.ObjectInputStream.
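As an illustration of what this first query does and does not match, consider the following Java sketch. The class ReadObjectCallSites and the helper ConfigReader are hypothetical names made up for this example. Both methods contain a call to something named readObject, but only one of the called methods is declared on java.io.ObjectInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;

class ReadObjectCallSites {
    /** Hypothetical helper with a same-named method on an unrelated type. */
    static class ConfigReader {
        Object readObject() {
            return "config";
        }
    }

    /** The called method is ObjectInputStream.readObject: the query selects this call. */
    static Object viaObjectInputStream(byte[] bytes)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois =
                new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }

    /** The called method is declared on ConfigReader: the query skips this call. */
    static Object viaUnrelatedType() {
        return new ConfigReader().readObject();
    }
}
```

Run against code like this, the query above should report only the call inside viaObjectInputStream, because the declaring-type condition rules out ConfigReader.readObject.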
This query is likely to return many results, including some that are actually safe, so we need to restrict ourselves to those calls that might read tainted data. To accomplish this we are going to use the dataflow library, which provides two useful things: a class, RemoteUserInput, for the points where tainted data might enter the program, for example through the read of an HTTP request parameter, and a member predicate, flowsTo, which can tell us whether data from a given source can flow to a given sink.
First, we will refactor our query into a class definition to define the set of sinks that we are interested in, that is, the set of expressions that occur as qualifiers of readObject calls, as this is where the potentially tainted data enters the readObject method.
class UnsafeDeserializationSink extends Expr {
  UnsafeDeserializationSink() {
    exists(MethodAccess call, Method readobject |
      call.getMethod() = readobject and
      readobject.hasName("readObject") and
      readobject.getDeclaringType().hasQualifiedName("java.io", "ObjectInputStream") and
      this = call.getQualifier()
    )
  }
}
Now that we have defined our sink, we can write the complete query as source.flowsTo(sink), where source is a RemoteUserInput and sink is an UnsafeDeserializationSink as we just defined it:
import java
import semmle.code.java.security.DataFlow

class UnsafeDeserializationSink extends Expr {
  UnsafeDeserializationSink() {
    exists(MethodAccess call, Method readobject |
      call.getMethod() = readobject and
      readobject.hasName("readObject") and
      readobject.getDeclaringType().hasQualifiedName("java.io", "ObjectInputStream") and
      this = call.getQualifier()
    )
  }
}

from RemoteUserInput source, UnsafeDeserializationSink sink
where source.flowsTo(sink)
select source, sink
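For orientation, the kind of Java code this query is meant to flag might look like the sketch below. The class and method names are hypothetical. It also assumes that the stream returned by Socket.getInputStream() is modelled as a RemoteUserInput; which entry points count as sources depends on the version of the library, so treat that as an assumption rather than a guarantee.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.net.Socket;

class TaintedDeserializationExample {
    /**
     * Sink side: ois is the qualifier of the readObject call, so it is an
     * UnsafeDeserializationSink in the sense defined above. The caller
     * owns the underlying stream and is responsible for closing it.
     */
    static Object readMessage(InputStream untrusted)
            throws IOException, ClassNotFoundException {
        ObjectInputStream ois = new ObjectInputStream(untrusted);
        return ois.readObject();
    }

    /**
     * Source side: bytes arriving over a network connection are controlled
     * by whoever is on the other end (assumed here to be a RemoteUserInput).
     */
    static Object handleConnection(Socket client)
            throws IOException, ClassNotFoundException {
        try (InputStream in = client.getInputStream()) {
            return readMessage(in);
        }
    }
}
```

The data passes from the socket stream, through the ObjectInputStream constructor, to the qualifier of readObject. That path is what source.flowsTo(sink) has to establish for the query to report a result.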
Going further
So far, the query above only looks for java.io.ObjectInputStream.readObject, but other serialization frameworks have similarly generic behavior, which means that they can also be the target of malicious deserialization. The full version of this query, which is included in the standard LGTM checks, also covers several other deserialization frameworks: see Deserialization of user-controlled data.
We have just scratched the surface of using CodeQL to track down security vulnerabilities, but even this simple example is quite useful, and the open-ended nature of CodeQL and its ease of use mean that we can track down almost anything that we are able to clearly define.
Note: Post originally published on LGTM.com on August 27, 2017