We created this CTF challenge to help you quickly learn CodeQL. The objective is to find a critical buffer overflow bug in glibc using CodeQL, our simple code query language. To capture the flag, you'll need to refine your query to increase its precision using this step-by-step guide.
Challenge Instructions
The goal of this challenge is to find unsafe uses of alloca in the GNU C Library (glibc).
alloca is used to allocate a buffer on the stack. It is usually implemented by simply subtracting the size parameter from the stack pointer and returning the new value of the stack pointer. This means that it has two important benefits:
- The memory allocated by alloca is automatically freed when the current function returns.
- It is extremely fast.
But alloca can also be unsafe because it does not check whether there is enough stack space left for the buffer. If the requested buffer size is too big, then alloca might return an invalid pointer. This can cause the application to crash with a SIGSEGV when it attempts to read or write the buffer. Therefore alloca is only intended to be used to allocate small buffers. It is the programmer's responsibility to check that the size isn't too big.
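To make the programmer's responsibility concrete, here is a minimal C sketch of a size check with a heap fallback. The function name sum_bytes and the 4096-byte threshold MAX_STACK_ALLOC are hypothetical choices for illustration; they are not taken from glibc.

```c
#include <alloca.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical threshold for this sketch; not a glibc constant. */
#define MAX_STACK_ALLOC 4096

/* Copies `len` bytes of `src` into a scratch buffer and returns their sum.
   Returns -1 if the heap fallback cannot allocate memory. */
long sum_bytes(const unsigned char *src, size_t len)
{
    unsigned char *buf;
    int on_heap = 0;

    if (len <= MAX_STACK_ALLOC) {
        buf = alloca(len);      /* released automatically when we return */
    } else {
        buf = malloc(len);      /* too large for the stack, use the heap */
        if (buf == NULL)
            return -1;
        on_heap = 1;
    }

    memcpy(buf, src, len);
    long total = 0;
    for (size_t i = 0; i < len; i++)
        total += buf[i];

    if (on_heap)
        free(buf);              /* only the heap buffer needs freeing */
    return total;
}
```

The point of the sketch is only the shape of the check: the size is compared against a bound before alloca is reached, and anything larger takes a different path.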
The GNU C Library contains hundreds of calls to alloca. In this challenge, you will use CodeQL to find those calls. Of course, many of those calls are safe, so the main goal of the challenge is to refine your query to reduce the number of false positives. If you follow the challenge all the way to the end, then you might even find a bug in glibc that is reproducible from a standard command-line application.
Setup instructions
Instructions for installing CodeQL are included at the end of this document.
Documentation links
If you get stuck, try searching our documentation and blog posts for help and ideas. Below are a few links to help you get started:
- https://codeql.github.com/docs/
- Using CodeQL to find kernel stack buffer overflows in Qualcomm MSM-4.4
- Kernel RCE caused by buffer overflows in macOS NFS client
Challenge
The challenge is split into several steps, each of which contains multiple questions. However, building one query per step is sufficient.
Step 0: finding the definition of alloca
- Question 0.0: alloca is a macro. Find the definition of this macro and the name of the function that it expands to.
Step 1: finding the calls to alloca and filtering out small allocation sizes
- Question 1.0: Find all the calls to alloca (using the function name that you found in step 0).
- Question 1.1: Use the upperBound and lowerBound predicates from the SimpleRangeAnalysis library to filter out results which are safe because the allocation size is small. You can classify the allocation size as small if it is less than 65536. But don't forget that negative sizes are very dangerous.
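The reason negative sizes matter can be sketched in C. The size parameter is unsigned, so a negative signed value is converted to an enormous request. The helper names range_is_small and as_alloca_size below are hypothetical; the first mirrors the two-sided bounds test described in Question 1.1, and the second shows the conversion.

```c
#include <stddef.h>
#include <stdint.h>

/* The "small" bound from Question 1.1. */
#define SMALL_ALLOC_LIMIT 65536

/* Returns 1 if every value in [lower, upper] is a small, non-negative
   allocation size. Both ends of the range must be checked. */
int range_is_small(long lower, long upper)
{
    return lower >= 0 && upper < SMALL_ALLOC_LIMIT;
}

/* Shows the value that an allocation routine taking a size_t would
   receive if it were handed a signed value. */
size_t as_alloca_size(long n)
{
    return (size_t)n;   /* -1 wraps around to SIZE_MAX */
}
```

A range such as [-1, 10] has a small upper bound but is still unsafe, which is why checking the upper bound alone is not enough.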
Step 2: filtering out calls that are guarded by __libc_use_alloca
The correct way to use alloca in glibc is to first check that the allocation is safe by calling __libc_use_alloca. You can see a good example of this at getopt.c:252. That code uses __libc_use_alloca to check if it is safe to use alloca. If not, it uses malloc instead. In this step, you will identify calls to alloca that are safe because they are guarded by a call to __libc_use_alloca.
- Question 2.0: Find all calls to __libc_use_alloca.
- Question 2.1: Find all guard conditions where the condition is a call to __libc_use_alloca.
- Question 2.2: Sometimes the result of __libc_use_alloca is assigned to a variable, which is then used as the guard condition. For example, this happens at setsourcefilter.c:38-41. Enhance your query, using local dataflow, so that it also finds this guard condition.
- Question 2.3: Sometimes the call to __libc_use_alloca is wrapped in a call to __builtin_expect. For example, this happens at setenv.c:185. Enhance your query so that it also finds this guard condition.
- Question 2.4: Sometimes the result of __libc_use_alloca is negated with the ! operator. For example, this happens at getaddrinfo.c:2291-2293. Enhance your query so that it can also handle negations.
- Question 2.5: Find calls to alloca that are safe because they are guarded by a call to __libc_use_alloca.
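The four guard shapes described in Questions 2.1 to 2.4 can be sketched in C as follows. Because __libc_use_alloca is internal to glibc, the sketch uses a hypothetical stand-in named use_alloca_ok with an assumed 4096-byte limit. The function names and the heap_path helper are also invented for illustration, and __builtin_expect is a GCC/Clang builtin. Each function returns 1 when it takes the stack path and 0 when it takes the heap path.

```c
#include <alloca.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for glibc's internal __libc_use_alloca.
   The 4096-byte limit is an assumption made for this sketch. */
static int use_alloca_ok(size_t size) { return size <= 4096; }

/* Hypothetical heap fallback shared by all four examples. */
static int heap_path(size_t n)
{
    char *p = malloc(n);
    if (p == NULL)
        return -1;
    memset(p, 0, n);
    free(p);
    return 0;
}

/* Question 2.1: the call itself is the guard condition. */
int guard_direct(size_t n)
{
    if (use_alloca_ok(n)) {
        char *buf = alloca(n);
        memset(buf, 0, n);
        return 1;
    }
    return heap_path(n);
}

/* Question 2.2: the result is stored in a variable first. */
int guard_via_variable(size_t n)
{
    int use_stack = use_alloca_ok(n);
    if (use_stack) {
        char *buf = alloca(n);
        memset(buf, 0, n);
        return 1;
    }
    return heap_path(n);
}

/* Question 2.3: the call is wrapped in __builtin_expect. */
int guard_builtin_expect(size_t n)
{
    if (__builtin_expect(use_alloca_ok(n), 1)) {
        char *buf = alloca(n);
        memset(buf, 0, n);
        return 1;
    }
    return heap_path(n);
}

/* Question 2.4: the result is negated, so the alloca call sits on the
   false branch of the condition. */
int guard_negated(size_t n)
{
    if (!use_alloca_ok(n))
        return heap_path(n);
    char *buf = alloca(n);
    memset(buf, 0, n);
    return 1;
}
```

In all four functions the alloca call is only reachable when the check succeeded; what differs is how the check result reaches the branch, which is what each question asks the query to account for.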
Step 3: combine steps 1 and 2 to filter out safe calls
- Question 3.0: Use your answer from step 2 to enhance your query from step 1 by filtering out calls to alloca that are safe because they are guarded by a call to __libc_use_alloca.
Step 4: taint tracking
In this step, you'll use a taint tracking query to find an unsafe call to alloca where the allocation size is controlled by a value read from a file.
- Question 4.0: Find calls to fopen. (Be aware that fopen is another macro.)
- Question 4.1: Write a taint tracking query. The source should be a call to fopen and the sink should be the size argument of an unsafe call to alloca. To help you get started, here is the boilerplate for the query:
/**
 * @name 41_fopen_to_alloca_taint
 * @description Track taint from fopen to alloca.
 * @kind path-problem
 * @problem.severity warning
 * @id cpp/ctf/unsafe-alloca
 */

import cpp
import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis
import semmle.code.cpp.dataflow.new.TaintTracking
import semmle.code.cpp.models.interfaces.DataFlow
import semmle.code.cpp.controlflow.Guards

// Track taint through `__strnlen`.
class StrlenFunction extends DataFlowFunction {
  StrlenFunction() { this.getName().matches("%str%len%") }

  override predicate hasDataFlow(FunctionInput i, FunctionOutput o) {
    i.isParameter(0) and o.isReturnValue()
  }
}

// Track taint through `__getdelim`.
class GetDelimFunction extends DataFlowFunction {
  GetDelimFunction() { this.getName().matches("%get%delim%") }

  override predicate hasDataFlow(FunctionInput i, FunctionOutput o) {
    i.isParameter(3) and o.isReturnValue()
  }
}

module Config implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) {
    // TODO
  }

  predicate isSink(DataFlow::Node sink) {
    // TODO
  }
}

module Flow = TaintTracking::Global<Config>;

import Flow::PathGraph

from Flow::PathNode source, Flow::PathNode sink
where Flow::flowPath(source, sink)
select sink, source, sink, "fopen flows to alloca"
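For orientation, the kind of flow this step looks for, a value read from a file ending up as an allocation size, can be sketched in C. Everything in the sketch is hypothetical: the function read_record, the record format (a decimal length followed by one separator byte and the payload), and the 4096-byte limit are invented for illustration and do not describe the glibc bug. In a real program, the FILE pointer would typically come from fopen.

```c
#include <alloca.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical limit for this sketch; not a glibc constant. */
#define MAX_STACK_ALLOC 4096

/* Reads a decimal length from `fp`, skips one separator byte, then reads
   that many payload bytes. Returns the number of payload bytes read,
   or -1 on error. */
long read_record(FILE *fp)
{
    size_t len;
    if (fscanf(fp, "%zu", &len) != 1)   /* len is controlled by the file */
        return -1;
    if (fgetc(fp) == EOF)               /* skip the separator */
        return -1;

    char *buf;
    int on_heap = 0;
    if (len <= MAX_STACK_ALLOC) {       /* without this check, the file   */
        buf = alloca(len);              /* would decide how much stack is */
    } else {                            /* consumed                       */
        buf = malloc(len);
        if (buf == NULL)
            return -1;
        on_heap = 1;
    }

    size_t got = fread(buf, 1, len, fp);
    if (on_heap)
        free(buf);
    return (long)got;
}
```

Removing the size comparison would leave a direct path from file contents to the alloca argument, which is the pattern that the source and sink of the taint tracking configuration are meant to describe.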
Step 5: searching for a PoC (optional)
- Question 5.0: The GNU C Library includes several command-line applications. (It contains 24 main functions.) Demonstrate that the bug is real by showing that you can trigger a SIGSEGV in one of these command-line applications.
Getting Help
If you get stuck writing QL or on any other part of the CTF and would like some help, drop us an email at ctf@github.com.
Setup instructions for running CodeQL offline
We hope you enjoyed this challenge! If you are interested in continuing to use CodeQL for security research, then we recommend installing CodeQL on your own computer. This will enable you to run queries offline. We have also provided these offline instructions for posterity, because the query results will change over time as the source code evolves. The instructions below use a snapshot corresponding to revision 3332218, which is the revision for which we designed this challenge.
To run CodeQL queries offline, follow these steps:
- Install the Visual Studio Code IDE.
- Download and install the CodeQL extension for Visual Studio Code.
- Download a pre-built CodeQL database of the vulnerable GNU C Library, or create one using the CodeQL CLI. Either way, the database should correspond to revision 3332218. Then import it into Visual Studio Code.
You can download other snapshots for offline use from LGTM. For example, you can download a snapshot for the latest revision of glibc here. Every project on LGTM has a download link for downloading the latest snapshot.