“Exploits are really the closest thing to magic spells we have in this world.”
Halvar Flake, keynote presentation, OffensiveCon 2020.
I assume Halvar Flake was talking about other people’s exploits. As a general rule, you would expect the author of an exploit to understand how it works, even if it might seem like magic to everybody else. Well, not this time. This is the story of how I successfully exploited CVE-2021-3939 in Ubuntu’s accountsservice, then spent the next two weeks trying to figure out how my own exploit worked. It seemed like magic, even to me!
A double-free bug in Ubuntu’s accountsservice (CVE-2021-3939)
I discovered this bug while I was preparing for my presentation at Black Hat EU 2021. I wanted to include a demo of an exploit that I wrote about in a blog post last year. As I was testing the demo, I noticed that accountsservice was sometimes crashing for the wrong reason. That is, it was crashing due to a different bug than the one that I had deliberately reinserted for the purposes of the demo. I soon discovered that the same reproduction steps also worked on a fully patched version of accountsservice.
The bug turned out to be an incorrect call to user_get_fallback_value:
static gchar *
user_get_fallback_value (User        *user,
                         const gchar *property)
{
        static gchar *system_language;
        static gchar *system_formats_locale;  <===== ONLY ALLOCATED ONCE

        if (g_strcmp0 (property, "Language") == 0 && system_language)
                return system_language;
        if (g_strcmp0 (property, "FormatsLocale") == 0 && system_formats_locale)
                return system_formats_locale;  <===== RETURNED TO CALLER
        ...
Notice that system_formats_locale is a static variable. It is allocated once on the first call, and then the same pointer is returned on subsequent calls. Therefore, the caller should not free the pointer. Unfortunately, that’s exactly what happens in user_change_language_authorized_cb:
if (!is_in_pam_environment (user, "FormatsLocale")) {
        /* set the user formats (certain LC_* variables) explicitly
           in order to prevent surprises when LANG is changed */
        g_autofree gchar *fallback_locale = user_get_fallback_value (user, "FormatsLocale");  <===== NO ALLOC
        g_autofree gchar *validated_locale = user_locale_validate (user, fallback_locale, context);
        gchar *formats_locale = user_update_environment (user,
                                                         validated_locale,
                                                         "save-to-pam-env",
                                                         context);
        if (formats_locale != NULL)
                accounts_user_set_formats_locale (ACCOUNTS_USER (user), formats_locale);
        <===== fallback_locale AUTOMATICALLY FREED HERE
}
Due to the g_autofree annotation on fallback_locale, the memory is automatically freed when the enclosing block exits, leaving a dangling pointer in the static variable in user_get_fallback_value. Under normal usage, the bug is not triggered because the code finds a value for FormatsLocale in the user’s ~/.pam_environment file. But an unprivileged user can easily trigger the bug as follows:
rm -f ~/.pam_environment
dbus-send --system --print-reply --dest=org.freedesktop.Accounts /org/freedesktop/Accounts/User1001 org.freedesktop.Accounts.User.SetLanguage string:hi
If you run those instructions a few times, accountsservice will crash due to a double-free error.
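The ownership mistake can be reproduced outside GLib. The following is a minimal sketch in plain C, assuming a GCC- or Clang-compatible compiler, because it relies on the cleanup attribute that g_autofree is built on. The names AUTOFREE, copy_string, get_fallback_locale and locale_length_safe are hypothetical stand-ins, and the safe caller shows one conventional fix (auto-free a private copy), not necessarily the change that was made in accountsservice.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for g_autofree: free the variable when it leaves scope. */
static void auto_free_cleanup(char **p) { free(*p); }
#define AUTOFREE __attribute__((cleanup(auto_free_cleanup)))

/* Heap-allocated copy of a string (similar in spirit to g_strdup). */
char *copy_string(const char *s) {
    assert(s != NULL);
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL) memcpy(copy, s, n);
    return copy;
}

/* Allocates once, then returns the same pointer on every call.
   The callee owns the memory, so callers must not free it. */
const char *get_fallback_locale(void) {
    static char *cached;
    if (cached == NULL) cached = copy_string("en_US.UTF-8");
    return cached;
}

/* BUGGY pattern, shown only as a comment because running it is undefined behavior:
       AUTOFREE char *locale = (char *) get_fallback_locale();
   Leaving the scope frees the memory that the static still points to,
   and the next caller that does the same frees it a second time. */

/* Safe pattern: auto-free a private copy, never the shared pointer. */
size_t locale_length_safe(void) {
    const char *shared = get_fallback_locale();
    if (shared == NULL) return 0;
    AUTOFREE char *copy = copy_string(shared);
    return copy != NULL ? strlen(copy) : 0;
}
```

Calling locale_length_safe repeatedly leaves the cached string intact, whereas the commented-out pattern would corrupt it on the first call.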
How to exploit a double-free bug
The basic concept is to convert the double-free vulnerability into a use-after-free bug, as shown in this diagram:
These are the steps:
- A chunk of memory is allocated (and stored in system_formats_locale).
- The bug is triggered and the chunk is freed (leaving a dangling pointer in system_formats_locale).
- Memory is allocated in some other part of the code and gets a pointer to the same chunk that already belongs to system_formats_locale.
At this point, two “owners” both believe that they own the same chunk of memory. In some exploitation scenarios, this may already be sufficient to exploit the bug: if one of the owners changes the contents of the chunk, then the other owner might be tricked into doing the wrong thing. However, in the specific case of CVE-2021-3939, it isn’t enough because system_formats_locale is a read-only pointer that isn’t used for anything particularly interesting. But that’s ok, because I can trigger the bug twice:
- The bug is triggered again and the chunk is freed a second time.
- Another part of code allocates some memory and also gets a pointer to the same chunk.
Now there are three separate “owners” who all think they own the same chunk of memory. If “user 1” overwrites the chunk, then “user 2” might do the wrong thing, or vice versa.
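The five steps can be replayed safely with a toy allocator. This is a sketch under stated assumptions: toy_alloc and toy_free are hypothetical, they manage a small static pool rather than the real heap, and they reuse freed chunks in last-in, first-out order, which is broadly how glibc's small-chunk caches behave. The toy has no double-free detection, and in the sequence below the chunk is reallocated between the two frees anyway.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { CHUNK_BYTES = 24, POOL_CHUNKS = 8 };

static char pool[POOL_CHUNKS][CHUNK_BYTES];  /* backing storage, never really released */
static size_t next_unused;                   /* bump index for fresh chunks */
static char *free_stack[POOL_CHUNKS];        /* LIFO list of freed chunks */
static size_t free_count;

/* Hand out the most recently freed chunk if there is one, else a fresh one. */
char *toy_alloc(void) {
    if (free_count > 0) return free_stack[--free_count];
    assert(next_unused < POOL_CHUNKS);
    return pool[next_unused++];
}

/* Push the chunk on the free list. The allocator cannot know that the
   caller still holds, and may keep using, the pointer. */
void toy_free(char *chunk) {
    assert(free_count < POOL_CHUNKS);
    free_stack[free_count++] = chunk;
}

/* Replays the five steps; returns 1 if all three owners alias one chunk. */
int three_owners_share_one_chunk(void) {
    char *system_formats_locale = toy_alloc();   /* step 1: allocated once */
    strcpy(system_formats_locale, "en_US.UTF-8");
    toy_free(system_formats_locale);             /* step 2: bug, pointer dangles */
    char *second_owner = toy_alloc();            /* step 3: gets the same chunk */
    toy_free(system_formats_locale);             /* step 4: bug triggered again */
    char *third_owner = toy_alloc();             /* step 5: same chunk once more */

    strcpy(third_owner, "attacker data");        /* one owner writes... */
    return second_owner == system_formats_locale
        && third_owner == second_owner
        && strcmp(second_owner, "attacker data") == 0;  /* ...another reads it */
}
```

The function returns 1: the write through third_owner is visible through second_owner, which is the confusion an exploit needs.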
Searching for primitives
When I first started working on an exploit for this vulnerability, it felt like a long shot. I estimated my chances of success at less than 25% when I discussed it with my colleagues. The main difficulty is that the bug only affects a single 0x20-sized memory chunk.
The chunk is allocated shortly after the process starts, before I have any chance of influencing its placement, and its address is fixed from that moment onwards. It’s marooned amongst long-lived chunks, so it also cannot change size by getting consolidated with an adjacent chunk.
Memory corruption vulnerabilities like this one are often difficult to exploit due to mitigations such as address space layout randomization (ASLR). Successful exploitation usually depends on being able to find other “primitives.” For example, to defeat ASLR, you typically need an infoleak primitive so that you can deduce the ASLR offsets, enabling you to forge pointers. So my hopes were raised when I found an infoleak. After triggering the bug, I could read the contents of the memory chunk due to this code in user_new:
accounts_user_set_formats_locale (ACCOUNTS_USER (user), user_get_fallback_value (user, "FormatsLocale"));
user_new is called immediately at the start of the process for the user accounts that are classified as “human users,” but only on-demand for system accounts. For example, the root user is not loaded by default, but is loaded on-demand if I send this command:
dbus-send --system --dest=org.freedesktop.Accounts --type=method_call --print-reply /org/freedesktop/Accounts org.freedesktop.Accounts.FindUserById int64:0
That caches the current contents of the vulnerable chunk, which I can now read like this:
dbus-send --system --dest=org.freedesktop.Accounts --type=method_call --print-reply /org/freedesktop/Accounts/User0 org.freedesktop.DBus.Properties.Get string:"org.freedesktop.Accounts.User" string:"FormatsLocale"
Sometimes this enables me to leak an address, but only when the bytes of the address form a valid UTF-8 string, so it only works occasionally. In the end, though, I decided that this infoleak was unlikely to be useful. The problem is that, even if I know the ASLR offsets, my ability to overwrite the memory is very limited because I can only mess with a single 0x20-sized chunk. Furthermore, the UTF-8 restriction also limits my ability to send a forged pointer back.
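To see why the UTF-8 restriction bites, here is a small sketch that checks whether the bytes of an address would survive as a string. It assumes a little-endian 64-bit system where user-space addresses have two zero high bytes, so only the low six bytes matter. Both function names are hypothetical, and the check is generic UTF-8 well-formedness; the exact validation applied by GLib or D-Bus may differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if buf[0..len) is well-formed UTF-8 (no overlong forms, no surrogates). */
int is_valid_utf8(const unsigned char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    size_t i = 0;
    while (i < len) {
        unsigned char b = buf[i];
        size_t extra;        /* number of continuation bytes expected */
        uint32_t cp, min;    /* decoded code point and smallest legal value */
        if (b < 0x80) { i++; continue; }
        else if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; min = 0x10000; }
        else return 0;                       /* stray continuation or invalid lead byte */
        if (len - i <= extra) return 0;      /* sequence is cut off */
        for (size_t k = 1; k <= extra; k++) {
            if ((buf[i + k] & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (buf[i + k] & 0x3Fu);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
        i += extra + 1;
    }
    return 1;
}

/* Returns 1 if the low six bytes of addr, in little-endian order, contain
   no NUL byte and form valid UTF-8, so the whole address could be leaked
   (or sent back) as a string. */
int address_survives_as_utf8(uint64_t addr) {
    unsigned char bytes[6];
    for (size_t k = 0; k < 6; k++) {
        bytes[k] = (unsigned char)(addr >> (8 * k));
        if (bytes[k] == 0) return 0;         /* a NUL would truncate the string */
    }
    return is_valid_utf8(bytes, 6);
}
```

For example, the library address 0x00007f2620ed2fe2 from the stack trace later in this post fails the check, because its first byte (0xe2) starts a three-byte sequence that the following byte (0x2f) does not continue.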
The infoleak did help me to learn one interesting fact about the behavior of the bug, though. Very often, after triggering the bug, the vulnerable chunk contains a string like “Session” or “Icon.” Those strings come from a function named user_save_to_keyfile, which is (indirectly) called immediately after the bug is triggered. I can’t do anything useful with the chunk when it gets allocated in user_save_to_keyfile. Unfortunately, there is no way to avoid that function getting called.
Another useful “primitive” is the ability to control the memory layout. Successful exploits often include a “heap grooming” phase in which the attacker fills the heap with repetitive data. Even if you don’t know the ASLR base offsets, heap grooming can often help to make the relative offsets of your heap objects predictable, thereby enabling you to reliably exploit something like a buffer overflow. My attempts to improve the predictability of the memory allocations in accountsservice were completely unsuccessful. I was trying to control which allocation would land on the vulnerable chunk, but no matter what I did, it continued to seem completely random. I even found a memory leak (see note 1), which I thought might help to deplete the allocator’s various caches and make the subsequent allocations more predictable, but it made no observable difference as far as I could tell.
Embrace the chaos
Randomness is a popular defense against exploitation. For example, ASLR adds a random offset to the memory addresses in an attempt to make them harder to predict. As a defense strategy, it’s often a lot less effective than people think it’s going to be.
After failing to find a way to control the non-determinism in accountsservice, I decided to take the opposite approach and embrace the chaos. I even deliberately added more randomness. As I mentioned earlier, the chunk is often captured by user_save_to_keyfile, which I don’t want. But every call to user_save_to_keyfile frees and reallocates all the keyfile data, causing the memory to get jumbled up. It’s easy to trigger that by changing your own email address, which any unprivileged user can do:
dbus-send --system --dest=org.freedesktop.Accounts --type=method_call --print-reply /org/freedesktop/Accounts/User1001 org.freedesktop.Accounts.User.SetEmail string:'kev@example.com'
So my exploit calls SetEmail a random number of times, in the hope that it will shake the chunk loose and make it available for allocation by a more interesting target. It won’t work every time, but that’s ok because I can keep trying the exploit until it’s successful. The double-free enables me to crash and restart accountsservice as many times as I like. The only restriction is that the crashes are rate-limited by systemd: I cannot restart accountsservice more than 5 times every 10 seconds.
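The rate limit puts a floor under how long a brute-force run can take. The sketch below is back-of-the-envelope arithmetic under a simplified model, in which at most a fixed number of restarts fit in each fixed-length window; the function name is hypothetical and the real scheduling in systemd may differ in detail.

```c
#include <assert.h>

/* Lower bound, in seconds, on the time needed for `restarts` restarts when
   at most `burst` restarts are allowed per `interval_s`-second window. */
unsigned long min_seconds_for_restarts(unsigned long restarts,
                                       unsigned long burst,
                                       unsigned long interval_s) {
    assert(burst > 0);
    if (restarts == 0) return 0;
    /* The first `burst` restarts fit in the first window; every further
       group of `burst` restarts has to wait for another window. */
    return ((restarts - 1) / burst) * interval_s;
}
```

With the limit of 5 restarts per 10 seconds, an illustrative run of 1,000 restarts needs at least 1,990 seconds, or roughly 33 minutes, before any other delays are counted.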
Choosing a target
To successfully exploit the vulnerability, I need to find an 0x20-sized memory allocation that will cause something interesting to happen if the chunk gets overwritten. 0x20 is quite small, so the most obvious candidate would be a short string. For example, it would be awesome if I could change my username to “root” and then change “my” password. Unfortunately, that isn’t possible because all of the “human users” are loaded as soon as the process starts, so I am unable to interfere with any of the memory allocation related to my own account. I could potentially mess with the cached data for a system account because those are loaded on demand, but as an unprivileged user I am only permitted to call D-Bus methods on my own account, so that wouldn’t help much.
In the presentation that I gave at Black Hat EU, I talked about the importance of unique bus names in the D-Bus messaging system. When a process connects to the message bus, it gets assigned a unique name, like :1.3591. Unique bus names are used for checking credentials, so if you are able to forge your own bus name then you could bypass security checks by pretending to be a privileged process. I think it might be possible to use the double-free vulnerability to overwrite my bus name, but it wouldn’t be easy because the bus names are allocated on a different thread than the thread with the vulnerability. Each thread has its own malloc arena, so I would first need to find a way to transfer the vulnerable chunk into the other thread’s arena (by allocating it in one thread and freeing it in another). I think it’s possible for that to happen, but it would be very difficult to control, so I decided not to focus on that exploitation strategy.
My flawed exploit plan
Instead, I decided to focus on CheckAuthData, which is allocated in daemon_local_check_auth. When you ask accountsservice to do something, like changing your email address or your password, it sends a D-Bus message to polkit to check whether you’re authorized. Some requests, such as changing your own email address, are instantly approved by polkit, but others, such as creating a new user account, require authorization from an admin user. CheckAuthData is used to store a closure that will be called after the polkit request is approved. The D-Bus method call to polkit is asynchronous, so that gives me an opportunity to trigger the double-free bug between when accountsservice sends the message and when it receives the reply. This was my exploit plan:
- Change my own email address. This request will be approved by polkit.
- Trigger the double-free bug.
- Attempt to change the root user’s password. This should require admin privileges.
- Hope that both CheckAuthData allocations (in steps 1 and 3) land on the same memory chunk.
- The change-email request is approved, but the CheckAuthData from step 1 has been overwritten by the CheckAuthData from step 3, so the root user’s password is changed instead.
This plan had just one flaw, which is that CheckAuthData is an 0x40-sized allocation, so it cannot use the vulnerable chunk.
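The size mismatch follows from how glibc rounds allocation requests up to chunk sizes. The sketch below assumes the usual 64-bit glibc layout (an 8-byte size field, 16-byte alignment and a 0x20-byte minimum chunk); the function names are hypothetical. Under those assumptions, a 0x20-sized chunk serves requests of up to 24 bytes, while a 0x40-sized chunk serves requests of 41 to 56 bytes.

```c
#include <assert.h>
#include <stddef.h>

enum {
    SIZE_FIELD_BYTES = 8,    /* per-chunk size header (assumed 64-bit layout) */
    ALIGNMENT = 16,          /* chunk sizes are multiples of 16 */
    MIN_CHUNK_BYTES = 0x20   /* smallest chunk the allocator hands out */
};

/* Chunk size that a malloc(request) call would be served from. */
size_t chunk_size_for_request(size_t request) {
    assert(request <= (size_t)-1 / 2);   /* keep the arithmetic from wrapping */
    size_t padded = (request + SIZE_FIELD_BYTES + (ALIGNMENT - 1))
                    & ~(size_t)(ALIGNMENT - 1);
    return padded < MIN_CHUNK_BYTES ? MIN_CHUNK_BYTES : padded;
}

/* Returns 1 if an allocation of this size could land on a 0x20-sized chunk. */
int fits_vulnerable_chunk(size_t request) {
    return chunk_size_for_request(request) == 0x20;
}
```

Any struct larger than 24 bytes therefore falls into a bigger size class and can never be handed the vulnerable chunk.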
I don’t want to spend too much time dwelling on how I attempted to work around the 0x20 versus 0x40 chunk size mismatch. Let’s just say that my schemes were quite elaborate and also completely wrong. But then something unexpected happened. I had written the latest iteration of my exploit and left it running while I walked away from the computer in frustration. About half an hour later, still away from the computer, I realized that my latest scheme had a fundamental flaw and couldn’t possibly work. It was several hours before I returned to the computer to resume bashing my head against the wall. That’s when I discovered that my exploit had worked! But how? I was quite sure that my exploit design was wrong, so it must be working for a different, unplanned reason. It felt like magic!
Identifying the true exploit
Tracking down how my exploit actually worked wasn’t easy. The exploit typically takes several hours to succeed, involving thousands of accountsservice restarts, so I couldn’t just attach gdb and step through the sequence of events. I tried using rr, but it crashes on accountsservice. I also tried debugging with printf’s, but when I added too many it interfered with the timing and the exploit stopped working. As it turns out, I was also printing the wrong information. In the end, the way that I figured it out was by inserting a long sleep in user_change_password_authorized_cb, which is only called when the exploit is successful. That enabled me to attach gdb after the exploit was successful and look at the call stack. It still left me guessing what had happened prior to that moment, but it gave me enough information to know what to focus on. This is the important bit:
#3 0x0000564cfbadff72 in user_change_password_authorized_cb at ../src/user.c:1920
#4 0x0000564cfbad5f75 in check_auth_cb at ../src/daemon.c:1427
#5 0x00007f2620ed2fe2 in g_simple_async_result_complete at ../../../gio/gsimpleasyncresult.c:802
#6 0x00007f2620c8bc8b in check_authorization_cb at /home/kev/projects/polkit/policykit-1-0.105/src/polkit/polkitauthority.c:835
The answer is in stack frame #6, which is in the polkit client-side library. It’s rather ironic: polkit also has a struct named CheckAuthData, which essentially does exactly the same thing as the one in accountsservice. There are two closures in this call stack: one in the polkit library and one in accountsservice. But polkit’s CheckAuthData is small enough to fit in an 0x20-sized chunk. So the exploit works almost exactly as I had originally intended, except by targeting the CheckAuthData in polkit, rather than the one in accountsservice!
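The effect of two pending requests sharing one closure can be modeled in a few lines. This is a deliberately abstract sketch: PendingCheck, store_closure and replay_race are hypothetical, a single static struct stands in for the vulnerable chunk, and the real CheckAuthData structs hold GLib objects rather than an action code and a uid.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ACTION_SET_EMAIL, ACTION_SET_PASSWORD } Action;

/* Hypothetical closure: what to do, and to whom, once polkit says yes. */
typedef struct {
    Action action;
    unsigned target_uid;
} PendingCheck;

/* Stands in for the single vulnerable chunk. */
static PendingCheck vulnerable_chunk;

/* Models an allocation that lands on the vulnerable chunk. */
PendingCheck *store_closure(Action action, unsigned target_uid) {
    vulnerable_chunk.action = action;
    vulnerable_chunk.target_uid = target_uid;
    return &vulnerable_chunk;
}

/* Replays the race and returns the closure contents that actually run
   when the approval for the email request arrives. */
PendingCheck replay_race(void) {
    /* SetEmail on my own account (uid 1001); polkit will approve it. */
    PendingCheck *email_request = store_closure(ACTION_SET_EMAIL, 1001);

    /* The bug frees the chunk while the reply is still in flight, so
       email_request now dangles. A SetPassword request for root (uid 0)
       is then stored in the same chunk. Its own authorization would be
       refused, but that no longer matters. */
    PendingCheck *password_request = store_closure(ACTION_SET_PASSWORD, 0);
    assert(password_request == email_request);

    /* The approval for the email request arrives and its closure runs. */
    return *email_request;
}
```

The closure that runs on approval describes a password change for uid 0, even though the approved request was the harmless email change.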
Simplifying the exploit
With my newfound understanding of how my exploit actually works, I have been able to simplify it. This is a rough description of how it works:
- Fork into two processes.
- First process:
  - Triggers the bug approximately once a second.
- Second process:
  - Sends flurries of alternating SetEmail and SetPassword messages.
I use two processes because I want the messages from the second process to arrive at approximately the same time as the bug is triggered, but I’m not sure of the timing. I want one of the SetEmail messages to arrive just before the bug is triggered and occupy the vulnerable chunk. Then I want one of the SetPassword messages to arrive just after the bug is triggered and overwrite the CheckAuthData of the SetEmail message. The problem is that the bug is triggered during a polkit callback, so the timing depends on polkit, which is difficult to predict. That’s why I decided to take the easy route and just rely on non-determinism.
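The two-process structure can be sketched as a skeleton like the one below. It is not the published exploit: trigger_bug and send_method_call are hypothetical empty stubs where real code would send D-Bus messages, and the round counts are parameters chosen only for illustration. It assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef enum { SET_EMAIL, SET_PASSWORD } Method;

/* Hypothetical stubs: a real exploit would talk to D-Bus here. */
static void trigger_bug(void) {}
static void send_method_call(Method method) {
    assert(method == SET_EMAIL || method == SET_PASSWORD);
}

/* Messages alternate: SetEmail, SetPassword, SetEmail, ... */
Method flurry_method(unsigned index) {
    return (index % 2 == 0) ? SET_EMAIL : SET_PASSWORD;
}

/* Returns 0 if both processes finished normally, -1 otherwise. */
int run_two_processes(unsigned trigger_rounds, unsigned flurry_size) {
    pid_t child = fork();
    if (child < 0) return -1;
    if (child == 0) {
        /* First process: trigger the bug roughly once a second. */
        for (unsigned i = 0; i < trigger_rounds; i++) {
            trigger_bug();
            sleep(1);
        }
        _exit(0);
    }
    /* Second process: send a flurry of alternating messages. */
    for (unsigned i = 0; i < flurry_size; i++)
        send_method_call(flurry_method(i));

    int status = 0;
    if (waitpid(child, &status, 0) < 0) return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Neither process tries to synchronize with the other; the design simply relies on some SetEmail and SetPassword pair straddling a trigger by chance.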
I have published the source code of the exploit in the securitylab repository (see note 2).
Conclusion
I think this exploit is an interesting example of exploiting a memory management bug purely through application logic, rather than, for example, by using any of the traditional Malloc Maleficarum techniques (that work by overwriting the heap metadata). An increasing number of mitigations have been added to the glibc malloc implementation to thwart those traditional techniques, but none of them are able to prevent a data-driven exploit like this one. Instead, the exploit uses its influence on the target process to carve an unintended, but feasible, logic path through the application to achieve its goals. This approach to memory management exploitation can provide many opportunities for attackers to take advantage of even the smallest issues in a targeted application, unhindered by any execution flow integrity mitigations.
Having said that, this is also the crudest exploit that I have written so far in my career as a security professional! It relies on chance and the fact that I can keep crashing accountsservice until it’s successful. But would an attacker care? It gets you a root shell, even if you have to wait a few hours. To me, it feels like magic that it’s even possible to exploit such a small bug, especially considering all the mitigations that have been added to make memory corruption vulnerabilities harder to exploit. Sometimes, all it takes to get root is a little wishful thinking!
Notes
1. I reported this memory leak at the same time as the double-free vulnerability and it has been fixed. Memory leaks (forgetting to free memory) are usually not vulnerabilities, but they can sometimes be useful tools when exploiting another bug.
2. There are actually three versions of the exploit in the repository. The first version is the original PoC that I sent to the Ubuntu security team. It’s a bit unreliable because it sometimes gets stuck waiting for a D-Bus reply that never arrives. The second version is essentially the same as the first, except it uses epoll to improve the reliability and avoid getting stuck. The third version is the simplified exploit.