December 17, 2019

Ubuntu apport TOCTOU vulnerability (CVE-2019-7307)

Kevin Backhouse

I recently started a four-part series about Ubuntu's crash reporting system. In this second post, I'll focus on CVE-2019-7307, a TOCTOU vulnerability in apport that enables a local attacker to include the contents of any file on the system in a crash report.

The bug

Apport allows you to place a file named ~/.apport-ignore.xml in your home directory. It enables you to specify a custom list of executables that should be ignored by the crash reporter. But what happens if you replace ~/.apport-ignore.xml with a symlink to a file that you don't own, such as /etc/shadow? The code that handles this file is in report.py, at line 962:

if not os.access(ifpath, os.R_OK) or os.path.getsize(ifpath) == 0:
    # create a document from scratch
    dom = xml.dom.getDOMImplementation().createDocument(None, 'apport', None)
else:
    try:
        dom = xml.dom.minidom.parse(ifpath)
    except ExpatError as e:
        raise ValueError('%s has invalid format: %s' % (_ignore_file, str(e)))

As you can see, it uses os.access to check that the user has permission to access the file. If the permission check passes, then it calls xml.dom.minidom.parse to parse the XML. This is a classic example of a "time of check to time of use" (TOCTOU) vulnerability. If the file is valid at the time of the os.access check, but I quickly replace it with a symlink to a different file before the call to xml.dom.minidom.parse, then I can trick apport into using its elevated privileges to read a file which I do not have permission to access myself.
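To make the gap between the check and the use concrete, here is a minimal sketch that mimics the same pattern in a scratch directory. It is not apport's code: check_then_read, the between callback, and the file names are hypothetical, and the callback simply stands in for an attacker who wins the race. No privileges are involved, so it only shows that the file that gets read need not be the file that was checked.

```python
import os
import tempfile


def check_then_read(path, between=lambda: None):
    """Mimic the vulnerable pattern: check the path, then open it by name."""
    if not os.access(path, os.R_OK):
        return None
    between()  # stands in for the window that an attacker races against
    with open(path) as f:  # the name is resolved a second time here
        return f.read()


def demo():
    with tempfile.TemporaryDirectory() as d:
        harmless = os.path.join(d, "ignore.xml")   # hypothetical file names
        secret = os.path.join(d, "secret.txt")
        with open(harmless, "w") as f:
            f.write("<apport/>")
        with open(secret, "w") as f:
            f.write("forbidden")

        def swap():
            # Replace the checked file with a symlink to a different file.
            os.unlink(harmless)
            os.symlink(secret, harmless)

        return check_then_read(harmless, swap)
```

Calling demo() returns the contents of secret.txt, even though the access check was performed on the harmless file.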

Subtleties of privilege dropping in apport

You may wonder why the os.access check would ever fail, because apport is a root process. The reason is that apport drops privileges during its execution in two stages. The first stage happens at apport, line 455:

# Partially drop privs to gain proper os.access() checks
drop_privileges(True)

The second stage happens at line 601:

# Totally drop privs before writing out the reportfile.
drop_privileges()

What do they mean by "partially drop privs" and "totally drop privs"? This is related to the real, effective, and saved user ids of the process:

State                     RUID   EUID   SUID
root process              0      0      0
"partially drop privs"    1001   0      0
read files safely         1001   1001   0
"totally drop privs"      1001   1001   1001

(0 is the user id of root and 1001 is the user id of my own unprivileged account, kev.) The real user id (RUID) determines the owner of the process, but the effective user id (EUID) determines which files the process can read and write. This means that when apport is in the "partially drop privs" state, it can still read any file on the system.

The correct way for apport to make sure that it doesn't accidentally use its root privileges to read or write a file is to first enter the state that I have named "read files safely" in the table. Because the saved user id (SUID) is still root, the process can temporarily enter the "read files safely" state and then revert back to "partially drop privs" after it's done reading the file. Note that the transition to "totally drop privs" is, in contrast, irreversible.
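The temporary transition can be sketched in a few lines of Python. This is an illustration under assumptions, not apport's actual fix: as_real_user and read_as_real_user are hypothetical names, and the sketch only deals with user ids (a real implementation would also need to handle group ids and supplementary groups).

```python
import os
from contextlib import contextmanager


@contextmanager
def as_real_user():
    """Temporarily set the EUID to the RUID, then restore the previous EUID."""
    ruid, euid, _suid = os.getresuid()
    os.seteuid(ruid)  # "partially drop privs" -> "read files safely"
    try:
        yield
    finally:
        # Switching back is permitted because the saved UID was left untouched.
        os.seteuid(euid)


def read_as_real_user(path):
    """Open and read a file while the EUID matches the real user."""
    with as_real_user():
        with open(path, "rb") as f:
            return f.read()
```

When the process is already unprivileged, both seteuid calls are no-ops, so the helper behaves like a plain read.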

The os.access check is unusual because it uses the RUID, rather than the EUID, to check whether the real user has permission to access the file. This is the reason why there is a TOCTOU vulnerability. Apport is in the "partially drop privs" state when os.access is called. This means it will reject files that I don't own, but if I can bypass the os.access check then the subsequent call to xml.dom.minidom.parse will be able to read any file because the EUID is still root. I can do this by timing the attack to replace ~/.apport-ignore.xml with a symlink just after the call to os.access.

Comparison to CVE-2019-11481

I found a very similar bug at fileutils.py, line 335:

def get_config(section, setting, default=None, path=None, bool=False):
    '''Return a setting from user configuration.

    This is read from ~/.config/apport/settings or path. If bool is True, the
    value is interpreted as a boolean.
    '''
    if not get_config.config:
        get_config.config = ConfigParser()
        if path:
            get_config.config.read(path)
        else:
            get_config.config.read(os.path.expanduser(_config_file))

This code opens the file ~/.config/apport/settings with a root EUID. At first glance it looks easier to exploit than the other bug, because there is no os.access check to race against. In practice it isn't, because the two code paths handle errors differently. Suppose I want to use the bug to read the contents of /etc/shadow. That file is not valid XML, and it is not formatted correctly to be parsed as an apport settings file either, so in both cases it triggers a parse error in apport. In the case of ~/.config/apport/settings, the error causes apport to abort immediately. But in the case of ~/.apport-ignore.xml, the incorrectly formatted file is ignored and apport continues running. Because of this, I found it easier to exploit ~/.apport-ignore.xml.
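The following sketch shows that both parsers reject shadow-style content. The sample line is made up, and parses_as_xml and parses_as_settings are hypothetical helpers. What differs in apport is how the caller reacts to the exception, which this sketch does not model.

```python
import configparser
import xml.dom.minidom
from xml.parsers.expat import ExpatError

# Made-up line in the style of a shadow file entry (not real data).
SHADOW_LIKE = "root:*:18000:0:99999:7:::\n"


def parses_as_xml(text):
    """Return True if minidom accepts the text as an XML document."""
    try:
        xml.dom.minidom.parseString(text)
        return True
    except ExpatError:
        return False


def parses_as_settings(text):
    """Return True if ConfigParser accepts the text as an INI-style file."""
    try:
        configparser.ConfigParser().read_string(text)
        return True
    except configparser.Error:
        return False
```

Both helpers return False for SHADOW_LIKE, while a well-formed input such as "<apport/>" or a file with a section header is accepted by the matching parser.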

I reported the ~/.config/apport/settings bug to Ubuntu: bug 1830862. It's since been fixed and assigned CVE-2019-11481.

Exploit plan

The bug enables me to trick apport into loading any file on the system, by replacing ~/.apport-ignore.xml with a symlink. But any file that I'm interested in is almost certainly not going to be a valid XML file, so it will cause a parse error and apport will ignore it. How could this help me access forbidden information?

Here's my cunning plan:

The main idea is that, even though the forbidden file will trigger a parse error and get ignored, it's still loaded into apport's heap. This means that if I crash apport then the contents of the file will be included in the crash report. This is the sequence of events in the plan:

  1. I start /bin/sleep and crash it by sending it a SIGSEGV.
  2. Apport starts up to generate a crash report for /bin/sleep.
  3. I replace ~/.apport-ignore.xml with a symlink at exactly the right moment, so that apport loads a forbidden file into memory.
  4. I crash apport by sending it a SIGSEGV.
  5. A second apport starts up to generate a crash report for the first apport.
  6. The second apport writes out a crash report for the first, containing a copy of the forbidden file in the core dump.

Obstacles

It wasn't quite that easy. I ran into several problems. The obvious one is that precise timing of the symlink switcheroo is crucial, so I anticipated that being difficult to get right. But there were also some unexpected problems, which I'll cover in the following sections.

Anti-recursion mitigations

Apport has a couple of mitigations to prevent it from running on itself. The comment at apport, line 30 explains that this is to avoid "bringing down the system to its knees if there is a series of crashes".

The first mitigation is a lock file named /var/crash/.lock. When apport starts, it uses lockf to set a lock on this file to prevent another apport from running at the same time.

The interesting thing is that lockf file locks are only advisory! In fact, as Victor Gaydov explains in this excellent overview, the lock is actually associated with an [i-node, pid] pair. This means that if I replace /var/crash/.lock with a new file after the first apport has set its lock, then the second apport will see a different i-node, so both apports can hold locks on /var/crash/.lock at the same time!
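Here is a small experiment that illustrates the i-node behavior in a scratch directory, assuming a Linux system. The helper names and the child script are my own illustration, not part of apport or the PoC. A second process tries to lock the same path, first while the original file is locked and then after the path has been replaced with a new file.

```python
import fcntl
import os
import subprocess
import sys
import tempfile

# Child script: try to take the lock without blocking and report the result.
CHILD = """
import fcntl, sys
with open(sys.argv[1], "a") as f:
    try:
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        print("locked")
    except OSError:
        print("busy")
"""


def other_process_can_lock(path):
    """Run a separate Python process that attempts to lock the path."""
    out = subprocess.run([sys.executable, "-c", CHILD, path],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip() == "locked"


def demo():
    with tempfile.TemporaryDirectory() as d:
        lock_path = os.path.join(d, ".lock")
        holder = open(lock_path, "w")
        fcntl.lockf(holder, fcntl.LOCK_EX | fcntl.LOCK_NB)  # first lock holder
        before = other_process_can_lock(lock_path)

        # Replace the path with a brand new file (a different i-node).
        os.unlink(lock_path)
        os.close(os.open(lock_path, os.O_CREAT | os.O_WRONLY, 0o600))
        after = other_process_can_lock(lock_path)

        holder.close()
        return before, after
```

The first attempt fails because the lock is held on the original i-node. The second succeeds because the path now refers to a different i-node, even though the first lock is still held.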

The trick of replacing /var/crash/.lock with a new file relies on me having permission to delete or move the file. Since the /var/crash directory has the sticky bit set (see the first post for more information), this means that I must own the file. Luckily, /var/crash is world-writable, so I can create /var/crash/.lock as long as it doesn't already exist. When I first submitted my bug report to Ubuntu on May 29, I thought that this would often make the vulnerability unexploitable. That's because on my work laptop, /var/crash/.lock almost always exists and is owned by root. I have since discovered that /var/crash/.lock is deleted by a daily cronjob: /etc/cron.daily/apport. The lock file often exists on my work laptop because I deliberately crash applications on a fairly regular basis. But on a typical Ubuntu system, it is unlikely to exist at any given time, due to the daily cronjob.

In my bug report, I recommended that /var/crash/.lock should always exist and be owned by root, as a mitigation against this type of exploit. While I did not regard it as a vulnerability by itself, Sander Bos has since submitted a separate bug report about this issue. It's been assigned CVE-2019-11485 and fixed by changing the directory that the lock file is stored in.

The second mitigation is a slightly obscure bit of logic in the kernel, based on RLIMIT_CORE. RLIMIT_CORE is a resource limit: the maximum size of the core file. The value RLIMIT_CORE == 1 is used as a special value to indicate that the process is a crash reporter and should not generate a core dump if it crashes (to prevent recursion). I found an explanation of this mitigation in this comment.

I got lucky with the RLIMIT_CORE mitigation. It turns out that you can use prlimit to modify the RLIMIT_CORE of another process! You need to have appropriate permissions to do so, of course, but I found that it works as soon as apport enters the "totally drop privs" state (refer to the table). Unfortunately, it isn't possible to increase the value of RLIMIT_CORE with prlimit, but I am able to drop it to zero, which is sufficient for this exploit.
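Python exposes the same system call as resource.prlimit on Linux, so the idea can be sketched without C. The target here is a throwaway child process owned by the same user, not apport, and zero_core_limit and demo are hypothetical names.

```python
import resource
import subprocess
import sys


def zero_core_limit(pid):
    """Lower another process's RLIMIT_CORE to zero and return the new limits."""
    resource.prlimit(pid, resource.RLIMIT_CORE, (0, 0))
    return resource.prlimit(pid, resource.RLIMIT_CORE)


def demo():
    # A throwaway child that just sleeps, standing in for the target process.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"])
    try:
        return zero_core_limit(child.pid)
    finally:
        child.kill()
        child.wait()
```

Lowering a limit needs no special capability, but the caller must still be allowed to act on the target process, which is why the timing relative to the privilege drop matters.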

Signal handling

Part of my cunning plan was to crash apport by sending it a SIGSEGV. That doesn't work because apport sets a signal handler for SIGSEGV:

def setup_signals():
    '''Install a signal handler for all crash-like signals, so that apport is
    not called on itself when apport crashed.'''

    signal.signal(signal.SIGILL, _log_signal_handler)
    signal.signal(signal.SIGABRT, _log_signal_handler)
    signal.signal(signal.SIGFPE, _log_signal_handler)
    signal.signal(signal.SIGSEGV, _log_signal_handler)
    signal.signal(signal.SIGPIPE, _log_signal_handler)
    signal.signal(signal.SIGBUS, _log_signal_handler)

Again, it appears that the motivation for this is to prevent apport from running recursively on itself. Luckily for me, the list of signals that setup_signals sets handlers for isn't sufficiently thorough. The section 7 man page for signal has a table titled "Standard signals". Here's a short excerpt:

Signal    Value   Action   Comment
SIGINT    2       Term     Interrupt from keyboard
SIGQUIT   3       Core     Quit from keyboard
SIGILL    4       Core     Illegal Instruction
...       ...     ...      ...

Any signal with "Core" in the "Action" column will trigger a core dump. Apport's list of signal handlers includes the most common core-generating signals, but it's far from comprehensive. There are several left to choose from. My exploit uses SIGTRAP.
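The leftover signals can be listed with a simple set difference. The CORE_SIGNALS set below is transcribed from the "Standard signals" table in signal(7), leaving out the synonyms SIGIOT and SIGUNUSED. HANDLED mirrors the setup_signals listing. The function name is hypothetical.

```python
import signal

# Signals whose default action is "Core" in the signal(7) table
# (the synonyms SIGIOT and SIGUNUSED are left out).
CORE_SIGNALS = {
    "SIGABRT", "SIGBUS", "SIGFPE", "SIGILL", "SIGQUIT",
    "SIGSEGV", "SIGSYS", "SIGTRAP", "SIGXCPU", "SIGXFSZ",
}

# Signals for which setup_signals installs a handler.
HANDLED = {"SIGILL", "SIGABRT", "SIGFPE", "SIGSEGV", "SIGPIPE", "SIGBUS"}


def unhandled_core_signals():
    """Return (name, number) pairs for core-generating signals left unhandled."""
    names = sorted(CORE_SIGNALS - HANDLED)
    return [(name, int(getattr(signal, name))) for name in names]
```

On Linux this leaves SIGQUIT, SIGSYS, SIGTRAP, SIGXCPU and SIGXFSZ as candidates.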

Exploit implementation

I've posted the source code for my proof-of-concept exploit on GitHub. It mostly follows the plan that I described earlier, with a few tweaks to account for the obstacles discussed above. This is the sequence of events in the revised plan:

  1. I start a /bin/sleep.
  2. I create /var/crash/.lock, so that I can delete it later.
  3. I kill /bin/sleep with a SIGSEGV.
  4. Apport starts up to generate a crash report for /bin/sleep.
  5. I replace ~/.apport-ignore.xml with a symlink at exactly the right moment, so that apport loads a forbidden file into memory.
  6. I replace /var/crash/.lock with a new file, to bypass the file lock and enable a second apport to run at the same time as the first.
  7. I use prlimit to set apport's RLIMIT_CORE to zero.
  8. I crash apport by sending it a SIGTRAP.
  9. A second apport starts up to generate a crash report for the first apport.
  10. The second apport writes out a crash report for the first, containing a copy of the forbidden file in the core dump.

All that's left to discuss is how I time the symlink switcheroo. I initially thought it would be very difficult to get the exploit working, because there is such a short time-interval between the call to os.access and when the file is opened. But it turns out that it is hilariously easy to win a race against Python when you are programming in C. The crucial moment in the PoC, when the switcheroo happens, is at line 155. I use inotify for the timing. By running sudo strace -e file -tt -p <apport PID>, I discovered that a file named expatbuilder.cpython-36.pyc is always opened immediately before ~/.apport-ignore.xml is parsed. By watching for an IN_OPEN event on that file, I can time the switcheroo very precisely.
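The PoC is written in C, but the timing idea can be sketched in Python by calling inotify through ctypes. This is an illustration only: the helper names and the scratch files are hypothetical, and the demo opens the trigger file itself instead of waiting for another process to do so. It assumes a Linux system where the C library's inotify functions can be reached through ctypes.CDLL(None).

```python
import ctypes
import os
import struct
import tempfile

IN_OPEN = 0x00000020  # event mask value from <sys/inotify.h>
_libc = ctypes.CDLL(None, use_errno=True)
_EVENT_HEADER = struct.Struct("iIII")  # wd, mask, cookie, len


def watch_for_open(path):
    """Return an inotify file descriptor that reports IN_OPEN events on path."""
    fd = _libc.inotify_init()
    if fd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init failed")
    if _libc.inotify_add_watch(fd, os.fsencode(path), IN_OPEN) < 0:
        err = ctypes.get_errno()
        os.close(fd)
        raise OSError(err, "inotify_add_watch failed")
    return fd


def wait_for_open(fd):
    """Block until an event is available and return its mask."""
    data = os.read(fd, 4096)
    _wd, mask, _cookie, _len = _EVENT_HEADER.unpack_from(data)
    return mask


def swap_in_symlink(link_path, target):
    """Atomically replace link_path with a symlink that points at target."""
    tmp = link_path + ".tmp"
    os.symlink(target, tmp)
    os.replace(tmp, link_path)


def demo():
    with tempfile.TemporaryDirectory() as d:
        trigger = os.path.join(d, "trigger.pyc")  # stands in for the .pyc file
        victim = os.path.join(d, "ignore.xml")
        other = os.path.join(d, "other.txt")
        for path, text in ((trigger, ""), (victim, "<apport/>"), (other, "swapped")):
            with open(path, "w") as f:
                f.write(text)

        fd = watch_for_open(trigger)
        try:
            open(trigger).close()  # in a real race, another process does this
            mask = wait_for_open(fd)
        finally:
            os.close(fd)

        swap_in_symlink(victim, other)
        with open(victim) as f:
            return bool(mask & IN_OPEN), f.read()
```

Creating the symlink under a temporary name and renaming it over the target keeps the swap to a single atomic step, so a reader sees either the old file or the new link.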

You have got to be kidding me!

When I was finally able to get the exploit working, I excitedly went to look at the crash report in /var/crash and saw the following:

kev@constellation:~$ ls -al /var/crash/
total 4492
drwxrwsrwt  2 root whoopsie   12288 Nov  5 12:26 .
drwxr-xr-x 17 root root        4096 Jul 17 19:31 ..
-rw-r-----  1 root whoopsie 4583201 Nov  5 12:26 _usr_share_apport_apport.0.crash

That was definitely a facepalm moment. The file is owned by root. What happened? I was sure that it would be owned by me, because my PoC doesn't send the SIGTRAP until after the first apport has entered the "totally drop privs" state (refer to the table). The apport process is completely owned by me at the moment when it crashes, so surely I should be able to read the crash report?

This problem is caused by a subtle detail in how apport determines the owner of the crashed process. This happens in get_pid_info, by running os.stat on /proc/[pid]/stat. This is explained in a couple of comments scattered throughout the source code, such as here and here. It's a mitigation against accidentally leaking sensitive information when a setuid binary crashes (which is almost exactly what I'm trying to do). In my case, apport was started as a root process, so /proc/[pid]/stat is owned by root, even after the transition to the "totally drop privs" state. I haven't been able to find any way to defeat this protection.
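The ownership lookup boils down to a single stat call. This is a simplified sketch rather than the body of get_pid_info, and proc_stat_owner is a hypothetical name.

```python
import os


def proc_stat_owner(pid):
    """Return the uid that owns /proc/<pid>/stat for the given process."""
    return os.stat(f"/proc/{pid}/stat").st_uid
```

For an ordinary process this matches the user who runs it. The point of the mitigation is that it may not match the process's current user ids when the process started out privileged.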

The consolation prize is that the exploit works. When I looked at the contents of the file, this is what I saw:

[Screenshot: core dump containing the contents of /etc/shadow]

The other good news is that the exploit is very quick and reliable. I thought that the timing of the symlink switcheroo might make it unreliable, but I found that it works perfectly every time.

So all is not lost. Although the crash report is owned by root, it's also readable by whoopsie, which means that if I can find a vulnerability in the whoopsie daemon, I might also be able to read the contents of the crash report.

To be continued ...

Stay tuned for the next two posts in this series: