June 23, 2020

Variant analysis of Web Audio callback vulnerabilities in Chrome

Man Yue Mo

We'll go through a series of vulnerabilities in Web Audio in Chrome and see how to trigger them by arranging the order of various events.

This post assumes readers are familiar with some of the web audio concepts introduced in our previous post.

Analysis of previous vulnerabilities

These bugs have some history. The earliest one that I'm aware of, which I came across retrospectively, was reported by Zhe Jin and Luyao Liu of Chengdu Security Response Center of Qihoo 360. A similar bug was later discovered by the same researchers.

The root cause of the two issues seems to be that a member function of an AudioHandler that accesses the context_ UntracedMember (via Context()) is posted to a task queue. While it is waiting in the task queue, context_ gets destroyed, which turns the access to context_ into a use-after-free. For example, in 977107, the function OfflineAudioDestinationHandler::NotifySuspend calls GetExecutionContext of context_:

void OfflineAudioDestinationHandler::NotifySuspend(size_t frame) {
  DCHECK(IsMainThread());
  if (Context() && Context()->GetExecutionContext())  //<--- Calls `GetExecutionContext` of `context_`
    Context()->ResolveSuspendOnMainThread(frame);
}

It is posted as a task in OfflineAudioDestinationHandler::SuspendOfflineRendering:

void OfflineAudioDestinationHandler::SuspendOfflineRendering() {
  DCHECK(!IsMainThread());
  // The actual rendering has been suspended. Notify the context.
  PostCrossThreadTask(
      *main_thread_task_runner_, FROM_HERE,
      CrossThreadBindOnce(&OfflineAudioDestinationHandler::NotifySuspend,   //<-- Posted to a task queue
                          WrapRefCounted(this),
                          Context()->CurrentSampleFrame()));
}

If context_ is destroyed while OfflineAudioDestinationHandler::NotifySuspend is waiting in the queue, then using it will cause a UaF, because context_ is an UntracedMember, which is the equivalent of a raw pointer for GarbageCollected objects.
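The lifetime problem described above can be modelled outside of Chrome. The following is a minimal sketch, not Blink code: FakeContext, FakeHandler and RunScenario are hypothetical names, std::shared_ptr stands in for scoped_refptr, and "destroying" the context only sets a flag so that the demo never touches freed memory.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <queue>

// Hypothetical stand-in for BaseAudioContext. Destruction is simulated by
// setting a flag, so the sketch itself has fully defined behaviour.
struct FakeContext {
  bool freed = false;
};

// Hypothetical stand-in for an AudioHandler.
struct FakeHandler {
  FakeContext* context_ = nullptr;  // plays the role of the UntracedMember
  bool touched_freed_context = false;

  void NotifySuspend() {
    // In the real code, this access would be the use-after-free.
    if (context_ && context_->freed)
      touched_freed_context = true;
  }
};

// Returns true if the queued task ended up touching a "freed" context.
bool RunScenario(bool destroy_context_while_queued) {
  FakeContext context;
  auto handler = std::make_shared<FakeHandler>();
  handler->context_ = &context;

  std::queue<std::function<void()>> main_thread_queue;
  // Like WrapRefCounted(this): the task owns a strong reference to the
  // handler, but nothing keeps the context alive.
  main_thread_queue.push([handler] { handler->NotifySuspend(); });
  assert(handler.use_count() == 2);  // one reference is held by the task

  if (destroy_context_while_queued)
    context.freed = true;  // the context goes away while the task waits

  while (!main_thread_queue.empty()) {
    main_thread_queue.front()();
    main_thread_queue.pop();
  }
  return handler->touched_freed_context;
}
```

Calling RunScenario(true) models the vulnerable ordering, while RunScenario(false) models the task running before the context is destroyed.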

Variants of existing issues

While auditing web audio code, I noticed the function IIRFilterHandler::NotifyBadState doing something very similar:

void IIRFilterHandler::NotifyBadState() const {
  DCHECK(IsMainThread());
  if (!Context() || !Context()->GetExecutionContext())
    return;
  ...

It is also posted to the main thread as a task:

void IIRFilterHandler::Process(uint32_t frames_to_process) {
  AudioBasicProcessorHandler::Process(frames_to_process);
  if (!did_warn_bad_filter_state_) {
    if (HasNonFiniteOutput()) {
      did_warn_bad_filter_state_ = true;

      PostCrossThreadTask(*task_runner_, FROM_HERE,
                          CrossThreadBindOnce(&IIRFilterHandler::NotifyBadState,
                                              WrapRefCounted(this)));
    }
  }
}

This looked susceptible to the same issue, and it turned out that it was, so I filed 1055788.

Later on, I came across a similar pattern again:

void AudioScheduledSourceHandler::Finish() {
  FinishWithoutOnEnded();

  PostCrossThreadTask(
      *task_runner_, FROM_HERE,
      CrossThreadBindOnce(&AudioScheduledSourceHandler::NotifyEnded,
                          WrapRefCounted(this)));
}

void AudioScheduledSourceHandler::NotifyEnded() {
  DCHECK(IsMainThread());
  if (!Context() || !Context()->GetExecutionContext())
    return;

I'll use this as an example to show how to trigger this kind of bug.

In this case, the AudioScheduledSourceHandler::Finish method is called by BaseAudioContext::HandleStoppableSourceNode, which is called in HandlePreRenderTasks:

bool OfflineAudioDestinationHandler::RenderIfNotSuspended(
    AudioBus* source_bus,
    AudioBus* destination_bus,
    uint32_t number_of_frames) {
  ...
  if (Context()->HandlePreRenderTasks(nullptr, nullptr)) {   //<--- May post NotifyEnded
    SuspendOfflineRendering();
    return true;
  }

The goal here is to destroy OfflineAudioContext while an AudioScheduledSourceHandler::NotifyEnded task is in the queue. As explained in the previous post, in order to destroy OfflineAudioContext, all nodes that are created by the context need to be garbage collected first. Normally this can be done by triggering a GC after the execution context is destroyed, which collects every AudioNode and the OfflineAudioContext within that execution context. That is not good enough here, because AudioScheduledSourceNode has to be destroyed after NotifyEnded is posted back to the main thread, but before it runs as a task. If AudioScheduledSourceNode is garbage collected within this time window, then the ownership of its AudioScheduledSourceHandler is transferred to rendering_orphan_handlers_ in DeferredTaskHandler (see previous post for more details):

  if (context()->IsPullingAudioGraph()) {
    context()->GetDeferredTaskHandler().AddRenderingOrphanHandler(
        std::move(handler_));
  }

BaseAudioContext, in turn, performs cleanup in its destructor and clears context_ from every handler in rendering_orphan_handlers_:

//Called by ~BaseAudioContext()
void DeferredTaskHandler::ContextWillBeDestroyed() {
  for (auto& handler : rendering_orphan_handlers_)
    handler->ClearContext();                              //<--- removes `BaseAudioContext` from handler, `context_` is now null in `handler`
  for (auto& handler : deletable_orphan_handlers_)
    handler->ClearContext();
  ClearHandlersToBeDeleted();
  // Some handlers might live because of their cross thread tasks.
}

By first destroying the execution context and then triggering a GC, the following sequence of events will happen:

  1. Execution context destroyed and everything can be collected
  2. AudioScheduledSourceNode garbage collected
  3. AudioScheduledSourceHandler moved to rendering_orphan_handlers_
  4. OfflineAudioContext garbage collected
  5. OfflineAudioContext destroyed and removes itself from AudioScheduledSourceHandler
  6. AudioScheduledSourceHandler::NotifyEnded runs and Context returns null, so it returns early and there is no use-after-free.

To cause a use-after-free, we need to prevent context_ in the AudioScheduledSourceHandler from being cleared when BaseAudioContext is destroyed. This can be achieved by using BaseAudioContext::Uninitialize, which calls DeferredTaskHandler::ClearHandlersToBeDeleted to clear out rendering_orphan_handlers_:

void DeferredTaskHandler::ClearHandlersToBeDeleted() {
  DCHECK(IsMainThread());
  GraphAutoLocker locker(*this);
  tail_processing_handlers_.clear();
  rendering_orphan_handlers_.clear();    //<--- clears out `rendering_orphan_handlers_`
  deletable_orphan_handlers_.clear();
  automatic_pull_handlers_.clear();
  rendering_automatic_pull_handlers_.clear();
  active_source_handlers_.clear();
}

As BaseAudioContext::Uninitialize is only called when the execution context is destroyed, we only need to rearrange the order of events slightly and trigger a GC before destroying the execution context. This gives us the following sequence of events:

  1. GC to collect AudioScheduledSourceNode
  2. AudioScheduledSourceHandler moved to rendering_orphan_handlers_
  3. Execution context destroyed, rendering_orphan_handlers_ cleared
  4. OfflineAudioContext garbage collected
  5. OfflineAudioContext destroyed, but since rendering_orphan_handlers_ is empty, it does not remove itself from AudioScheduledSourceHandler
  6. AudioScheduledSourceHandler::NotifyEnded runs with a freed context_, causing use-after-free.
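To make the difference between the two orderings concrete, here is a small simulation of the bookkeeping involved. It is only a sketch under simplifying assumptions: the Fake* types, Outcome and Simulate are hypothetical, std::shared_ptr stands in for scoped_refptr, the context is "freed" by setting a flag, and only rendering_orphan_handlers_ is modelled.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct FakeContext {
  bool freed = false;  // set instead of really freeing the object
};

struct FakeHandler {
  FakeContext* context_ = nullptr;  // untraced, raw-pointer-like
  void ClearContext() { context_ = nullptr; }
};

// Models only the orphan-handler bookkeeping of DeferredTaskHandler.
struct FakeDeferredTaskHandler {
  std::vector<std::shared_ptr<FakeHandler>> rendering_orphan_handlers_;

  void AddRenderingOrphanHandler(std::shared_ptr<FakeHandler> handler) {
    rendering_orphan_handlers_.push_back(std::move(handler));
  }
  void ClearHandlersToBeDeleted() { rendering_orphan_handlers_.clear(); }
  void ContextWillBeDestroyed() {
    for (auto& handler : rendering_orphan_handlers_)
      handler->ClearContext();
    ClearHandlersToBeDeleted();
  }
};

enum class Outcome { kEarlyReturn, kUseAfterFree };

Outcome Simulate(bool gc_node_before_frame_removal) {
  FakeContext context;
  FakeDeferredTaskHandler deferred;

  auto node_handler = std::make_shared<FakeHandler>();  // owned by the node
  node_handler->context_ = &context;
  // NotifyEnded has been posted: the task holds its own strong reference.
  std::shared_ptr<FakeHandler> queued_task_ref = node_handler;

  // GC of the node hands its handler over to the orphan list.
  auto collect_node = [&] {
    deferred.AddRenderingOrphanHandler(std::move(node_handler));
  };
  // Stands in for BaseAudioContext::Uninitialize.
  auto destroy_execution_context = [&] { deferred.ClearHandlersToBeDeleted(); };

  if (gc_node_before_frame_removal) {
    collect_node();
    destroy_execution_context();
  } else {
    destroy_execution_context();
    collect_node();
  }

  // In both orderings the context is collected and destroyed last.
  deferred.ContextWillBeDestroyed();
  assert(deferred.rendering_orphan_handlers_.empty());
  context.freed = true;

  // NotifyEnded finally runs on the main thread.
  FakeContext* ctx = queued_task_ref->context_;
  if (!ctx)
    return Outcome::kEarlyReturn;
  return ctx->freed ? Outcome::kUseAfterFree : Outcome::kEarlyReturn;
}
```

Simulate(false) follows the first sequence and ends in the early return, while Simulate(true) follows the second sequence and leaves the queued task holding a handler whose context_ was never cleared.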

To trigger this sequence, we can post a task to the "normal priority" task queue by using the JavaScript setTimeout function:

//------------- In an iframe
setTimeout(parent.remove, 1); //Put remove into the normal priority task queue

//------------- In the parent page
function remove() {
  sleep(100);
  gc();    //<--- collect AudioScheduledSourceNode
  let frame = document.getElementById("ifrm");
  frame.parentNode.removeChild(frame);   //<--- Destroy execution context of frame
  gc();   //<--- collect and destroy context_
}

By using setTimeout to post a task into the queue in the main thread before the audio thread posts the NotifyEnded task, we destroy OfflineAudioContext before NotifyEnded is run and cause a use-after-free.

Root cause and variant analysis

After I discovered how to trigger these issues and had a better understanding of them, I revised my initial root cause analysis and concluded that they occur because of invalid assumptions about the ownership of AudioHandler. Namely, the code assumes that either AudioNode or rendering_orphan_handlers_ is responsible for keeping an AudioHandler alive. While AudioNode is alive, it keeps both the AudioHandler and the untraced BaseAudioContext alive, so there is no UaF. After AudioNode is destroyed, the lifetime of context_ is no longer guaranteed, so to prevent a UaF, context_ must be cleared out of the AudioHandler when the BaseAudioContext is destroyed. The BaseAudioContext destructor does this by clearing context_ in every handler held in rendering_orphan_handlers_. This implicitly assumes that once AudioNode is gone, rendering_orphan_handlers_ is the only object that holds a scoped_refptr to the AudioHandler. That assumption does not hold here, because the AudioHandler can be kept alive by a callback even after it is removed from rendering_orphan_handlers_.

Because the root cause is rather complicated and fixing it would require fairly large changes to the cleanup logic, it was decided that, for the time being, an appropriate solution would be to change the scoped_refptr in the callback to a weak pointer, so that AudioHandler is only kept alive by rendering_orphan_handlers_:

@@ -105,9 +105,9 @@
     if (HasNonFiniteOutput()) {
       did_warn_bad_filter_state_ = true;
 
-      PostCrossThreadTask(*task_runner_, FROM_HERE,
-                          CrossThreadBindOnce(&IIRFilterHandler::NotifyBadState,
-                                              WrapRefCounted(this)));
+      PostCrossThreadTask(
+          *task_runner_, FROM_HERE,
+          CrossThreadBindOnce(&IIRFilterHandler::NotifyBadState, AsWeakPtr()));
     }
   }
 }

This was then applied to fix 1055788.
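The effect of binding a weak pointer instead of a strong reference can be illustrated with standard C++ smart pointers. This is a rough analogy only: std::shared_ptr and std::weak_ptr stand in for scoped_refptr and the Chromium weak pointer returned by AsWeakPtr(), and FakeHandler and CallsAfterOrphansCleared are hypothetical names introduced for this sketch.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical handler that records each notification in an external counter,
// so the result can be observed even after the handler itself is gone.
struct FakeHandler {
  explicit FakeHandler(int& calls) : calls_(calls) {}
  void NotifyBadState() { ++calls_; }
  int& calls_;
};

// Returns how many times NotifyBadState ran when the queued task executes
// after the orphan list has already been cleared.
int CallsAfterOrphansCleared(bool bind_weak) {
  int calls = 0;
  std::vector<std::shared_ptr<FakeHandler>> rendering_orphan_handlers;
  rendering_orphan_handlers.push_back(std::make_shared<FakeHandler>(calls));
  assert(rendering_orphan_handlers.back().use_count() == 1);

  std::function<void()> task;
  if (bind_weak) {
    // After the fix: the task does not extend the handler's lifetime.
    std::weak_ptr<FakeHandler> weak = rendering_orphan_handlers.back();
    task = [weak] {
      if (auto handler = weak.lock())
        handler->NotifyBadState();
    };
  } else {
    // Before the fix: the task keeps the handler alive on its own.
    std::shared_ptr<FakeHandler> strong = rendering_orphan_handlers.back();
    task = [strong] { strong->NotifyBadState(); };
  }

  // The orphan list is cleared while the task is still waiting in the queue.
  rendering_orphan_handlers.clear();
  task();
  return calls;
}
```

With the strong reference the callback still runs on a handler that nothing else owns, whereas with the weak reference the handler dies with the orphan list and the callback becomes a no-op.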

At the time when 1055788 was fixed, I was manually auditing the code and wasn't thinking too much about variant analysis, but after I saw the second issue in 1057627, it became apparent to me that this pattern might be more common. I decided to take a closer look with CodeQL and see if there were other variants.

From these bugs, it appears that the following combination is likely to cause issues:

  1. A member function of an AudioHandler that accesses context_ is posted to the main thread.
  2. The AudioHandler is posted as a scoped_refptr, which means that it stays alive even after BaseAudioContext is destroyed and the rendering_orphan_handlers_ that held it is cleared.

This is a fairly simple pattern that can be surfaced by CodeQL:

import cpp

from FunctionCall fc, FunctionCall wrapRef
where
  // Look for a cross-thread task being posted
  fc.getTarget().hasName("CrossThreadBindOnce") and
  // Look for an `AudioHandler` posted as a `scoped_refptr`
  wrapRef.getTarget().hasName("WrapRefCounted") and
  wrapRef = fc.getAnArgument() and
  exists(Expr e |
    e.getType().stripType().(Class).getABaseClass*().getName() = "AudioHandler" and
    e = wrapRef.getArgument(0)
  )
select fc, fc.getArgument(0)

In the above, I omitted the condition of accessing context_ to simplify the query and make it run faster. As it is, there aren't many results. By this point it had also become apparent to the developers that these issues might be more widespread, and they found and fixed them independently in this commit. Nevertheless, having a query to confirm that all cases were fixed is still useful.

Conclusions

In this post we looked at a series of related security issues, which began in 2018 and kept recurring in different forms. By analyzing these old issues and carrying out root cause and variant analysis, I was able not only to find the source of the problem and trigger the bugs reliably, but also to discover and fix many similar issues. Root cause analysis is important, and hopefully, with a better understanding of the problem, these issues will now all be fixed.

And while fixing the root cause is the best option, we also saw that at times, fixing the root cause can be difficult and risky. In such cases, variant analysis becomes even more important for locating the other bugs that share the same root cause.