Coordinated Disclosure Timeline
- 2021-11-10: Report sent to gh@behnel.de
- 2021-11-11: Issues are acknowledged
- 2021-12-12: Fix is released
Summary
The lxml HTML sanitizer fails to properly sanitize data URLs and style attributes
Product
lxml
Tested Version
Latest at the time of reporting
Details
Issue 1: Improper sanitization of inline style attributes (GHSL-2021-1037)
The code responsible for cleaning inline style attributes looks like this:
if not self.inline_style:
    for el in _find_styled_elements(doc):
        old = el.get('style')
        new = _css_javascript_re.sub('', old)
        new = _css_import_re.sub('', new)
        if self._has_sneaky_javascript(new):
            # Something tricky is going on...
            del el.attrib['style']
        elif new != old:
            el.set('style', new)
This code uses the following regexps to remove expression calls and import statements:
_css_javascript_re = re.compile(r'expression\s*\(.*?\)', re.S|re.I)
_css_import_re = re.compile(r'@\s*import', re.I)
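However, each substitution runs only once, so deleting a match can splice the surrounding characters into a new dangerous token. In the following payload, removing the inner @import leaves a fresh @import behind: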
<div style="@@importimport url('chrome://communicator/skin/');"></div>
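The effect can be reproduced outside lxml with the two regexps quoted above. The following is a minimal sketch, not lxml's code path: `strip_once` and `strip_until_stable` are hypothetical helper names introduced here for illustration, and the fixed-point loop is only one possible hardening, not necessarily the fix that was released.

```python
import re

# The two patterns quoted in the advisory.
_css_javascript_re = re.compile(r'expression\s*\(.*?\)', re.S | re.I)
_css_import_re = re.compile(r'@\s*import', re.I)


def strip_once(style):
    """Hypothetical helper: one substitution pass, as in the quoted code."""
    new = _css_javascript_re.sub('', style)
    return _css_import_re.sub('', new)


def strip_until_stable(style):
    """Hypothetical hardening: repeat the pass until nothing changes.

    Every changing pass removes at least one character, so the loop ends.
    """
    while True:
        new = strip_once(style)
        if new == style:
            return new
        style = new


payload = "@@importimport url('chrome://communicator/skin/');"

# A single pass deletes the inner "@import" and splices "@" + "import".
result = strip_once(payload)
print(result)  # @import url('chrome://communicator/skin/');

# Repeating the pass removes the reassembled token as well.
print(strip_until_stable(payload))
```

Repeating the substitution is still a blocklist approach. Dropping the whole attribute when a suspicious token is present after cleaning, as the quoted code already does for `_has_sneaky_javascript`, is generally the more conservative choice.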
This issue has lower priority since CSS-based XSS vectors do not normally work in modern browsers.
Impact
This issue may lead to Cross-Site Scripting (XSS).
Issue 2: Improper sanitization of data URL images (GHSL-2021-1038)
When lxml rewrites links, it uses the following regexps to identify possibly malicious schemes:
_is_image_dataurl = re.compile(
    r'^data:image/.+;base64', re.I).search
_is_possibly_malicious_scheme = re.compile(
    r'(?:javascript|jscript|livescript|vbscript|data|about|mocha):',
    re.I).search

def _is_javascript_scheme(s):
    if _is_image_dataurl(s):
        return None
    return _is_possibly_malicious_scheme(s)
Because the _is_image_dataurl pattern (r'^data:image/.+;base64') allows data URLs as long as they declare an image type, it is possible to use data:image/svg+xml;base64, URLs whose SVG image embeds JavaScript code:
<a href="">asdf</a>
Right-clicking the link and opening it in a new tab triggers execution of the JavaScript code.
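The bypass can be checked directly against the quoted regexps. The sketch below copies them verbatim and builds its own harmless SVG; that SVG is an assumption made for illustration and is not the payload from the original report. `is_raster_image_dataurl` is a hypothetical stricter check that allows only a few raster formats; it is shown as one possible approach and should not be read as the released fix.

```python
import base64
import re

# Patterns and function as quoted in the advisory.
_is_image_dataurl = re.compile(
    r'^data:image/.+;base64', re.I).search
_is_possibly_malicious_scheme = re.compile(
    r'(?:javascript|jscript|livescript|vbscript|data|about|mocha):',
    re.I).search


def _is_javascript_scheme(s):
    if _is_image_dataurl(s):
        return None
    return _is_possibly_malicious_scheme(s)


def _dataurl(mime, raw):
    """Hypothetical helper: build a base64 data URL from raw bytes."""
    return 'data:%s;base64,%s' % (mime, base64.b64encode(raw).decode('ascii'))


# Illustrative SVG with a script element (not the original payload).
svg = (b'<svg xmlns="http://www.w3.org/2000/svg">'
       b'<script>console.log(1)</script></svg>')
svg_url = _dataurl('image/svg+xml', svg)
html_url = _dataurl('text/html', b'<p>hi</p>')
png_url = _dataurl('image/png', b'\x89PNG')

# The SVG URL is exempted as an "image"; the HTML URL is flagged.
svg_flagged = _is_javascript_scheme(svg_url) is not None
html_flagged = _is_javascript_scheme(html_url) is not None
print(svg_flagged, html_flagged)  # False True

# Hypothetical stricter check: allow only named raster image types.
_raster_dataurl = re.compile(
    r'^data:image/(?:png|jpeg|gif|webp);base64,', re.I)


def is_raster_image_dataurl(url):
    return _raster_dataurl.match(url) is not None


print(is_raster_image_dataurl(svg_url), is_raster_image_dataurl(png_url))
```

An explicit allowlist of image types avoids the open-ended `.+` in the original pattern, which accepts any subtype, including scriptable formats such as SVG.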
Impact
This issue may lead to Cross-Site Scripting (XSS).
CVE
- CVE-2021-43818
Resources
- https://github.com/lxml/lxml/security/advisories/GHSA-55x5-fj6c-h6m8
Credit
These issues were discovered and reported by GitHub Security Lab team member @pwntester (Alvaro Muñoz).
Contact
You can contact the GHSL team at securitylab@github.com. Please include a reference to GHSL-2021-1037 or GHSL-2021-1038 in any communication regarding these issues.