Coordinated Disclosure Timeline

Summary

The lxml HTML sanitizer fails to properly sanitize data URLs and inline style attributes.

Product

lxml

Tested Version

Latest at the time of reporting

Details

Issue 1: Improper sanitization of inline style attributes (GHSL-2021-1037)

The code responsible for cleaning inline style attributes looks like this:

if not self.inline_style:
    for el in _find_styled_elements(doc):
        old = el.get('style')
        new = _css_javascript_re.sub('', old)
        new = _css_import_re.sub('', new)
        if self._has_sneaky_javascript(new):
            # Something tricky is going on...
            del el.attrib['style']
        elif new != old:
            el.set('style', new)

This code uses the following regexps to remove @import statements and expression() calls:

_css_javascript_re = re.compile(r'expression\s*\(.*?\)', re.S|re.I)
_css_import_re = re.compile(r'@\s*import', re.I)

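However, each substitution runs only once, so removing a match can splice the surrounding text into a new dangerous expression. In the following payload, deleting the inner @import leaves a fresh @import behind: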

<div style="@@importimport url('chrome://communicator/skin/');"></div>
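The reassembly can be reproduced with the standard library alone. The sketch below copies the two regexps from the excerpt; strip_once and strip_until_stable are hypothetical helper names introduced for illustration, and the fixed-point loop is only one possible mitigation, not necessarily what upstream adopted.

```python
import re

# Regexps copied from the excerpt above.
_css_javascript_re = re.compile(r'expression\s*\(.*?\)', re.S | re.I)
_css_import_re = re.compile(r'@\s*import', re.I)


def strip_once(style):
    # Hypothetical helper: the same two single-pass substitutions as the cleaner.
    new = _css_javascript_re.sub('', style)
    return _css_import_re.sub('', new)


def strip_until_stable(style):
    # Hypothetical mitigation: repeat the substitutions until a pass changes nothing.
    while True:
        new = strip_once(style)
        if new == style:
            return new
        style = new


payload = "@@importimport url('chrome://communicator/skin/');"
cleaned = strip_once(payload)
print(cleaned)                      # the inner match is gone, but "@import" is back
print(strip_until_stable(payload))  # no "@import" remains
```

Rejecting the attribute outright whenever the cleaned value still contains @import would be a simpler alternative to looping.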

This issue has lower priority because CSS-based XSS vectors do not normally work in modern browsers.

Impact

This issue may lead to cross-site scripting (XSS).

Issue 2: Improper sanitization of data URL images (GHSL-2021-1038)

When lxml rewrites links, it uses the following regexps to identify possibly malicious schemes:

_is_image_dataurl = re.compile(
    r'^data:image/.+;base64', re.I).search
_is_possibly_malicious_scheme = re.compile(
    r'(?:javascript|jscript|livescript|vbscript|data|about|mocha):',
    re.I).search
def _is_javascript_scheme(s):
    if _is_image_dataurl(s):
        return None
    return _is_possibly_malicious_scheme(s)

Because _is_image_dataurl (the regexp r'^data:image/.+;base64') accepts any data URL whose MIME type begins with image/, it also accepts data:image/svg+xml;base64, URLs, and an SVG image can embed JavaScript code:

<a href="">asdf</a>

Right-clicking the link and opening it in a new tab triggers execution of the JavaScript code.

Impact

This issue may lead to cross-site scripting (XSS).

CVE

Resources

Credit

These issues were discovered and reported by GitHub Security Lab team member @pwntester (Alvaro Muñoz).

Contact

You can contact the GHSL team at securitylab@github.com. Please include a reference to GHSL-2021-1037 or GHSL-2021-1038 in any communication regarding these issues.