Coordinated Disclosure Timeline

Summary

A number of quadratic parsing issues can allow an attacker to trigger a denial-of-service via excessive CPU usage or excessive memory usage.

In addition, an architectural design decision in the AST module could allow attackers to trigger a denial-of-service in applications building ASTs programmatically.

Product

comrak

Tested Version

Details

Issue 1: Quadratic runtime when parsing markdown (GHSL-2023-047)

There are a number of parsing issues in comrak where the runtime grows quadratically with the input size. An attacker can exploit this to trigger excessive CPU usage and cause a denial of service. comrak is heavily based on the github/cmark-gfm project, so several of these quadratic parsing issues mirror upstream cmark or cmark-gfm issues.

In our research we looked at bug reports previously submitted to cmark and cmark-gfm, the cmark-gfm suite of regression tests, and the results of a custom fuzzer we created for comrak.

Known cmark or cmark-gfm vulnerabilities

The cmark issue, “Pathological input: Deeply nested lists”, can be reproduced in comrak:

$ for n in $(echo 100 200 400 800 1600); do echo -n "${n}: "; ./a.out "${n}" | time ./target/release/comrak --extension table >/dev/null; done
100:         0.07 real         0.04 user         0.00 sys
200:         0.15 real         0.13 user         0.00 sys
400:         0.63 real         0.60 user         0.01 sys
800:         3.77 real         3.72 user         0.03 sys
1600:        27.64 real        27.32 user         0.14 sys
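
The growth rate can be estimated directly from the timings above. The following is a minimal sketch, not part of the original report: it takes the "real" column as given and computes, for each doubling of n, the exponent k in "time grows as n^k". The helper name growth_exponents is illustrative only.

```python
import math

# Wall-clock timings ("real", in seconds) copied from the run above,
# keyed by the size argument n passed to the input generator.
timings = {100: 0.07, 200: 0.15, 400: 0.63, 800: 3.77, 1600: 27.64}


def growth_exponents(samples):
    """Return (n, exponent) pairs for consecutive measurements,
    where exponent = log(t2 / t1) / log(n2 / n1)."""
    sizes = sorted(samples)
    result = []
    for small, large in zip(sizes, sizes[1:]):
        ratio = samples[large] / samples[small]
        exponent = math.log(ratio) / math.log(large / small)
        result.append((large, exponent))
    return result


for n, exponent in growth_exponents(timings):
    print(f"n={n}: time grows roughly as n^{exponent:.2f}")
```

A linear-time parser would keep the exponent near 1. Here the exponent exceeds 2 from n=400 onwards, which is consistent with at least quadratic growth. The smallest measurements are likely dominated by process start-up cost, so the first exponent is the least reliable.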

A number of other cmark or cmark-gfm quadratic parsing issues affect comrak too.

cmark-gfm regression tests

In addition, there are two testcases from the cmark-gfm testsuite which trigger quadratic parsing behaviour:

Fuzz testing

The following testcases show a number of quadratic parsing issues we discovered via fuzzing that appear to be specific to comrak:

Issue 2: Excessive output when parsing markdown (GHSL-2023-048)

comrak is vulnerable to the upstream cmark issue, “Issue revealed by fuzzer”. A large number of references in a markdown document can trigger an overly large response. For example, this ~40KB markdown document generates ~900MB of HTML output:

$ python3 -c 'print("[1] "*5000,"\n\n[1]: urn:","\x00"*20000,"\n", sep="")' | wc -c
   40013
$ python3 -c 'print("[1] "*5000,"\n\n[1]: urn:","\x00"*20000,"\n", sep="")' | ./target/release/comrak |wc -c
 900105007

This issue could trigger a denial-of-service by causing excessive memory usage or by generating an overly large output.
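
The amplification can be checked with simple arithmetic. The sketch below is not from the original report and rests on two assumptions about the rendered HTML: that each NUL byte in the link destination is emitted as a percent-encoded U+FFFD ("%EF%BF%BD"), and that all links end up in a single paragraph separated by single spaces. Under those assumptions the computed sizes match the two wc -c results above.

```python
# Input produced by the python3 one-liner above (print appends a final "\n").
REFS = 5000    # number of "[1] " reference uses
NULS = 20000   # NUL bytes in the link destination

document = "[1] " * REFS + "\n\n[1]: urn:" + "\x00" * NULS + "\n" + "\n"
input_size = len(document.encode("utf-8"))

# Assumption: each NUL is rendered as a percent-encoded U+FFFD.
encoded_nul = len("%EF%BF%BD")  # 9 bytes per original NUL byte
link = len('<a href="urn:') + NULS * encoded_nul + len('">1</a>')

# Assumption: one <p> element holding all links, separated by single spaces.
output_size = len("<p>") + REFS * link + (REFS - 1) + len("</p>\n")

print(input_size, output_size, round(output_size / input_size))
```

The key point is that the long link destination is defined once but copied into the output for every reference, so the output size scales with the product of the number of references and the destination length. In this example that is an amplification factor of roughly 22,000.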

Issue 3: Attacker controlled data in AST nodes is not validated (GHSL-2023-049)

A comrak AST can be constructed manually by a program instead of parsing a markdown document with parse_document. This AST can then be converted to HTML via html::format_document_with_plugins. However, the HTML formatting code assumes that the AST is well-formed. For example, many AST nodes contain [u8] fields which the formatting code assumes are valid UTF-8 data. Several bugs can be triggered if this is not the case. For example:

Depending on how the comrak library is used by applications, this could allow attackers to trigger an assertion failure causing a denial-of-service if input is not correctly sanitized.
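
One way an application could defend itself is to validate byte fields before placing them into nodes it builds by hand. The following is a language-neutral sketch written in Python, not comrak code: the function invalid_utf8_fields and the field names are hypothetical. In a Rust application the equivalent check would be std::str::from_utf8 on each byte slice.

```python
def invalid_utf8_fields(fields):
    """Return the names of byte-string fields that are not valid UTF-8."""
    bad = []
    for name, raw in fields.items():
        try:
            raw.decode("utf-8", errors="strict")
        except UnicodeDecodeError:
            bad.append(name)
    return bad


# Hypothetical node fields; the names are illustrative, not comrak's.
node_fields = {
    "url": b"https://example.com/",
    "title": b"caf\xc3\xa9",        # valid UTF-8 encoding of "café"
    "literal": b"\xff\xfe broken",  # 0xFF never appears in valid UTF-8
}

print(invalid_utf8_fields(node_fields))
```

Rejecting or replacing any field reported by such a check before the AST is formatted keeps untrusted bytes from reaching code that assumes well-formed input.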

CVE

Credit

GHSL-2023-047 and GHSL-2023-048 were discovered and reported by GHSL team member @philipturnbull (Phil Turnbull).

GHSL-2023-049 was discovered and reported by GHSL team member @darakian (Jonathan Moroney).

Contact

You can contact the GHSL team at securitylab@github.com. Please include a reference to GHSL-2023-047, GHSL-2023-048, or GHSL-2023-049 in any communication regarding these issues.