Coordinated Disclosure Timeline
- 2021-08-31: Sent report to burtdewilde@gmail.com
- 2021-10-14: Created an issue asking for contact details.
- 2023-03-17: Posted a PR to fix the bug: https://github.com/chartbeat-labs/textacy/pull/371
- 2023-03-17: PR merged
Summary
textacy contains a regular expression that is vulnerable to ReDoS (Regular Expression Denial of Service).
Product
textacy
Tested Version
Details
ReDoS
ReDoS, or Regular Expression Denial of Service, is a vulnerability affecting inefficient regular expressions which can perform extremely badly when run on a crafted input string.
Vulnerability
The vulnerable regular expression is here.
To see that the regular expression is vulnerable, copy-paste it into a separate file as shown below:
- Run the code below with
python3
:
import re
URL_REGEX = re.compile(
r"(?:^|(?<![\w/.]))"
# protocol identifier
# r"(?:(?:https?|ftp)://)" <-- alt?
r"(?:(?:https?://|ftp://|www\d{0,3}\.))"
# user:pass authentication
r"(?:\S+(?::\S*)?@)?"
r"(?:"
# IP address exclusion
# private & local networks
r"(?!(?:10|127)(?:\.\d{1,3}){3})"
r"(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})"
r"(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})"
# IP address dotted notation octets
# excludes loopback network 0.0.0.0
# excludes reserved space >= 224.0.0.0
# excludes network & broadcast addresses
# (first & last IP address of each class)
r"(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])"
r"(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}"
r"(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))"
r"|"
# host name
r"(?:(?:[a-z\u00a1-\uffff0-9]-?)*[a-z\u00a1-\uffff0-9]+)"
# domain name
r"(?:\.(?:[a-z\u00a1-\uffff0-9]-?)*[a-z\u00a1-\uffff0-9]+)*"
# TLD identifier
r"(?:\.(?:[a-z\u00a1-\uffff]{2,}))"
r")"
# port number
r"(?::\d{2,5})?"
# resource path
r"(?:/\S*)?"
r"(?:$|(?![\w?!+&/]))",
flags=re.IGNORECASE
)
match = URL_REGEX.findall("http://0.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.")
This vulnerability was found by the CodeQL ReDoS query for Python, which was still experimental when it found this bug in 2021, but is now included in the standard suite of queries used by code scanning.
Impact
This issue may lead to a denial of service.
Credit
This issue was discovered by GitHub team members @erik-krogh (Erik Krogh Kristensen) and @yoff (Rasmus Petersen).
Contact
You can contact the GHSL team at securitylab@github.com
, please include a reference to GHSL-2021-109
in any communication regarding this issue.