Our shared common weaknesses

Software supply chain security is a challenge. Software packages can go unmaintained, be targeted by malicious actors, or otherwise carry unseen vulnerabilities. We can get an idea of what vulnerabilities lie unseen by considering the most common types of software vulnerabilities. In this blog post, I’ll use data from GitHub’s Advisory Database to identify the most common types of vulnerabilities discovered in 2021 so far.

For many projects, GitHub hosts a large portion of the supply chain, and in some cases the entire supply chain. This gives us a unique opportunity to secure that software supply chain and, by extension, any project that depends upon it. We’re addressing supply chain problems through projects like the GitHub Advisory Database and security-oriented code scanning with CodeQL. If you host a project on GitHub, you can benefit directly from this work by enabling code scanning and dependency alerts on your repositories, both of which are free for public repositories.

All of the data below has been pulled directly from the Advisory Database using the GraphQL API, which is available to anyone with a GitHub account.
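If you’d like to pull similar data yourself, here is a minimal sketch of how such a query might look in Python, using only the standard library. The field names (`securityAdvisories`, `publishedSince`, `ghsaId`, `cwes`, `cweId`) reflect my reading of the public GraphQL schema and should be checked against the schema documentation before you rely on them. The helper names (`build_payload`, `fetch_page`, `fetch_all`) are made up for this example, and this is not necessarily the query used to produce the numbers in this post.

```python
import json
import urllib.request

GRAPHQL_URL = "https://api.github.com/graphql"

# Assumption: these field names match the public schema; verify before use.
ADVISORY_QUERY = """
query($since: DateTime, $cursor: String) {
  securityAdvisories(first: 100, publishedSince: $since, after: $cursor) {
    pageInfo { hasNextPage endCursor }
    nodes {
      ghsaId
      cwes(first: 10) { nodes { cweId name } }
    }
  }
}
"""


def build_payload(since, cursor=None):
    """Build the JSON body for one page of advisories published after `since`."""
    return {"query": ADVISORY_QUERY, "variables": {"since": since, "cursor": cursor}}


def fetch_page(token, since, cursor=None):
    """POST one page request; `token` is a personal access token."""
    body = json.dumps(build_payload(since, cursor)).encode("utf-8")
    request = urllib.request.Request(
        GRAPHQL_URL,
        data=body,  # supplying a body makes this a POST request
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["data"]["securityAdvisories"]


def fetch_all(token, since):
    """Yield every advisory node, following the pagination cursor."""
    cursor = None
    while True:
        page = fetch_page(token, since, cursor)
        yield from page["nodes"]
        if not page["pageInfo"]["hasNextPage"]:
            break
        cursor = page["pageInfo"]["endCursor"]
```

Calling `fetch_all(token, "2021-01-01T00:00:00Z")` would then iterate over the advisories published since the start of the year, one page of 100 at a time.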

Common weaknesses

For the purpose of this post, I’ll walk you through a few of the vulnerabilities that GitHub has seen so far this year through the lens of the Common Weakness Enumeration (CWE) system. The CWE system, which is maintained by MITRE, provides a method for classifying vulnerabilities by the kind of weaknesses they exhibit. At a high level, there are 40 categories for software CWEs, each of which contains multiple weakness classes that are specific refinements of the parent class. The complete list is quite extensive and can be viewed on MITRE’s CWE website.

The vulnerabilities so far

Looking at the breakdown by CWE, you can see a strong power-law distribution, with CWE-400 taking the top spot and nearly a third of all CWEs occurring in only a single advisory. For the purposes of this blog post, I’m going to completely ignore the severity of the vulnerabilities.

CWE Breakdown

The chart above shows 133 different CWEs and the number of vulnerabilities associated with each of them. One of the quirks of the CWE system is that some CWEs are highly specific, like CWE-209, which is titled Generation of Error Message Containing Sensitive Information, or CWE-90, which relates only to LDAP injections. Others are more general and tend to show up more frequently, like our top three: CWE-400 for resource exhaustion, CWE-20 for improper input validation and CWE-79 for cross-site scripting.
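A breakdown like this boils down to a frequency count. The sketch below shows one way to tally advisories per CWE and to compute the share of CWEs that appear in only one advisory. The `sample` records are invented purely for illustration and are not taken from the Advisory Database; the function names are my own.

```python
from collections import Counter


def cwe_breakdown(advisories):
    """Count how many advisories mention each CWE."""
    counts = Counter()
    for advisory in advisories:
        # An advisory can carry several CWEs; count each at most once per advisory.
        for cwe_id in set(advisory["cwes"]):
            counts[cwe_id] += 1
    return counts


def singleton_share(counts):
    """Fraction of distinct CWEs that occur in exactly one advisory."""
    if not counts:
        return 0.0
    singles = sum(1 for n in counts.values() if n == 1)
    return singles / len(counts)


# Hypothetical sample data, for illustration only.
sample = [
    {"ghsa": "example-1", "cwes": ["CWE-400"]},
    {"ghsa": "example-2", "cwes": ["CWE-400", "CWE-20"]},
    {"ghsa": "example-3", "cwes": ["CWE-79"]},
    {"ghsa": "example-4", "cwes": ["CWE-400"]},
    {"ghsa": "example-5", "cwes": ["CWE-79", "CWE-90"]},
]

counts = cwe_breakdown(sample)
for cwe_id, n in counts.most_common():
    print(f"{cwe_id}: {n}")
print(f"share of CWEs seen once: {singleton_share(counts):.2f}")
```

Sorting with `most_common()` gives the ranking that a chart of this kind plots, from the most frequent CWE down to the long tail.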

CWE-400

CWE-400 is a weakness class concerned with resource exhaustion. Vulnerabilities that are associated with CWE-400 tend to create infinite loops, allocate excessive amounts of memory, open more files than the operating system allows, or cause similar issues that lead to lockups or crashes. In practice, most Denial-of-Service (DoS) vulnerabilities fit into CWE-400, so it’s no surprise that CWE-400 takes the top spot.

ReDoS

One of the more common vulnerability types is the Regular Expression Denial of Service (ReDoS), which exploits the exponential worst-case running time of regex libraries that rely on backtracking. By feeding a system a pathological string that triggers the backtracking, an attacker can consume all the available CPU and prevent other work from being done. See, for example, the vulnerability that we disclosed in react-native. If you’re writing in JavaScript and activate GitHub code scanning for your project, our CodeQL query will automatically detect this vulnerability. Additionally, we plan to expand our code scanning coverage to other regular expression libraries in the future.
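To make the backtracking problem concrete, here is a small sketch in Python, whose built-in `re` module is also a backtracking engine. The pattern `^(a+)+$` is a textbook example I chose for illustration, not the expression from the react-native advisory. On a run of `a` characters followed by a character that cannot match, the nested quantifier forces the engine to try every way of splitting the run before giving up, while the equivalent flat pattern `^a+$` fails almost immediately.

```python
import re
import time

# Nested quantifier: many ways to split a run of "a"s, so failure is expensive.
VULNERABLE = re.compile(r"^(a+)+$")
# Accepts the same strings, but without the ambiguity.
LINEAR = re.compile(r"^a+$")


def time_match(pattern, text):
    """Return (matched, seconds) for a single match attempt."""
    start = time.perf_counter()
    result = pattern.match(text)
    return result is not None, time.perf_counter() - start


def demo(max_n=20):
    """Print timings for growing inputs; keep max_n small, the cost roughly doubles per character."""
    for n in range(12, max_n + 1, 4):
        text = "a" * n + "!"  # the trailing "!" guarantees the match fails
        _, slow = time_match(VULNERABLE, text)
        _, fast = time_match(LINEAR, text)
        print(f"n={n:2d}  nested={slow:.4f}s  flat={fast:.6f}s")
```

Running `demo()` shows the nested pattern’s time climbing steeply as the input grows by only a few characters. Rewriting the expression to remove the ambiguity, or capping input length before matching, are two common mitigations.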

CWE-20

Coming in second is another broad category: improper input validation. CWE-20 covers weaknesses where the program author takes input from a user or another system and fails to account for a variation in that input, which an attacker can then exploit. These can be tightly related to CWE-400 when user input is, for example, used to allocate resources, though in practice CWE-20 tends to apply more to code injection. Input validation is always a contextual task, but we do maintain a number of CodeQL queries to cover some of the more common cases.
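As a rough illustration of where CWE-20 and CWE-400 meet, the sketch below validates a user-supplied size before using it to allocate memory. The limit `MAX_ITEMS`, the exception type and the function names are all hypothetical; the right bounds and rules depend entirely on your application.

```python
MAX_ITEMS = 10_000  # hypothetical upper bound; pick one that suits your application


class InvalidInput(ValueError):
    """Raised when user-supplied input fails validation."""


def parse_item_count(raw):
    """Parse an untrusted string into a count between 1 and MAX_ITEMS."""
    text = raw.strip()
    # Accept plain ASCII digits only: no signs, exponents or other numerals.
    if not text.isascii() or not text.isdigit():
        raise InvalidInput("count must be a positive whole number")
    # Reject over-long strings before converting, so huge inputs are never parsed.
    if len(text) > len(str(MAX_ITEMS)):
        raise InvalidInput("count has too many digits")
    count = int(text)
    if not 1 <= count <= MAX_ITEMS:
        raise InvalidInput(f"count must be between 1 and {MAX_ITEMS}")
    return count


def allocate_buffer(raw):
    """Allocate a zero-filled buffer whose size is validated first."""
    return bytearray(parse_item_count(raw))
```

The point of the design is that the allocation only ever sees a value that has already been bounded, so a request for an absurdly large buffer is rejected instead of exhausting memory.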

CWE-79

To round out the top three CWEs, we have CWE-79: cross-site scripting (XSS). XSS is another perennial attack vector on the modern web that takes a number of forms and could fill a blog series all by itself. If you’re unfamiliar with XSS, take a look at the OWASP entry for an overview. With respect to our data, it’s unsurprising to see XSS near the top, as it’s a popular attack vector when phishing and currently holds the seventh spot in the OWASP Top 10 vulnerabilities. We also maintain CodeQL queries focused on XSS exploits, such as this one.
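For readers new to XSS, here is a minimal sketch of the most basic form of the problem and its fix, using Python’s standard `html.escape`. The `render_comment` functions are invented for this example. Escaping like this only covers text placed in an HTML body; attributes, URLs and inline scripts each need their own context-specific encoding, and in practice a templating engine with auto-escaping is usually the safer choice.

```python
from html import escape


def render_comment_unsafe(author, comment):
    """Vulnerable: user input is interpolated into HTML as-is."""
    return f"<p><b>{author}</b>: {comment}</p>"


def render_comment(author, comment):
    """Safer: HTML metacharacters in user input are escaped before interpolation."""
    return f"<p><b>{escape(author)}</b>: {escape(comment)}</p>"


payload = "<script>alert(1)</script>"
print(render_comment_unsafe("mallory", payload))  # script tag survives intact
print(render_comment("mallory", payload))         # script tag is rendered as text
```

With the escaped version, the browser displays the attacker’s markup as literal text instead of executing it.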

Conclusion

This post has taken a high-level view of the vulnerabilities we’ve seen in the wild so far this year. We’ve seen that three weakness classes dominate and that the CWE distribution trails off into the more specific and obscure. The Security Lab team will continue to watch the nature of our advisories over time and to communicate changes and new trends as we see them. To help reduce the occurrence of CWEs like CWE-400, check out code scanning. By catching these common patterns within your pull requests, before they reach your main branch, the CodeQL queries will reduce the number of vulnerabilities that make it to the wild. We also welcome any community engagement around creating and improving those queries. Check out our CodeQL bounty program, which allows you to contribute to securing the open source ecosystem and to make some cash at the same time. Our current coverage can be seen on the CodeQL CWE coverage page.