November 24, 2021

GHSL-2021-076: Arbitrary command execution in Gerapy - CVE-2021-32849

GitHub Security Lab

Coordinated Disclosure Timeline


An authenticated attacker can execute arbitrary commands on the system.



Tested Version



Issue 1: project_clone

The function project_clone is vulnerable to command injection when handling attacker-controlled data. The address variable (1) is used to build a git clone command (2) in an insecure way, which lets an attacker craft a repository URL that contains shell commands (3).


def project_clone(request):
    """
    clone project from github
    :param request: request object
    :return: json
    """
    if request.method == 'POST':
        data = json.loads(request.body)

        # NOTE(1): Address comes from the post's body.
        address = data.get('address')
        if not address.startswith('http'):
            return JsonResponse({'status': False})
        address = address + '.git' if not address.endswith('.git') else address
        # NOTE(2): Address is used to build a command without sanitization.
        cmd = 'git clone {address} {target}'.format(address=address, target=join(PROJECTS_FOLDER, Path(address).stem))
        logger.debug('clone cmd %s', cmd)

        # NOTE(3): Command is executed.
        p = Popen(cmd, shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE)
        stdout, stderr = bytes2str(p.stdout.read()), bytes2str(p.stderr.read())
        logger.debug('clone run result %s', stdout)
        if stderr: logger.error(stderr)
        return JsonResponse({'status': True}) if not stderr else JsonResponse({'status': False})


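Because the only validation is that address starts with http, an attacker can set address to http://x || malicious code # and the executed command will look like this:

git clone http://x || malicious code #.git <...>

When git clone fails, || runs the injected command, and # turns the rest of the line (the appended .git and the target directory) into a comment.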
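
The sketch below reproduces the string formatting from project_clone in a standalone function and tokenizes the result with Python's shlex module, which approximates how a POSIX shell splits the line. The names build_clone_cmd, build_clone_argv and the /srv/gerapy/projects folder are hypothetical stand-ins, not Gerapy code, and the payload http://x || malicious code # is chosen so that it also passes the startswith('http') check. The argv variant illustrates one common way to avoid this class of bug (passing arguments as a list to subprocess without shell=True); it is not Gerapy's actual fix.

```python
import shlex
from os.path import join
from pathlib import Path
from typing import List

# Hypothetical stand-in for Gerapy's PROJECTS_FOLDER setting.
PROJECTS_FOLDER = '/srv/gerapy/projects'


def build_clone_cmd(address: str) -> str:
    """Hypothetical mirror of project_clone's vulnerable string formatting."""
    if not address.startswith('http'):
        # project_clone returns {'status': False} in this case.
        raise ValueError('address must start with http')
    address = address + '.git' if not address.endswith('.git') else address
    target = join(PROJECTS_FOLDER, Path(address).stem)
    return 'git clone {address} {target}'.format(address=address, target=target)


def shell_tokens(cmd: str) -> List[str]:
    """Approximate POSIX shell lexing: '#' starts a comment and operators
    such as '||' become separate tokens (needs Python 3.8+)."""
    lexer = shlex.shlex(cmd, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    return list(lexer)


def build_clone_argv(address: str) -> List[str]:
    """Hypothetical safer variant: an argv list meant for subprocess.run(argv)
    with the default shell=False, so no shell ever parses the address."""
    if not address.startswith('http'):
        raise ValueError('address must start with http')
    address = address + '.git' if not address.endswith('.git') else address
    target = join(PROJECTS_FOLDER, Path(address).stem)
    # '--' tells git that everything after it is a positional argument.
    return ['git', 'clone', '--', address, target]


payload = 'http://x || malicious code #'
cmd = build_clone_cmd(payload)
print(cmd)
print(shell_tokens(cmd))
print(build_clone_argv(payload))
```

In the token list, || stands alone as a shell operator followed by malicious and code, and nothing after the # survives. In the argv list the entire payload stays a single argument to git, which simply fails to clone it. The http prefix check is kept only to mirror project_clone; it is not sufficient validation on its own.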

Issue 2: project_parse

The function project_parse is vulnerable to command injection when handling attacker-controlled data. Data from the body of a POST request (1) is used to build a shell command (2) in a way that allows an attacker to execute arbitrary commands on the host (3).


def project_parse(request, project_name):
    """
    parse project
    :param request: request object
    :param project_name: project name
    :return: requests, items, response
    """
    if request.method == 'POST':
        project_path = join(PROJECTS_FOLDER, project_name)
        # NOTE(1)
        data = json.loads(request.body)
        logger.debug('post data %s', data)
        spider_name = data.get('spider')
        args = {
            'start': data.get('start', False),
            'method': data.get('method', 'GET'),
            'url': data.get('url'),
            'callback': data.get('callback'),
            'cookies': "'" + json.dumps(data.get('cookies', {}), ensure_ascii=False) + "'",
            'headers': "'" + json.dumps(data.get('headers', {}), ensure_ascii=False) + "'",
            'meta': "'" + json.dumps(data.get('meta', {}), ensure_ascii=False) + "'",
            'dont_filter': data.get('dont_filter', False),
            'priority': data.get('priority', 0),
        }
        # set request body
        body = data.get('body', '')
        if args.get('method').lower() != 'get':
            args['body'] = "'" + json.dumps(body, ensure_ascii=False) + "'"
        # NOTE(2)
        args_cmd = ' '.join(
            ['--{arg} {value}'.format(arg=arg, value=value) for arg, value in args.items()])
        logger.debug('args cmd %s', args_cmd)
        cmd = 'gerapy parse {args_cmd} {project_path} {spider_name}'.format(
            args_cmd=args_cmd, project_path=project_path, spider_name=spider_name)
        logger.debug('parse cmd %s', cmd)

        # NOTE(3)
        p = Popen(cmd, shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, close_fds=True)
        stdout, stderr = bytes2str(p.stdout.read()), bytes2str(p.stderr.read())
        logger.debug('stdout %s, stderr %s', stdout, stderr)
        if not stderr:
            return JsonResponse({'status': True, 'result': json.loads(stdout)})
        else:
            return JsonResponse({'status': False, 'message': stderr})
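
The single quotes that project_parse places around the JSON-encoded cookies, headers, meta and body values do not make them safe: json.dumps does not escape the ' character, so a value containing one closes the shell string early. The sketch below reproduces a subset of that argument building in hypothetical helpers (build_parse_cmd and build_parse_argv; the project path, spider name and payload are made-up examples) and tokenizes the command with shlex as an approximation of the shell. The argv variant illustrates passing arguments without a shell and is not Gerapy's actual fix.

```python
import json
import shlex
from typing import List


def build_parse_cmd(data: dict, project_path: str, spider_name: str) -> str:
    """Hypothetical mirror of project_parse's vulnerable string building.
    Only three of the original fields are reproduced here."""
    args = {
        'method': data.get('method', 'GET'),
        'url': data.get('url'),
        # Wrapping in single quotes does not help: json.dumps leaves ' unescaped.
        'cookies': "'" + json.dumps(data.get('cookies', {}), ensure_ascii=False) + "'",
    }
    args_cmd = ' '.join(
        '--{arg} {value}'.format(arg=arg, value=value) for arg, value in args.items())
    return 'gerapy parse {args_cmd} {project_path} {spider_name}'.format(
        args_cmd=args_cmd, project_path=project_path, spider_name=spider_name)


def shell_tokens(cmd: str) -> List[str]:
    """Approximate POSIX shell lexing with shlex (needs Python 3.8+)."""
    lexer = shlex.shlex(cmd, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    return list(lexer)


def build_parse_argv(data: dict, project_path: str, spider_name: str) -> List[str]:
    """Hypothetical safer variant: each value becomes exactly one argv entry,
    for subprocess.run(argv) with the default shell=False. The JSON needs no
    extra quotes because no shell strips them."""
    return [
        'gerapy', 'parse',
        '--method', str(data.get('method', 'GET')),
        '--url', str(data.get('url')),
        '--cookies', json.dumps(data.get('cookies', {}), ensure_ascii=False),
        project_path, spider_name,
    ]


payload = {'url': 'http://example.com', 'cookies': {'a': "'; id; '"}}
cmd = build_parse_cmd(payload, '/tmp/projects/demo', 'demo_spider')
print(cmd)
print(shell_tokens(cmd))
print(build_parse_argv(payload, '/tmp/projects/demo', 'demo_spider'))
```

In the token list, ; and id appear outside any quotes, so a shell would run id as a separate command. In the argv list the cookie JSON, quotes and semicolons included, is a single argument. Fields that are not quoted at all, such as url, need no quote to break out: a value like http://x;id is enough.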


Impact

Both issues may lead to arbitrary code execution.




This issue was discovered and reported by @RasmusWL (Rasmus Wriedt Larsen) from the CodeQL Python team.


You can contact the GHSL team at, please include GHSL-2021-076 in any communication regarding this issue.