Security expert Evan Pena uses large language models (LLMs) almost daily “to confirm answers or come up with other ideas about how to investigate a vulnerability.” These natural language processing (NLP) tools, built on artificial neural networks, can recognize patterns and generate text or code that reads almost as if a human had written it.
Tapping into their potential is part of Pena’s job. He is managing director of professional services at Google Cloud and has led Mandiant’s red team for over five years. For him, using large language models often means finishing tasks faster, a significant advantage in cybersecurity, a field where workloads are high and skilled staff are in short supply.
At one point, Pena and his colleagues needed a C# utility to test a known username and password combination against a number of hosts within a network. “Since it was a red team, we did not want to use open-source tooling to accomplish this in order to avoid static indicators, and avoid detection by EDRs,” he says. “We were able to develop this tool and fully test it in a practice environment before using it in a production environment within a few hours.” The tool allowed them to identify local administrator access on a system and perform lateral movement within the environment.
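The team’s custom utility is not public, but the underlying check is simple to sketch. Purely as an illustration, and assuming the open-source impacket library (exactly the kind of off-the-shelf tooling the red team avoided for stealth reasons), a minimal Python version of the same credential check might look like this; all hosts and credentials are placeholders:

```python
# Hypothetical sketch: try one known credential pair against a list of hosts over SMB
# and flag where it grants local administrator access (the ADMIN$ share is reachable).
# The red team's real tool was a custom C# utility; impacket is used here only to illustrate.
from impacket.smbconnection import SMBConnection

USERNAME, PASSWORD, DOMAIN = "svc_backup", "S3cret!", "CORP"   # placeholder credential
HOSTS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]                # placeholder targets

for host in HOSTS:
    try:
        conn = SMBConnection(host, host, timeout=5)
        conn.login(USERNAME, PASSWORD, DOMAIN)   # does the credential authenticate?
        conn.connectTree("ADMIN$")               # typically reachable only with local admin rights
        print(f"[+] {host}: local administrator access")
        conn.logoff()
    except Exception as exc:
        print(f"[-] {host}: {exc}")
```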
Red and blue teams can use LLMs for many more tasks. Offensive security firm Bishop Fox explores how these models can power social engineering campaigns; cybersecurity solutions provider Check Point Software uses AI to streamline malware investigation and vulnerability research; and Cossack Labs uses it when recruiting security experts for its data protection business.
How red and blue teams use LLMs in their work
Large language models have started to revolutionize the way red and blue teams do their work. These tools were first used to automate mundane tasks, which can free up valuable time and resources. Little by little, though, they are beginning to reach into more complex areas of cybersecurity.
“It’s safe to say that LLMs and generative AI have revolutionized red teamer’s ability to conduct social engineering and phishing campaigns at scale,” says Brandon Kovacs, senior red team consultant for Bishop Fox. “For example, using LLMs that have been pre-trained on billions of parameters of human text, in addition to supplying these models with additional data from public sources regarding the target, has allowed us to create very convincing and personalized campaigns at scale. This would typically take hours or days to perform. However, thanks to AI, we’re able to create these instantaneously.”
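As a hedged illustration of that workflow, rather than Bishop Fox’s actual tooling, the sketch below templates a few public data points about a target into a prompt and asks a hosted model to draft a pretext email for a sanctioned awareness test. The model name, OSINT fields, and wording are all assumptions:

```python
# Illustrative only: combine public (OSINT) details about a target with an LLM prompt
# to draft a pretext for an authorized phishing exercise. Looping over a target list
# is what makes this scale.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

target = {"name": "Alex Doe", "role": "accounts payable clerk", "company": "Example Corp",
          "detail": "recently posted about attending a finance SaaS conference"}  # placeholders

prompt = (
    "Draft a short, professional email for a sanctioned security awareness test. "
    f"Recipient: {target['name']}, {target['role']} at {target['company']}. "
    f"Personalize it with this detail: {target['detail']}. "
    "Ask them to review an attached invoice."
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model; any capable chat model works
    messages=[{"role": "user", "content": prompt}],
)
print(resp.choices[0].message.content)
```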
Bishop Fox is also exploring ways to create and study new malware strains that were not previously seen in the wild. Additionally, it uses LLMs to perform source-code analysis to identify security vulnerabilities, a task that is also a top priority at Check Point Software, according to Sergey Shykevich, the company’s threat intelligence group manager. “We use a plugin named Pinokio, which is a Python script that uses the davinci-003 model to help with vulnerability research on functions decompiled by the IDA tool,” he says.
Check Point also relies on artificial intelligence to streamline the process of investigating malware. The company uses Gepetto, a Python script that uses GPT-3.5 and GPT-4 models to provide context to functions decompiled by the IDA tool. “Gepetto clarifies the role of specific code functions and can even automatically rename its variables,” Shykevich says.
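Gepetto itself runs as an IDA plugin, but the core exchange with the model is easy to approximate. The sketch below, a rough stand-in rather than the plugin’s actual code, sends a snippet of decompiled pseudocode to a GPT model and asks for a plain-language summary plus clearer names; the pseudocode and model name are placeholders, and the IDA-side renaming step is omitted:

```python
# Rough Gepetto-style workflow: hand decompiled pseudocode to a GPT model and ask
# what the function does and how its variables should be renamed. The real plugin
# applies the suggested names back inside IDA; that part is left out here.
from openai import OpenAI

client = OpenAI()

pseudocode = """
int sub_401A20(char *a1, int a2) {
    int v3 = 0;
    for (int i = 0; i < a2; ++i)
        v3 ^= a1[i];
    return v3;
}
"""  # placeholder decompiler output

prompt = ("Explain in one paragraph what this decompiled function does, then suggest "
          "descriptive names for the function and each variable:\n" + pseudocode)

resp = client.chat.completions.create(
    model="gpt-4",  # Gepetto works with GPT-3.5 and GPT-4 class models
    messages=[{"role": "user", "content": prompt}],
)
print(resp.choices[0].message.content)
```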
Some red and blue teams have also found counterintuitive ways of getting help from AI. Anastasiia Voitova, head of security engineering at Cossack Labs, says her blue team is thinking about this technology in the recruitment process, trying to filter out candidates over-reliant on AI. “When I hire new cybersecurity engineers, I give them a test task, and some of them just ask ChatGPT and then blindly copy-paste the answer without thinking,” Voitova says. “ChatGPT is a nice tool, but it’s not an engineer, so [by hiring candidates who don’t possess the right skill set,] the life of a blue team might become more difficult.”
Adding LLMs to red and blue teams
Red and blue teams looking to incorporate large language models into their workflow need to do it systematically. They have to “break their day-to-day work into steps/processes and then to review each step and determine if LLM can assist them in a specific step or not,” Shykevich says.
This process is not a simple one, and it requires security experts to think differently. It’s a “paradigm shift,” as Kovacs puts it. Trusting a machine with cybersecurity tasks that were typically done by humans can be a challenging adjustment, especially if the security risks posed by the new technology are not thoroughly discussed.
Luckily, though, the barriers to training and running your own AI models have lowered over the past year, in part thanks to the prevalence of online AI communities such as Hugging Face, which let anyone access and download open-source models using an SDK. “For example, we can quickly download and run the Open Pre-trained Transformer Language Models (OPT) locally on our own infrastructure, which give us the equivalency of GPT-like responses, in only a few lines of code, minus the guard rails and restrictions typically implemented by the ChatGPT equivalent,” Kovacs says.
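As a minimal sketch of what Kovacs describes, assuming the Hugging Face transformers library and one of the publicly hosted OPT checkpoints, running such a model locally really does come down to a few lines; the model size and prompt are arbitrary choices:

```python
# Download an open OPT checkpoint from Hugging Face and generate text locally.
# facebook/opt-1.3b is one of several published sizes; larger ones need more memory.
from transformers import pipeline

generator = pipeline("text-generation", model="facebook/opt-1.3b")
out = generator("List common signs that an email is a phishing attempt:", max_new_tokens=80)
print(out[0]["generated_text"])
```

The first run downloads the model weights; after that, everything executes on local infrastructure, without the guard rails a hosted chat service adds.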
Both red and blue teams that want to use large language models must consider the potential ethical implications of this technology. These include privacy, the confidentiality of data, bias, and the lack of transparency around how the models work. As Kovacs puts it, “AI decision-making can be rather opaque.”
The human-AI red and blue teams
When using LLMs, though, both red and blue teams need to keep one thing in mind. “The technology isn’t perfect,” says Kovacs. “AI and LLMs are still relatively new and in their infancy stage. Whether it’s improving the security of the AI systems themselves or addressing the ethical and privacy concerns introduced by this technology, we still have a long way to go.”
Kovacs and most researchers see LLMs as a way to complement and assist red and blue teams, not replace them entirely, because while these models excel at processing data and drawing insights, they lack human intuition and context.
“LLMs are still far from being able to replace researchers or make decisions related to cyber research/red teams,” Shykevich says. “It is a tool that assists in the work, but the researchers still have to review its output.”
The quality of data also matters, as Kovacs notes: “The effectiveness of LLMs and the outputs they provide is greatly influenced by the quality of the data supplied during the training of the model.”
In the coming years, this technology will be increasingly embedded into the day-to-day lives of tech experts, potentially turning everyone into a “cybersecurity power user.” Tools that do that, such as CrowdStrike’s recently introduced Charlotte AI, have started to emerge. Charlotte AI is a generative AI-based security analyst that lets customers ask questions in plain English, and dozens of other languages, and get answers back. “Large language models are built to incorporate knowledge from external data stores, as well as data generated from technologies like the Falcon platform,” a CrowdStrike spokesperson said.
In this context, staying up to date with the evolution of AI is a must for any red or blue team member. In the years to come, increasingly sophisticated tools will be used both offensively and defensively. “On the offensive side, we can expect to see more advanced and automated attacks, in addition to increasingly advanced social engineering attacks such as deepfakes or voice phishing,” Kovacs says. “On the defensive side, we can expect to see AI playing a crucial role in threat detection and response, and helping security analysts to automate and sift through large data sets to identify potential threats.”
As Kovacs anticipates, hackers will continue to use LLMs and think of innovative ways to infiltrate organizations and break security rules. Therefore, security teams need to stay ahead of the curve. By combining human intelligence with AI capabilities, red and blue teams can help minimize the impact of cyberattacks.