A now-patched security vulnerability in OpenAI’s ChatGPT app for macOS could have made it possible for attackers to plant long-term persistent spyware into the artificial intelligence (AI) tool’s memory.
The technique, dubbed SpAIware, could be abused to facilitate “continuous data exfiltration of any information the user typed or responses received by ChatGPT, including any future chat sessions,” security researcher Johann Rehberger said.
The issue, at its core, abuses a feature called memory, which OpenAI introduced earlier this February before rolling it out to ChatGPT Free, Plus, Team, and Enterprise users at the start of the month.
What it does is essentially allow ChatGPT to remember certain things across chats so that it saves users the effort of repeating the same information over and over again. Users also have the option to instruct the program to forget something.
“ChatGPT’s memories evolve with your interactions and aren’t linked to specific conversations,” OpenAI says. “Deleting a chat doesn’t erase its memories; you must delete the memory itself.”
The attack technique also builds on prior findings that involve using indirect prompt injection to manipulate memories so as to remember false information, or even malicious instructions, thereby achieving a form of persistence that survives between conversations.
“Since the malicious instructions are stored in ChatGPT’s memory, all new conversation going forward will contain the attackers instructions and continuously send all chat conversation messages, and replies, to the attacker,” Rehberger said.
“So, the data exfiltration vulnerability became a lot more dangerous as it now spawns across chat conversations.”
In a hypothetical attack scenario, a user could be tricked into visiting a malicious site or downloading a booby-trapped document that’s subsequently analyzed using ChatGPT to update the memory.
The website or the document could contain instructions to clandestinely send all future conversations to an adversary-controlled server going forward, which can then be retrieved by the attacker on the other end beyond a single chat session.
Following responsible disclosure, OpenAI has addressed the issue with ChatGPT version 1.2024.247 by closing out the exfiltration vector.
“ChatGPT users should regularly review the memories the system stores about them, for suspicious or incorrect ones and clean them up,” Rehberger said.
“This attack chain was quite interesting to put together, and demonstrates the dangers of having long-term memory being automatically added to a system, both from a misinformation/scam point of view, but also regarding continuous communication with attacker controlled servers.”
The disclosure comes as a group of academics has uncovered a novel AI jailbreaking technique codenamed MathPrompt that exploits large language models’ (LLMs) advanced capabilities in symbolic mathematics to get around their safety mechanisms.
“MathPrompt employs a two-step process: first, transforming harmful natural language prompts into symbolic mathematics problems, and then presenting these mathematically encoded prompts to a target LLM,” the researchers pointed out.
The study, upon testing against 13 state-of-the-art LLMs, found that the models respond with harmful output 73.6% of the time on average when presented with mathematically encoded prompts, as opposed to approximately 1% with unmodified harmful prompts.
It also follows Microsoft’s debut of a new Correction capability that, as the name implies, allows for the correction of AI outputs when inaccuracies (i.e., hallucinations) are detected.
“Building on our existing Groundedness Detection feature, this groundbreaking capability allows Azure AI Content Safety to both identify and correct hallucinations in real-time before users of generative AI applications encounter them,” the tech giant said.
Found this article interesting? Follow us on Twitter and LinkedIn to read more exclusive content we post.