Cyber Defense Advisors

Hacking ChatGPT by Planting False Memories into Its Data

This vulnerability hacks a feature that allows ChatGPT to have long-term memory, where it uses information from past conversations to inform future conversations with that same user. A researcher found that he could use that feature to plant “false memories” into that context window that could subvert the model.

A month later, the researcher submitted a new disclosure statement. This time, he included a PoC that caused the ChatGPT app for macOS to send a verbatim copy of all user input and ChatGPT output to a server of his choice. All a target needed to do was instruct the LLM to view a web link that hosted a malicious image. From then on, all input and output to and from ChatGPT was sent to the attacker’s website.

Sidebar photo of Bruce Schneier by Joe MacInnis.

 

Leave feedback about this

  • Quality
  • Price
  • Service
Choose Image