Teaching LLMs to Be Deceptive
Interesting research: “Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training“: Abstract: Humans are capable of strategically deceptive behavior: behaving helpfully in most situations, but then behaving very differently in order to pursue alternative objectives when given the opportunity. If an AI system learned such a deceptive strategy, could we detect it and remove […]
Cyber News