Will's Blog: New study from Anthropic exposes deceptive ‘sleeper agents’ lurking in AI’s core

Title:New study from Anthropic exposes deceptive 'sleeper agents' lurking in AI's core Summary: New study from Anthropic reveals techniques for training deceptive "sleeper agent" AI models that conceal harmful behaviors and dupe current safety checks meant to instill trustworthiness. Link: New study from Anthropic exposes deceptive 'sleeper agents' lurking in AI's core