Will's Blog: Anthropic details "persona vectors", patterns of activity within an AI model's neural network that control its character traits, such as evil and sycophancy (Anthropic)

2 August 2025

Anthropic:
Language models are strange beasts. In many ways they appear to have human-like “personalities” …
