Persona Vectors: Monitoring and Controlling Character Traits in Language Models

Update: 2025-08-13

Description

In this episode, we discuss "Persona Vectors: Monitoring and Controlling Character Traits in Language Models" by Runjin Chen, Andy Arditi, Henry Sleight, Owain Evans, and Jack Lindsey. The paper identifies persona vectors: directions in a large language model's activation space that correspond to character traits such as evil or sycophancy. These vectors can be used to monitor personality changes and to predict, control, and mitigate unintended personality shifts during training and deployment. The extraction method is automated and works from a natural-language description of the trait; it also helps flag problematic training data.
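To make the idea of a direction in activation space concrete, here is a minimal sketch on synthetic data. It assumes a persona vector can be approximated as the normalized difference between mean activations of trait-exhibiting and baseline responses; the function names (`persona_vector`, `trait_score`, `steer`), the dimensions, and the random data are all illustrative and are not taken from the episode or from any real model.

```python
import numpy as np

def persona_vector(trait_acts, baseline_acts):
    """Assumed extraction rule: difference of mean activations, unit-normalized."""
    diff = trait_acts.mean(axis=0) - baseline_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

def trait_score(activation, vector):
    """Monitoring: project one activation onto the persona direction."""
    return float(activation @ vector)

def steer(activation, vector, alpha):
    """Control: shift an activation along the direction (negative alpha suppresses)."""
    return activation + alpha * vector

# Synthetic stand-ins for hidden states; a real setup would read them from a model.
rng = np.random.default_rng(0)
dim = 16
true_direction = np.zeros(dim)
true_direction[0] = 1.0  # hypothetical "trait" axis
baseline_acts = rng.normal(size=(200, dim))
trait_acts = rng.normal(size=(200, dim)) + 3.0 * true_direction

v = persona_vector(trait_acts, baseline_acts)
h = baseline_acts[0]
before = trait_score(h, v)
after = trait_score(steer(h, v, alpha=-2.0), v)
print(f"projection before steering: {before:.3f}, after: {after:.3f}")
```

Because the vector is unit-normalized, steering with coefficient `alpha` changes the projection by exactly `alpha`, which is why the same direction can serve both as a monitor and as a control knob in this toy setting.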

In Channel

The Markovian Thinker

2025-10-16 07:48

DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL

2025-10-08 08:03

Towards a Physics Foundation Model

2025-10-03 07:04

Scalable Option Learning in High-Throughput Environments

2025-09-30 08:18

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

2025-09-24 08:10

Reverse-Engineered Reasoning for Open-Ended Generation

2025-09-19 08:39

Scaling Performance of Large Language Model Pretraining

2025-09-16 06:58

General Social Agents

2025-09-15 08:30

We need a new ethics for a world of AI agents

2025-09-12 07:26

Hierarchical Reasoning Model

2025-09-11 09:03

ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts

2025-09-10 08:23

Small Language Models are the Future of Agentic AI

2025-09-09 07:54

Learning When to Plan: Efficiently Allocating Test-Time Compute for LLM Agents

2025-09-08 07:01

Why Language Models Hallucinate

2025-09-07 07:52

Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

2025-08-19 07:17

Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models

2025-08-15 08:18

Persona Vectors: Monitoring and Controlling Character Traits in Language Models

2025-08-13 09:10

Learn Globally, Speak Locally: Bridging the Gaps in Multilingual Reasoning

2025-08-01 08:48

Position: The AI Conference Peer Review Crisis Demands Author Feedback and Reviewer Rewards

2025-07-31 08:33

Working with AI: Measuring the Occupational Implications of Generative AI

2025-07-31 08:04

agibreakdown
