
Steering

Steering is experimental. It computes a persona direction from saved activations:

steering_vector = biography[layer] - templated[layer]
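As a rough illustration of the formula above, the sketch below subtracts one layer's templated activation from its biography activation. It assumes each source is a mapping from layer index to a single pooled activation vector. The function name `steering_direction` and the toy values are hypothetical, and plain Python lists stand in for tensors; the real logic lives in src/persona_vectors/steering.py.

```python
def steering_direction(biography, templated, layer):
    """Element-wise difference of two pooled activation vectors at one layer."""
    bio, tmpl = biography[layer], templated[layer]
    if len(bio) != len(tmpl):
        raise ValueError("activation vectors must share hidden_size")
    return [b - t for b, t in zip(bio, tmpl)]


# Toy activations keyed by layer index (hidden_size = 3, for illustration only).
biography = {20: [0.5, 2.0, -1.0]}
templated = {20: [0.25, 1.0, -1.5]}

direction = steering_direction(biography, templated, layer=20)
print(direction)
```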

Core module: src/persona_vectors/steering.py

CLI

uv run python main.py steer \
  --model google/gemma-2-9b-it \
  --persona-id <UUID> \
  --layer 20 \
  --mask-strategy answer_mean

Use the same --mask-strategy that was used during extraction, because the vector is computed from those saved activations.

API

from persona_vectors.steering import (
    compute_steering_vector,
    load_steering_vector,
    save_steering_vector,
)

sv = compute_steering_vector(
    persona_id="<UUID>",
    model_name="google/gemma-2-9b-it",
    layer_idx=20,
    mask_strategy="answer_mean",
)

save_steering_vector(sv, "artifacts/vectors/<UUID>")
loaded = load_steering_vector("artifacts/vectors/<UUID>")

compute_steering_vector() returns:

  • steering_vector: tensor with shape (1, 1, hidden_size)
  • suggested_alpha: 20 * mean_rms / ||sv||, where sv is the steering vector
  • persona_id, layer, model_id, hidden_size
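The suggested_alpha formula can be worked through by hand. The sketch below makes two assumptions that the text above does not confirm: that mean_rms is the average root-mean-square magnitude of a set of reference activations at the chosen layer, and that ||sv|| is the L2 norm. The helper names are hypothetical and the numbers are toy values.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in vec))


def rms(vec):
    """Root-mean-square magnitude of a vector."""
    return math.sqrt(sum(x * x for x in vec) / len(vec))


def suggested_alpha(steering_vector, activations, scale=20.0):
    """scale * mean_rms / ||sv||, with mean_rms averaged over reference activations (assumed)."""
    mean_rms = sum(rms(a) for a in activations) / len(activations)
    return scale * mean_rms / l2_norm(steering_vector)


sv = [3.0, 4.0]                          # ||sv|| = 5
activations = [[1.0, 1.0], [3.0, 3.0]]   # RMS values 1 and 3, so mean_rms = 2

alpha = suggested_alpha(sv, activations)
print(alpha)  # 20 * 2 / 5
```

Under this reading, dividing by ||sv|| normalizes the vector's length, so the scaled vector has a magnitude proportional to the typical activation scale regardless of how large the raw difference was.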

Output

artifacts/vectors/<persona_id>/
├── steering_vector.safetensors
└── metadata.json
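Before reusing a saved vector, it can help to confirm that its metadata matches the model and layer you intend to steer. The sketch below reads metadata.json with the standard library only. The key names `model_id` and `layer` are assumptions based on the fields listed for compute_steering_vector(), and `read_vector_metadata` is a hypothetical helper, not part of the package.

```python
import json
from pathlib import Path


def read_vector_metadata(vector_dir, expected_model=None, expected_layer=None):
    """Load metadata.json from a vector directory and optionally sanity-check it."""
    meta = json.loads((Path(vector_dir) / "metadata.json").read_text())
    # Key names are assumed to mirror the documented return fields.
    if expected_model is not None and meta.get("model_id") != expected_model:
        raise ValueError(f"vector was built for {meta.get('model_id')!r}")
    if expected_layer is not None and meta.get("layer") != expected_layer:
        raise ValueError(f"vector was built for layer {meta.get('layer')!r}")
    return meta
```

For example, `read_vector_metadata("artifacts/vectors/<UUID>", expected_layer=20)` would raise if the saved vector came from a different layer. The tensor itself is still best loaded with load_steering_vector().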

Mid layers are usually the first place to try, but the best layer is model- and task-dependent. Use notebooks/notebook_steer.py for experiments.