persona-data

persona-data is a Python library for loading and working with persona datasets from Hugging Face.

Datasets

| Dataset | Hugging Face repository | Description |
| --- | --- | --- |
| SynthPersona | implicit-personalization/synth-persona | Persona profiles with biography views and QA pairs |
| PersonaGuess | implicit-personalization/persona-guess | Turn-based games where two personas ask each other questions |
| Nemotron Personas | nvidia/Nemotron-Personas-France / nvidia/Nemotron-Personas-USA | French and US persona profiles loaded from sharded parquet files |

Prompt helpers

The Prompt formatting page covers helpers for roleplay prompts and multiple-choice evaluation.

Shared conventions

  • Loaders download from Hugging Face with hf_hub_download, including sharded parquet sources.
  • Dataset instances implement __len__, __iter__, and __getitem__.
  • Query helpers return typed records; convenience variants return plain strings only.
  • New datasets should stay small, eager, and easy to inspect from a notebook. sample_size usually takes a leading slice of the data, not a random sample.
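
The conventions above can be illustrated with a minimal sketch. It is not persona-data code: ToyPersona, ToyPersonaDataset, get, and biographies are hypothetical names chosen for illustration, and the rows are built in memory where a real loader would fetch them with hf_hub_download.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Sequence


@dataclass(frozen=True)
class ToyPersona:
    """Hypothetical typed record; not a class from persona-data."""
    persona_id: str
    biography: str


class ToyPersonaDataset:
    """Hypothetical eager dataset that follows the shared conventions."""

    def __init__(self, rows: Sequence[dict], sample_size: Optional[int] = None):
        # sample_size keeps a leading slice, not a random sample.
        if sample_size is not None:
            rows = rows[:sample_size]
        # Eager: every row is parsed into a typed record up front.
        self._records: List[ToyPersona] = [ToyPersona(**row) for row in rows]

    def __len__(self) -> int:
        return len(self._records)

    def __iter__(self) -> Iterator[ToyPersona]:
        return iter(self._records)

    def __getitem__(self, index: int) -> ToyPersona:
        return self._records[index]

    def get(self, persona_id: str) -> Optional[ToyPersona]:
        """Query helper that returns a typed record, or None if absent."""
        for record in self._records:
            if record.persona_id == persona_id:
                return record
        return None

    def biographies(self) -> List[str]:
        """String-only convenience helper."""
        return [record.biography for record in self._records]


# In-memory rows stand in for data a real loader would download.
rows = [
    {"persona_id": "p1", "biography": "A retired teacher from Lyon."},
    {"persona_id": "p2", "biography": "A software engineer from Austin."},
    {"persona_id": "p3", "biography": "A nurse from Marseille."},
]
dataset = ToyPersonaDataset(rows, sample_size=2)
```

Because sample_size is a leading slice, the result is deterministic: dataset holds p1 and p2 on every run, which keeps notebook inspection reproducible.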

See Adding a dataset to contribute a new loader.