persona-data
persona-data is a Python library for loading and working with persona datasets from Hugging Face.
Datasets
| Dataset | Hugging Face repo | Description |
|---|---|---|
| SynthPersona | implicit-personalization/synth-persona | Persona profiles with biography views and QA pairs |
| PersonaGuess | implicit-personalization/persona-guess | Turn-based games where two personas ask each other questions |
| Nemotron Personas | nvidia/Nemotron-Personas-France / nvidia/Nemotron-Personas-USA | French and US persona profiles loaded from sharded parquet files |
Prompt helpers
The Prompt formatting page covers helpers for roleplay prompts and multiple-choice evaluation.
Shared conventions
- Loaders download from Hugging Face with `hf_hub_download`, including sharded parquet sources.
- Dataset instances implement `__len__`, `__iter__`, and `__getitem__`.
- Query helpers return typed records, with convenience variants that return plain strings.
- New datasets should stay small, eager, and easy to inspect from a notebook; `sample_size` is usually a leading slice, not a random sample.
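The conventions above can be illustrated with a minimal, self-contained sketch. It is not the library's actual code: `PersonaRecord`, `ToyPersonaDataset`, `get`, and `biography_of` are hypothetical names chosen for illustration, and the records are built in memory instead of being downloaded with `hf_hub_download`.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Sequence


@dataclass(frozen=True)
class PersonaRecord:
    """Hypothetical typed record; the real field names may differ."""
    persona_id: str
    biography: str


class ToyPersonaDataset:
    """Small, eager dataset: every kept record lives in memory."""

    def __init__(self, records: Sequence[PersonaRecord],
                 sample_size: Optional[int] = None) -> None:
        # sample_size keeps the leading slice, not a random sample.
        # Slicing with None keeps every record.
        self._records = list(records)[:sample_size]

    def __len__(self) -> int:
        return len(self._records)

    def __iter__(self) -> Iterator[PersonaRecord]:
        return iter(self._records)

    def __getitem__(self, index: int) -> PersonaRecord:
        return self._records[index]

    def get(self, persona_id: str) -> PersonaRecord:
        """Query helper returning a typed record (hypothetical name)."""
        for record in self._records:
            if record.persona_id == persona_id:
                return record
        raise KeyError(persona_id)

    def biography_of(self, persona_id: str) -> str:
        """String-only convenience helper (hypothetical name)."""
        return self.get(persona_id).biography


# Usage: build five records, keep only the first two.
records = [PersonaRecord(str(i), f"bio {i}") for i in range(5)]
dataset = ToyPersonaDataset(records, sample_size=2)
```

Because the slice is deterministic, `sample_size=2` always yields the same two records, which keeps notebook inspection reproducible across runs.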
See Adding a dataset to contribute a new loader.