Staff Research Scientist · RL & Systems for LLM Agents

Rafael Pardinas

My research focuses on scalable reinforcement learning for large language models, spanning both algorithms and training systems. I currently work on reasoning, self-improvement, efficient on-policy training at scale, long-horizon credit assignment in multi-turn RLVR, memory-augmented agents that improve through open-ended interaction, and cross-domain generalisation.

Good research and good engineering belong together: reproducible training recipes, reliable evaluations, and open-source systems that make ideas inspectable are all part of the same ecosystem.

Post-training: RLVR, domain mixtures, efficient reasoning traces
Systems: Asynchronous rollouts, on-policy freshness, distributed training
Agents: Long-horizon interaction, memory, privacy-aware deep research

Selected publications

Papers

2026 · RL post-training · First author

Apriel-Reasoner: RL Post-Training for General-Purpose and Efficient Reasoning

Reproducible multi-domain RLVR recipe for a 15B open-weight model, with adaptive domain sampling and a difficulty-aware length penalty for stronger, shorter reasoning.

2026 · Deep research agents · Privacy

MosaicLeaks: Privacy Risks in Querying-in-the-Open for Deep Research Agents

Benchmark and RL framework for agents that must balance task success with privacy leakage from external research queries over multi-hop local and web evidence.

2026 · Efficient LLM serving · Apriel

Super Apriel: One Checkpoint, Many Speeds

A 15B supernet that supports multiple decoding speed-quality presets from one checkpoint, with released models, serving code, and placement tooling.

2024 · Agent framework · Open source

TapeAgents: a Holistic Framework for Agent Development and Optimization

Tape-centered agent design for resumable state, debugging, evaluation, fine-tuning, prompt tuning, and reusable agent traces.

Earlier RL and ML

Offline RL, functional regularization, and applied ML systems

Earlier work spans implicit offline RL, target-network regularization, active learning, and practical ML workflows for high-stakes investigation settings.

Research systems

Code and infrastructure

Agents · Traces · Open source

TapeAgents

Framework for building, debugging, serving, and optimizing LLM agents through structured, replayable tapes that connect engineering traces back to model improvement.

Evaluation · Agent training

CUBE-harness

Evaluation and training infrastructure for long-horizon LLM agents, focused on repeatable measurement, agentic tasks, and systems that improve through interaction.

Blog posts

Research taste

Current focus

  • RL for reasoning models and LLM agents
  • Multi-turn RLVR and long-horizon credit assignment
  • Efficient on-policy training at distributed scale
  • Memory-augmented agents and open-ended interaction
  • Cross-domain generalisation and robust evaluation loops
  • Privacy-aware deep research agents

Engineering lens

How I work

My background combines applied AI research with production software, distributed systems, networking, and infrastructure. I am most interested in research ideas that can be made concrete: implemented, measured, debugged, scaled, and released in a form other people can build on.