Research Scientist · London, UK

Rafael Pardinas

I'm a Staff Research Scientist at ServiceNow AI Research, based in London. My research centres on RL, LLMs, and scalable distributed systems. I led Apriel-Reasoner and MosaicLeaks, co-authored PipelineRL and TapeAgents, and was a core contributor to the Apriel model series. Before turning to LLMs, my research focused on core reinforcement learning, including implicit offline RL and functional regularization of target networks.

Before joining ServiceNow, I was a Senior Research Engineer at Element AI. I care about scalable and verifiable research: reproducible training recipes, evaluations, and open-source systems other people can build on.

Post-training: Multi-domain RLVR with adaptive sampling and difficulty-aware length control for accurate, efficient reasoning

Systems: Asynchronous RL with in-flight weight updates that keep rollouts near on-policy at distributed scale

Agents: Long-horizon, memory-augmented agents that improve through interaction, plus privacy-aware deep research agents

Updates

News

2026 · New blog post: “MosaicLeaks: Can your research agent keep a secret?” on privacy risks in deep research agents.
2026 · Released Apriel-Reasoner, a reproducible multi-domain RLVR recipe for efficient reasoning in a 15B open-weight model (arXiv).
2026 · Released MosaicLeaks, a benchmark and RL framework for privacy risks in deep research agents (arXiv, blog).
2025 · PipelineRL paper and Hugging Face blog post out: asynchronous on-policy RL for long-sequence generation (arXiv, blog).
2024 · Open-sourced TapeAgents, a holistic framework for agent development and optimization (arXiv).

Selected publications

Papers

COLM 2026 · RL post-training

Apriel-Reasoner: RL Post-Training for General-Purpose and Efficient Reasoning

Rafael Pardinas, Ehsan Kamalloo, David Vazquez, Alexandre Drouin

Reproducible multi-domain RLVR recipe for a 15B open-weight model, with adaptive domain sampling and a difficulty-aware length penalty for stronger, shorter reasoning.

arXiv · Tweet

TMLR 2025 · RL systems · Open source

PipelineRL: Faster On-policy Reinforcement Learning for Long Sequence Generation

Alexandre Piché, Rafael Pardinas, Ehsan Kamalloo, Xiaoyin Chen, Dzmitry Bahdanau

Asynchronous RL infrastructure with in-flight weight updates for fast long-sequence generation while keeping training data near on-policy.

arXiv · Tweet

EMNLP 2026 · Deep research agents

MosaicLeaks: Privacy Risks in Querying-in-the-Open for Deep Research Agents

Alexander Gurung, Spandana Gella, Alexandre Drouin, Issam H. Laradji, Perouz Taslakian, Rafael Pardinas

Benchmark and RL framework for agents that must balance task success against privacy leakage from external research queries over multi-hop local and web evidence.

arXiv · Tweet

NeurIPS 2026 · Efficient LLM serving

Super Apriel: One Checkpoint, Many Speeds

Oleksiy Ostapenko, Raymond Li, Torsten Scholak, Aman Tiwari, Denis Kocetkov, Joel Lamy Poirier, Kelechi Ogueji, Rafael Pardinas, Sathwik Tejaswi Madhusudhan, Shruthan Radhakrishna

A 15B supernet that supports multiple decoding speed-quality presets from one checkpoint, with released models, serving code, and placement tooling.

arXiv

Preprint 2024 · Agent framework · Open source

TapeAgents: a Holistic Framework for Agent Development and Optimization

Dzmitry Bahdanau, Nicolas Gontier, Gabriel Huang, Ehsan Kamalloo, Rafael Pardinas, Alex Piché, Torsten Scholak, Oleh Shliazhko

Tape-centered agent design for resumable state, debugging, evaluation, fine-tuning, prompt tuning, and reusable agent traces.

arXiv

Earlier RL and ML

Offline RL, functional regularization, and applied ML systems

Earlier work spans implicit offline RL, target-network regularization, active learning, and practical ML workflows for high-stakes investigation settings.

Offline RL · Functional regularization · Applied ML

Research systems

Code and infrastructure

RLVR · Systems · Open source

PipelineRL

Distributed asynchronous reinforcement learning framework for long-horizon LLM training, with in-flight weight updates, multi-domain rollouts, tool use, and scalable post-training workflows.

Code · arXiv

Agents · Traces · Open source

TapeAgents

Framework for building, debugging, serving, and optimizing LLM agents through structured, replayable tapes that connect engineering traces back to model improvement.

Code · arXiv

Evaluation · Training environments

CUBE-harness

Evaluation and training environments for long-horizon LLM agents, providing reproducible task suites and interactive environments for measuring and improving agent behavior.

Code

Blog posts

MosaicLeaks: Can your research agent keep a secret? · Hugging Face
Correctness Before Corrections: Efficient and Accurate Reasoning via RL · Hugging Face
PipelineRL: Faster On-policy Reinforcement Learning for Long Sequence Generation · Hugging Face

Mentorship

Students and interns

Alexander Gurung · PhD · University of Edinburgh
MosaicLeaks: Privacy Risks in Querying-in-the-Open for Deep Research Agents

Masih Aminbeidokhti · PhD · École de technologie supérieure
Multi-domain agentic RL generalization with CUBE and PipelineRL · In progress

Aristides Milios · PhD · Université de Montréal
Scout Before You Route: Attempt-Conditioned Expert Routing for Software Engineering Agents · In progress

Gandharv Patil · PhD · McGill University
Memory-based agents · Early research

Research

Current focus

RL for reasoning models and LLM agents
Multi-turn RLVR and long-horizon credit assignment
Efficient on-policy training at distributed scale
Memory-augmented agents and open-ended interaction
Cross-domain generalisation and robust evaluation loops
Privacy-aware deep research agents

Engineering

How I work

My background combines applied AI research with production software, distributed systems, networking, and infrastructure. I am most interested in research ideas that can be made concrete: implemented, measured, debugged, scaled, and released in a form other people can build on.