Aniket Didolkar

I am a Ph.D. student at Mila and The University of Montreal, advised by Prof. Yoshua Bengio, Dr. Anirudh Goyal, and Prof. Michael Mozer. I am also a visiting researcher at Meta, where I work with Dr. Nicolas Ballas.

My research is rooted in building cognitive science-inspired deep learning techniques. Broadly, I am interested in designing models that learn and reason like humans. Most recently, I have been exploring the metacognitive abilities of large language models (LLMs) in the context of mathematical problem solving (1). In earlier work, I developed hybrid architectures that integrate recurrent networks with transformers to effectively handle long-context modeling (2). Another major thread of my Ph.D. has focused on object-centric learning, where I have worked on building general-purpose visual representations that capture compositional structure and enable downstream reasoning (3, 4, 5).

Going forward, I am particularly interested in:

  • Equipping LLMs with good thinking frameworks: While reinforcement learning has dramatically improved LLM reasoning, there remains a gap in how these models acquire and reuse knowledge. I am excited by the possibility of enabling LLMs to convert past reasoning traces into procedural habits, and to adopt structured thinking frameworks that maximize the utility of their context window.
  • Multi-agent collaboration: Much of human progress comes from collaboration. To unlock progress on challenging scientific endeavors with LLMs, it will be crucial to develop frameworks that allow many LLM agents to collaborate with each other in a scalable manner, rather than relying on a single LLM thinking for an extremely long time.

Prior to my current role, I gained valuable research experience across academia and industry through several internships. I was a research intern at Valence Labs, where I worked with Dr. Jason Hartford on experimental design strategies for estimating the effects of gene knockouts in cells. Before that, I interned at Microsoft Research NYC with Dr. Alex Lamb on reinforcement learning. Before starting my Ph.D., I spent a year at Mila working with Dr. Anirudh Goyal and Prof. Yoshua Bengio on cognitive science-inspired deep learning projects, now published at NeurIPS 2021 and ICLR 2022. During my undergraduate studies, I was a Google Summer of Code student developer with Preferred Networks, where I contributed CUDA-optimized implementations of RNNs, GRUs, and LSTMs to the ChainerX deep learning library. I also collaborated with Prof. Rajiv Ratn Shah at IIIT Delhi on applied NLP projects, and with Prof. Aditya Gopalan at IISc Bangalore on time-series forecasting models for urban pollution.


Selected Publications (* = equal contribution)

Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving
Aniket Didolkar, Anirudh Goyal, Nan Rosemary Ke, Siyuan Guo, Michal Valko, Timothy Lillicrap, Danilo Rezende, Yoshua Bengio, Michael Mozer, Sanjeev Arora
NeurIPS 2024
Paper

Probing the metacognitive capabilities of LLMs to improve mathematical problem solving.

CTRL-O: Language-Controllable Object-Centric Visual Representation Learning
Aniket Didolkar*, Andrii Zadaianchuk*, Rabiul Awal*, Maximilian Seitzer, Efstratios Gavves, Aishwarya Agrawal
CVPR 2025
Paper / Project Page / Code

User-controllable visual representation learning.

On the Transfer of Object-Centric Representation Learning
Aniket Didolkar*, Andrii Zadaianchuk, Anirudh Goyal, Michael Curtis Mozer, Yoshua Bengio, Georg Martius, Maximilian Seitzer*
ICLR 2024
Paper / Code / Project Page

Building object-centric models from a foundation model perspective.

Cycle Consistency Driven Object Discovery
Aniket Didolkar, Anirudh Goyal, Yoshua Bengio
ICLR 2024
Paper

Unsupervised object discovery via two cycle-consistency objectives.

Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning
Aniket Didolkar, Kshitij Gupta, Anirudh Goyal, Alex Lamb, Nan Rosemary Ke, Yoshua Bengio
NeurIPS 2022
Paper / slides

Merging transformers with recurrent networks to effectively handle long-context tasks.

Guaranteed Discovery of Controllable Latent States with Multi-Step Inverse Models
Alex Lamb, Riashat Islam, Yonathan Efroni, Aniket Didolkar, Dipendra Misra, Dylan Foster, Lekan Molu, Rajan Chari, Akshay Krishnamurthy, John Langford
TMLR 2023
Paper

An algorithm for discovering a minimal controllable latent state that retains all the information needed to control an agent while learning to discard everything irrelevant.

Coordination Among Neural Modules Through a Shared Global Workspace
Anirudh Goyal, Aniket Didolkar, Alex Lamb, Kartikeya Badola, Nan Rosemary Ke, Nasim Rahaman, Jonathan Binas, Charles Blundell, Michael Mozer, Yoshua Bengio
ICLR 2022 (Oral Presentation - top 5% of accepted papers)
Paper

Facilitating communication between modules using a limited-capacity bottleneck.

Neural Production Systems
Aniket Didolkar*, Anirudh Goyal*, Nan Rosemary Ke, Charles Blundell, Philippe Beaudoin, Nicolas Heess, Michael Mozer, Yoshua Bengio
NeurIPS 2021
Paper

World models via sparsely communicating recurrent modules.

Systematic Evaluation of Causal Discovery for Visual Model Based Reinforcement Learning
Nan Rosemary Ke*, Aniket Didolkar*, Sarthak Mittal, Anirudh Goyal, Guillaume Lajoie, Stefan Bauer, Danilo Rezende, Yoshua Bengio, Michael Mozer, Christopher Pal
NeurIPS Datasets and Benchmarks Track, 2021
Paper / Code

A new, highly flexible benchmark for evaluating causal discovery in model-based RL.

Augmenting NLP models using Latent Feature Interpolations
Amit Jindal, Aniket Didolkar, Arijit Ghosh Chowdhury, Ramit Sawhney, Rajiv Ratn Shah, Di Jin
COLING 2020
Paper

Proposed a new formulation of mixup for NLP.

SpeechMix - Augmenting Deep Sound Recognition using Hidden Space Interpolations
Amit Jindal, Narayanan Elavathur Ranganatha, Aniket Didolkar, Arijit Ghosh Chowdhury, Ramit Sawhney, Rajiv Ratn Shah, Di Jin
Interspeech 2020
Paper

Data augmentation using mixup for speech.