Research

Foundations

Our research is grounded in the fundamentals of artificial intelligence, focusing on the key question of imitation learning: How can an agent learn new behaviors by observing and interacting with a teacher?

Imitation learning offers a simple yet scalable way to implicitly program agents through demonstrations, interventions, or preferences. It has widespread impact across disciplines, from teaching your home robot to make you a bowl of soup, to aligning large language models with human preferences, to teaching self-driving cars to drive more like humans.

We explore a diverse array of questions in our research:

  • Efficient Inverse Reinforcement Learning: How can we design algorithms that are exponentially more efficient than reinforcement learning?
  • Vision-Language Demonstrations: How can we learn complex, long-horizon tasks from vision and language demonstrations?
  • Suboptimal Experts: How do we learn from noisy, suboptimal experts?
  • Human-Robot Teaming Behaviors: How can we learn effective human-robot collaboration from human-human teams?

… and much more! Check out some of our projects.

Applications

We test our ideas across a broad range of applications:

  1. Everyday Robots: Our primary focus is building home robots that interact with everyday users to learn personalized tasks like collaborative cooking, cleaning and assembly.

  2. Collaborative Games: Games are a fun way to learn how humans collaborate, and there’s lots of data! Through games, we explore new algorithms and architectures for effective human-robot collaboration.

  3. Self-Driving: With our industry partner Aurora, we develop ML models that enable safe, human-like driving.

Projects


ManiCast: Collaborative Manipulation with Cost-Aware Human Forecasting
Kushal Kedia, Prithwish Dan, Atiksh Bhardwaj, Sanjiban Choudhury
Conference on Robot Learning (CoRL), 2023
paper / website

ManiCast learns to forecast human motion and plans with those forecasts to solve collaborative manipulation tasks.
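
The core idea lends itself to a small sketch: score candidate robot plans against the forecasted human trajectory and penalize predicted proximity. This is illustrative only; the function and parameter names (plan_cost, safety_radius, the cost weights) are mine, not the paper's.

    import numpy as np

    def plan_cost(robot_traj, human_forecast, goal, safety_radius=0.3, w_task=1.0, w_safe=10.0):
        """Score a candidate robot trajectory against a forecasted human trajectory."""
        task_cost = np.linalg.norm(robot_traj[-1] - goal)            # distance to goal at the horizon
        dists = np.linalg.norm(robot_traj - human_forecast, axis=1)  # per-timestep separation
        safety_cost = np.maximum(safety_radius - dists, 0.0).sum()   # penalize predicted proximity
        return w_task * task_cost + w_safe * safety_cost

    def best_plan(candidate_trajs, human_forecast, goal):
        """Pick the candidate trajectory with the lowest forecast-aware cost."""
        return min(candidate_trajs, key=lambda traj: plan_cost(traj, human_forecast, goal))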


Demo2Code: From Summarizing Demonstrations to Synthesizing Code via Extended Chain-of-Thought
Yuki Wang, Gonzalo Gonzalez-Pumariega, Yash Sharma, Sanjiban Choudhury
Advances in Neural Information Processing Systems (NeurIPS), 2023
paper / website

Demo2Code leverages LLMs to translate demonstrations into robot task code via an extended chain-of-thought that recursively summarizes demonstrations into a task specification and then recursively expands that specification into code.
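
A minimal sketch of the summarize-then-expand recursion, assuming only a generic llm(prompt) -> str callable; the prompts and helper names are illustrative, not the paper's exact chain-of-thought.

    def demos_to_spec(llm, demos):
        """Recursively summarize demonstrations until a single task specification remains."""
        summaries = [llm("Summarize this demonstration:\n" + d) for d in demos]
        while len(summaries) > 1:
            pairs = zip(summaries[::2], summaries[1::2] + [""])
            summaries = [llm("Merge these summaries into one specification:\n" + a + "\n" + b)
                         for a, b in pairs]
        return summaries[0]

    def spec_to_code(llm, spec, depth=2):
        """Recursively expand a specification into increasingly concrete robot task code."""
        code = llm("Outline the high-level steps for this specification:\n" + spec)
        for _ in range(depth):
            code = llm("Expand each step into more detailed code:\n" + code)
        return code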


The Virtues of Laziness in Model-based RL: A Unified Objective and Algorithms
Anirudh Vemula, Yuda Song, Aarti Singh, J. Andrew Bagnell, Sanjiban Choudhury
International Conference on Machine Learning (ICML), 2023
paper

We propose a novel, lazy approach that addresses two fundamental challenges in Model-based Reinforcement Learning (MBRL): the computational expense of repeatedly finding a good policy in the learned model, and the objective mismatch between model fitting and policy computation.


Inverse Reinforcement Learning without Reinforcement Learning
Gokul Swamy, Sanjiban Choudhury, J Andrew Bagnell, and Zhiwei Steven Wu
International Conference on Machine Learning (ICML), 2023
paper / website

We explore inverse reinforcement learning and show that leveraging the state distribution of the expert can significantly reduce the complexities of the RL problem, theoretically providing an exponential speedup and practically enhancing performance in continuous control tasks.
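
One way to picture the speedup: instead of exploring from scratch, reset policy optimization to states sampled from expert demonstrations. A rough sketch, assuming a simulator that exposes reset_to(state) and step(action) (illustrative interfaces, not a specific library):

    import random

    def rollout_from_expert_state(env, policy, expert_states, horizon=50):
        """Collect a short rollout that starts from a randomly chosen expert state."""
        state = env.reset_to(random.choice(expert_states))  # reset to the expert's state distribution
        trajectory = []
        for _ in range(horizon):
            action = policy(state)
            state, reward, done = env.step(action)
            trajectory.append((state, action, reward))
            if done:
                break
        return trajectory  # feed these rollouts to any standard policy optimizer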


A Game-Theoretic Framework for Joint Forecasting and Planning
Kushal Kedia, Prithwish Dan, Sanjiban Choudhury
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023
paper / website

We propose a novel game-theoretic framework for joint planning and forecasting with the payoff being the performance of the planner against the demonstrator, and present practical algorithms to train models in an end-to-end fashion.


Impossibly Good Experts and How to Follow Them
Aaron Walsman, Muru Zhang, Sanjiban Choudhury, Dieter Fox, Ali Farhadi
International Conference on Learning Representations (ICLR), 2023
paper

We investigate sequential decision making with "Impossibly Good" experts possessing privileged information, propose necessary criteria for recovering an optimal policy under limited information, and introduce a novel approach, ELF Distillation, that outperforms baselines in Minigrid and Vizdoom environments.


Sequence Model Imitation Learning with Unobserved Contexts
Gokul Swamy, Sanjiban Choudhury, Zhiwei Steven Wu, and J Andrew Bagnell
Advances in Neural Information Processing Systems (NeurIPS), 2022
paper

We study imitation learning when the expert has privileged information and show that on-policy algorithms provably learn to recover from their initially suboptimal actions, while off-policy methods naively repeat their past actions.


Minimax Optimal Online Imitation Learning via Replay Estimation
Gokul Swamy, Nived Rajaraman, Matt Peng, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu, Jiantao Jiao, Kannan Ramchandran
Advances in Neural Information Processing Systems (NeurIPS), 2022
paper

Imitation learning from noisy experts leads to biased policies! Replay estimation fixes this by smoothing the expert: repeatedly execute cached expert actions in a stochastic simulator and imitate the resulting rollouts.
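
A rough sketch of the smoothing step, assuming a stochastic simulator with reset() and step(action) (illustrative interfaces, not the paper's code):

    def replay_estimate(env, expert_action_sequences, num_replays=10):
        """Re-execute cached expert action sequences and pool the visited (state, action) pairs."""
        dataset = []
        for actions in expert_action_sequences:
            for _ in range(num_replays):            # repeated replays average over simulator noise
                state = env.reset()
                for action in actions:
                    dataset.append((state, action))
                    state, _, done = env.step(action)
                    if done:
                        break
        return dataset                              # fit behavior cloning on this smoothed dataset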


Towards Uniformly Superhuman Autonomy via Subdominance Minimization
Brian Ziebart, Sanjiban Choudhury, Xinyan Yan, and Paul Vernaza
International Conference on Machine Learning (ICML), 2022
paper

We look at imitation learning where demonstrations vary in quality and seek to induce behavior that is unambiguously better (i.e., Pareto dominant or minimally subdominant) than all human demonstrations.


Of Moments and Matching: Trade-offs and Treatments in Imitation Learning
Gokul Swamy, Sanjiban Choudhury, Zhiwei Steven Wu, and J Andrew Bagnell
International Conference on Machine Learning (ICML), 2021
paper / website

All of imitation learning can be reduced to a game between a learner (generator) and a value function (discriminator) where the payoff is the performance difference between learner and expert.
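
In symbols (my paraphrase, with notation simplified): the learner \pi plays against a class of moment functions \mathcal{F}, and, up to horizon-dependent factors discussed in the paper, the performance gap is controlled by the value of the game

    J(\pi_E) - J(\pi) \;\lesssim\; \max_{f \in \mathcal{F}} \Big( \mathbb{E}_{(s,a) \sim \rho_{\pi_E}}[f(s,a)] - \mathbb{E}_{(s,a) \sim \rho_{\pi}}[f(s,a)] \Big)

Roughly, different choices of the moment class and of whether the learner can interact with the environment recover behavior cloning, GAIL-style adversarial imitation, and DAgger-style interactive imitation.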


Blending MPC & Value Function Approximation for Efficient Reinforcement Learning
Mohak Bhardwaj, Sanjiban Choudhury, and Byron Boots
International Conference on Learning Representations (ICLR), 2021
paper

Blend model predictive control (MPC) with learned value estimates to trade off MPC model errors against value-function approximation errors.
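
A simplified sketch of the blend: short-horizon lookahead in the model with a learned value estimate as the terminal cost (the paper's lambda-weighting over horizons is more careful). model.step, cost, and value are assumed components, not a specific library API.

    def lookahead_score(model, cost, value, state, action_sequence, gamma=0.99):
        """Evaluate an action sequence: discounted model-predicted costs plus a terminal value estimate."""
        total, discount = 0.0, 1.0
        for action in action_sequence:
            total += discount * cost(state, action)   # cost predicted by the (imperfect) model
            state = model.step(state, action)
            discount *= gamma
        return total + discount * value(state)        # learned value covers everything beyond the horizon

A shorter action sequence leans more on the learned value (and its approximation error); a longer one leans more on the model (and its compounding error).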


Feedback in Imitation Learning: The Three Regimes of Covariate Shift
Jonathan Spencer, Sanjiban Choudhury, Arun Venkatraman, Brian Ziebart, and J Andrew Bagnell
arXiv preprint arXiv:2102.02872, 2021
paper / talk

Not all imitation learning problems are alike -- some are easy (do behavior cloning), some are hard (query an interactive expert), and some are just right (a simulator suffices).


Bayesian Residual Policy Optimization: Scalable Bayesian Reinforcement Learning with Clairvoyant Experts
Gilwoo Lee, Brian Hou, Sanjiban Choudhury and Siddhartha S. Srinivasa
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021
paper / talk

In Bayesian RL, solving the belief MDP is hard, but solving each individual latent MDP is easy. BRPO combines value functions from each latent MDP with a learned residual belief policy.
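
A rough sketch of the combination; the paper conditions the residual policy on the clairvoyant experts' recommendations, and the names below are illustrative.

    import numpy as np

    def brpo_action(state, belief, experts, residual_policy):
        """Blend per-latent-MDP expert recommendations (weighted by belief) with a learned residual."""
        recommendations = np.array([expert(state) for expert in experts])  # one action per latent MDP
        ensemble_action = belief @ recommendations                         # belief-weighted recommendation
        return ensemble_action + residual_policy(state, belief)            # residual corrects the ensemble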


Imitation Learning as f-Divergence Minimization
Liyiming Ke, Sanjiban Choudhury, Matt Barnes, Wen Sun, Gilwoo Lee and Siddhartha Srinivasa
Workshop on the Algorithmic Foundations of Robotics (WAFR), 2020
paper

Many old (and new!) imitation learning algorithms are simply minimizing various f-divergence estimates between the expert and learner trajectory distributions.
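
As a quick illustration (notation mine): writing \rho_E and \rho_\pi for the expert and learner trajectory distributions,

    D_f(\rho_E \,\|\, \rho_\pi) = \mathbb{E}_{x \sim \rho_\pi}\!\left[ f\!\left( \frac{\rho_E(x)}{\rho_\pi(x)} \right) \right]

Roughly, choosing f(u) = u log u gives the forward KL (behavior cloning-style objectives), while the Jensen-Shannon divergence recovers GAIL-style adversarial imitation.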


Learning from Interventions: Human-robot interaction as both explicit and implicit feedback
Jonathan Spencer, Sanjiban Choudhury, Matt Barnes and Siddhartha Srinivasa
Robotics: Science and Systems (RSS), 2020
paper / talk

How can we learn from human interventions? Every intervention reveals some information about the expert's implicit value function. We infer this function and optimize it.


Learning Online from Corrective Feedback: A Meta-Algorithm for Robotics
Matthew Schmittle, Sanjiban Choudhury, and Siddhartha Srinivasa
arXiv preprint arXiv:2104.01021, 2021
paper

We can model multi-modal feedback from a human (demonstrations, interventions, verbal corrections) as a stream of losses that can be minimized by any no-regret online learning algorithm.
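
A minimal sketch of the no-regret reduction, here with plain online gradient descent; loss_grad maps the current parameters and one feedback event to a gradient, and the names are illustrative.

    import numpy as np

    def online_gradient_descent(params, feedback_stream, loss_grad, step_size=0.1):
        """Minimize a stream of per-round losses (demonstrations, interventions, corrections)."""
        for t, feedback in enumerate(feedback_stream, start=1):
            grad = loss_grad(params, feedback)                 # gradient of this round's loss
            params = params - (step_size / np.sqrt(t)) * grad  # 1/sqrt(t) step sizes give no-regret behavior
        return params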