Interactive Imitation Learning

Explicitly programming a robot is often very challenging. Imitation learning offers a more scalable alternative: implicitly programming robots through demonstrations, interventions, or preferences. While we currently have simple algorithms with strong theory, they rely on a set of restrictive assumptions, most notably an optimal human expert who can interactively provide corrections on any state the robot visits. An everyday human user, however, presents a number of challenges:

  1. Mismatched capabilities: A user may demonstrate a task that is beyond the robot’s capabilities, or may be less efficient than the robot. Can we learn under such a mismatch?

  2. Unobserved contexts: Human feedback is often influenced by latent context (e.g. trust, attention) that the robot does not directly observe. How do we deal with such uncertainty?

  3. Natural feedback: Can we learn from natural modes of human feedback like language and gestures?


Of Moments and Matching: Trade-offs and Treatments in Imitation Learning
Gokul Swamy, Sanjiban Choudhury, Zhiwei Steven Wu, and J Andrew Bagnell
International Conference on Machine Learning (ICML), 2021
project page / paper / video / code

All of imitation learning can be reduced to a game between a learner (generator) and a value function (discriminator) where the payoff is the performance difference between learner and expert.
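
As a rough sketch of this game (the notation here is our own shorthand for illustration and may differ from the paper's): assume a policy class \Pi, a class \mathcal{F} of candidate reward or value functions, and let J(\pi, f) denote the expected cumulative value of f along trajectories of \pi over a horizon T.

```latex
\min_{\pi \in \Pi} \; \max_{f \in \mathcal{F}} \; J(\pi_E, f) - J(\pi, f),
\qquad
J(\pi, f) = \mathbb{E}_{\tau \sim \pi}\Big[ \sum_{t=1}^{T} f(s_t, a_t) \Big]
```

Under the assumption that the true reward r lies in \mathcal{F}, the inner maximum is at least J(\pi_E, r) - J(\pi, r), so a learner that drives the game value to zero also closes the performance gap to the expert.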


Feedback in Imitation Learning: The Three Regimes of Covariate Shift
Jonathan Spencer, Sanjiban Choudhury, Arun Venkatraman, Brian Ziebart, and J Andrew Bagnell
arXiv preprint arXiv:2102.02872, 2021
paper / talk

Not all imitation learning problems are alike -- some are easy (do behavior cloning), some are hard (call interactive expert), and some are just right (just need a simulator).


Learning from Interventions: Human-robot interaction as both explicit and implicit feedback
Jonathan Spencer, Sanjiban Choudhury, Matt Barnes and Siddhartha Srinivasa
Robotics: Science and Systems (RSS), 2020
paper / talk

How can we learn from human interventions? Every intervention reveals some information about the expert's implicit value function. We infer this function and optimize it.


Imitation Learning as f-Divergence Minimization
Liyiming Ke, Sanjiban Choudhury, Matt Barnes, Wen Sun, Gilwoo Lee and Siddhartha Srinivasa
Workshop on the Algorithmic Foundations of Robotics (WAFR), 2020

Many old (and new!) imitation learning algorithms are simply minimizing estimates of various f-divergences between the expert and the learner trajectory distributions.
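
To make the notion of an f-divergence concrete, here is a minimal sketch for discrete distributions. It assumes both distributions have full support and uses the convention D_f(P || Q) = sum_x q(x) f(p(x) / q(x)); the function names and the toy distributions are illustrative only and are not taken from the paper's code.

```python
import math

def f_divergence(p, q, f):
    """D_f(P || Q) = sum_x q(x) * f(p(x) / q(x)); assumes every q(x) > 0."""
    return sum(qx * f(px / qx) for px, qx in zip(p, q))

# Generator functions f(u) for a few common divergences (each has f(1) = 0).
def kl(u):
    return u * math.log(u) if u > 0 else 0.0

def reverse_kl(u):
    return -math.log(u)  # assumes u > 0, i.e. p(x) > 0 everywhere

def jensen_shannon(u):
    return 0.5 * (kl(u) - (u + 1.0) * math.log((u + 1.0) / 2.0))

def total_variation(u):
    return 0.5 * abs(u - 1.0)

# Toy "expert" and "learner" distributions over two outcomes (hypothetical).
expert = [0.5, 0.5]
learner = [0.25, 0.75]

divergences = {
    name: f_divergence(expert, learner, f)
    for name, f in [("kl", kl), ("reverse_kl", reverse_kl),
                    ("js", jensen_shannon), ("tv", total_variation)]
}
```

Swapping the generator f changes which mismatches between expert and learner are penalized most, which is the sense in which different algorithms can be read as different choices of divergence.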

Task Representation Learning

We want personal robots that come with a repertoire of skills that can be composed to solve any boutique task in our homes. Every home is different, every human is different. Robots must be able to learn new tasks from a handful of demonstrations and interactions with the human user. Abstractly, one can think of a task as follows: given an initial configuration of objects, reach a desired goal configuration through a series of feasible operations that do not violate constraints. Learning both the goal and the constraints comes with a number of challenges:

  1. Skill Composition: Given a library of skills, can we label and learn demonstrated tasks as a composition of known skills?

  2. Limited Labels: Getting large amounts of labelled data for every new task can be expensive. How can we leverage unlabelled data collected from multiple tasks that share sub-tasks in common?

  3. Learning Structure: Instead of memorizing a task as a sequence of operations, can we learn the underlying structure of the task, i.e. common sub-goals, invariances and dependencies?

Planning Alongside Humans

For robots to work seamlessly alongside human partners, they must operate in a safe and legible manner while matching human cadence. The fundamental challenge is uncertainty. Robots are uncertain about the intent of their human partners, and about how this intent changes based on the robot’s actions. Modelling this uncertainty and planning with it in real time presents a set of challenges:

  1. Discrete modes: Theoretically, planning under uncertainty over continuous spaces is intractable. But humans do this every day by chunking up the continuous space of actions into discrete modes. What are these modes and how do we learn them?

  2. Hedging vs. Asserting: In much of driving, humans fluidly trade off between hedging against uncertain outcomes and taking assertive actions that collapse uncertainty. How do we learn this trade-off, and can we generalize it broadly across human-robot interactions?

  3. Hierarchy of Abstractions: Robots must deal with uncertainty at multiple time-scales. How can we learn a hierarchy of plannable abstractions to continually build and refine an estimate of the value function?


Bayesian Residual Policy Optimization: Scalable Bayesian Reinforcement Learning with Clairvoyant Experts
Gilwoo Lee, Brian Hou, Sanjiban Choudhury and Siddhartha S. Srinivasa
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021
paper / talk

In Bayesian RL, while solving the belief MDP is hard, solving each individual latent MDP is easy. We combine the value functions from each latent MDP with a learned residual belief policy.
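
A minimal sketch of this idea for a small discrete action set, under our own simplifying assumptions: per-MDP action values are averaged under the current belief, and a residual term then adjusts the resulting scores. The names `belief_weighted_q`, `act`, and `zero_residual` are hypothetical and are not the paper's implementation; in practice the residual would be a learned function of the belief.

```python
def belief_weighted_q(belief, q_per_mdp):
    """Average per-MDP action values under the belief: sum_i b_i * Q_i(s, a)."""
    n_actions = len(q_per_mdp[0])
    return [sum(b * q[a] for b, q in zip(belief, q_per_mdp))
            for a in range(n_actions)]

def act(belief, q_per_mdp, residual):
    """Pick the action maximizing belief-weighted value plus a residual correction."""
    scores = belief_weighted_q(belief, q_per_mdp)
    correction = residual(belief, scores)  # hypothetical learned correction
    combined = [s + c for s, c in zip(scores, correction)]
    return max(range(len(combined)), key=combined.__getitem__)

def zero_residual(belief, scores):
    """Stand-in for a learned residual policy: applies no correction."""
    return [0.0] * len(scores)

# Two latent MDPs, three actions. Action 2 is a "safe" action in both MDPs.
q_per_mdp = [
    [1.0, 0.0, 0.6],  # action values if latent MDP 0 is the true one
    [0.0, 1.0, 0.6],  # action values if latent MDP 1 is the true one
]

uncertain_action = act([0.5, 0.5], q_per_mdp, zero_residual)
confident_action = act([0.9, 0.1], q_per_mdp, zero_residual)
```

With a uniform belief the sketch selects the safe action 2, while a confident belief selects action 0. A nonzero residual is where behavior the per-MDP values cannot express, such as deliberate information gathering, would enter.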


Posterior Sampling for Anytime Motion Planning on Graphs with Expensive-to-Evaluate Edges
Brian Hou, Sanjiban Choudhury, Gilwoo Lee, Aditya Mandalika, and Siddhartha Srinivasa
IEEE International Conference on Robotics and Automation (ICRA), 2020
paper / video

Anytime motion planning can be viewed through a Bayesian lens where we are initially uncertain about the shortest path, and must probe the environment to progressively yield shorter and shorter paths.


Generalized Lazy Search for Robot Motion Planning: Interleaving Search and Edge Evaluation via Event-based Toggles
Aditya Mandalika, Sanjiban Choudhury, Oren Salzman and Siddhartha Srinivasa
International Conference on Automated Planning and Scheduling (ICAPS), 2019
Best Student Paper Award
paper / long paper

A unified framework for interleaving search and edge evaluation to provably minimize total planning time.
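
To illustrate what interleaving search and edge evaluation means, here is a minimal sketch of one simple lazy strategy: search optimistically over unevaluated edges, evaluate only the edges on the candidate path, and re-plan as soon as one turns out to be invalid. This is a single point in the space of strategies, not the paper's full framework; the toy graph and the `is_edge_valid` checker are hypothetical.

```python
import heapq

def shortest_path(graph, start, goal, blocked):
    """Dijkstra over all edges that are not yet known to be invalid."""
    dist, parent = {start: 0.0}, {}
    frontier = [(0.0, start)]
    while frontier:
        d, u = heapq.heappop(frontier)
        if u == goal:
            path = [goal]
            while path[-1] != start:
                path.append(parent[path[-1]])
            return path[::-1]
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph.get(u, {}).items():
            if (u, v) in blocked:
                continue
            if d + w < dist.get(v, float("inf")):
                dist[v], parent[v] = d + w, u
                heapq.heappush(frontier, (d + w, v))
    return None

def lazy_shortest_path(graph, start, goal, is_edge_valid):
    """Alternate optimistic search with evaluation of edges on the candidate path."""
    blocked, checked, evaluations = set(), set(), 0
    while True:
        path = shortest_path(graph, start, goal, blocked)
        if path is None:
            return None, evaluations
        for edge in zip(path, path[1:]):
            if edge in checked:
                continue
            checked.add(edge)
            evaluations += 1  # one expensive evaluation, e.g. a collision check
            if not is_edge_valid(edge):
                blocked.add(edge)
                break  # candidate path is invalid: go back to search
        else:
            return path, evaluations  # every edge on the path is valid

# Toy directed graph with six edges; edge (a, g) is in collision.
graph = {"s": {"a": 1, "b": 2, "c": 5}, "a": {"g": 1},
         "b": {"g": 2}, "c": {"g": 5}, "g": {}}
path, evaluations = lazy_shortest_path(
    graph, "s", "g", lambda edge: edge != ("a", "g"))
```

On this toy graph the sketch returns the path s, b, g after evaluating four of the six edges. How often to switch between searching and evaluating is the design choice that determines total planning time.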


The Blindfolded Robot: A Bayesian Approach to Planning with Contact Feedback
Brad Saund, Sanjiban Choudhury, Siddhartha Srinivasa, and Dmitry Berenson
International Symposium on Robotics Research (ISRR), 2019
paper / video

Casts manipulation under occlusion as a search on a graph where the feasibility of an edge is only revealed when an agent attempts to traverse it. A Bayesian prior is used to balance exploration and exploitation.