Research
Interactive Imitation Learning
Explicitly programming a robot is often very challenging. Imitation learning offers a more scalable alternative: implicitly programming robots through demonstrations, interventions, or preferences. While we currently have simple algorithms with strong theory, they rely on a set of restrictive assumptions, most notably an optimal human expert who can interactively provide corrections at any state the robot visits. An everyday human user, however, presents a number of challenges:
- Mismatched capabilities: A user may demonstrate a task that is beyond the robot’s capabilities, or they may be less efficient than the robot. Can we learn under such mismatch?
- Unobserved contexts: Human feedback is often influenced by latent context (e.g. trust, attention) that the robot does not directly observe. How do we deal with such uncertainty?
- Natural feedback: Can we learn from natural modes of human feedback, like language and gestures?
Of Moments and Matching: Trade-offs and Treatments in Imitation Learning
Gokul Swamy, Sanjiban Choudhury, Zhiwei Steven Wu, and J. Andrew Bagnell
International Conference on Machine Learning (ICML), 2021
project page / paper / video / code
All of imitation learning can be reduced to a game between a learner (generator) and a value function (discriminator), where the payoff is the performance difference between the learner and the expert.
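The game-theoretic framing can be sketched in a few lines. The toy below (an illustration, not the paper's algorithm) uses state-action indicator features, so "matching moments" means matching the expert's occupancy measure: the discriminator best-responds with the moment gap as its value function, and the learner ascends its expected payoff.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 4 states x 2 actions with indicator features, so matching
# moments means matching the expert's state-action occupancy measure.
n = 4 * 2
expert = rng.dirichlet(np.ones(n))          # expert occupancy measure

def occupancy(logits):
    """Softmax over state-action pairs, standing in for the learner's occupancy."""
    z = np.exp(logits - logits.max())
    return z / z.sum()

logits = np.zeros(n)
for _ in range(20000):
    occ = occupancy(logits)
    # Discriminator's best response: a "value function" f that maximally
    # separates expert and learner -- here, simply the moment gap.
    f = expert - occ
    # Learner ascends its expected payoff E_learner[f] (softmax gradient).
    logits += 5.0 * occ * (f - occ @ f)

gap = np.abs(expert - occupancy(logits)).sum()
```

At the game's equilibrium the moment gap is zero, so the learner's occupancy (and hence its performance under any value function the discriminator can express) matches the expert's.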
Feedback in Imitation Learning: The Three Regimes of Covariate Shift
Jonathan Spencer, Sanjiban Choudhury, Arun Venkatraman, Brian Ziebart, and J. Andrew Bagnell
arXiv preprint arXiv:2102.02872, 2021
paper / talk
Not all imitation learning problems are alike: some are easy (do behavior cloning), some are hard (call an interactive expert), and some are just right (you just need a simulator).
Learning from Interventions: Human-robot interaction as both explicit and implicit feedback
Jonathan Spencer, Sanjiban Choudhury, Matt Barnes, and Siddhartha Srinivasa
Robotics: Science and Systems (RSS), 2020
paper / talk
How can we learn from human interventions? Every intervention reveals information about the expert's implicit value function. Infer this function and optimize it.
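A minimal sketch of the inference step, under an assumed linear value function (the setup and names are hypothetical, not the paper's method): each intervention records the robot's action and the human's correction, and signals that the correction ranks higher under the expert's implicit values. A perceptron-style loop recovers weights consistent with every recorded intervention.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: each intervention pairs the feature vector of the
# robot's action with that of the human's corrective action. A human
# steps in only when the correction is clearly better under their hidden
# (here linear) value function.
dim = 5
true_w = rng.normal(size=dim)                 # hidden expert preferences
interventions = []
for _ in range(200):
    a_robot, a_human = rng.normal(size=dim), rng.normal(size=dim)
    if true_w @ a_human > true_w @ a_robot + 0.5:   # clear-error margin
        interventions.append((a_robot, a_human))

# Perceptron-style inference: find weights w that rank every recorded
# correction above the robot action it replaced.
w = np.zeros(dim)
for _ in range(1000):
    mistakes = 0
    for a_robot, a_human in interventions:
        if w @ a_human <= w @ a_robot:        # ranking constraint violated
            w += a_human - a_robot
            mistakes += 1
    if mistakes == 0:                          # all interventions explained
        break

violations = sum(w @ ah <= w @ ar for ar, ah in interventions)
```

Since the synthetic data is separable with a margin, the perceptron provably converges to weights that explain every intervention; the inferred value function can then be optimized by the robot's planner.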
Imitation Learning as f-Divergence Minimization
Liyiming Ke, Sanjiban Choudhury, Matt Barnes, Wen Sun, Gilwoo Lee, and Siddhartha Srinivasa
Workshop on the Algorithmic Foundations of Robotics (WAFR), 2020
paper
Many old (and new!) imitation learning algorithms simply minimize various f-divergence estimates between the expert's and the learner's trajectory distributions.
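The unifying object is easy to write down. For discrete distributions, an f-divergence is D_f(p || q) = E_q[f(p/q)] for a convex f with f(1) = 0; different choices of f recover familiar divergences. A small sketch (toy distributions, not trajectory data):

```python
import numpy as np

def f_divergence(p, q, f):
    """D_f(p || q) = E_q[f(p/q)] for discrete distributions p, q."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(q * f(p / q)))

# Different convex generators f recover familiar divergences
# (e.g. f(t) = -log(t) would give the reverse KL).
kl = lambda t: t * np.log(t)              # forward KL(p || q)
total_variation = lambda t: 0.5 * np.abs(t - 1)

# Two toy "trajectory distributions" over 4 discrete trajectories.
expert = np.array([0.4, 0.3, 0.2, 0.1])
learner = np.array([0.25, 0.25, 0.25, 0.25])

d_kl = f_divergence(expert, learner, kl)
d_tv = f_divergence(expert, learner, total_variation)
```

Each imitation learning algorithm then amounts to picking an f and driving the corresponding divergence between expert and learner trajectory distributions to zero.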
Task Representation Learning
We want personal robots that come with a repertoire of skills that can be composed to solve any boutique task in our homes. Every home is different; every human is different. Robots must be able to learn new tasks from a handful of demonstrations and interactions with the human user. Abstractly, one can think of a task as follows: given an initial configuration of objects, reach a desired goal configuration through a series of feasible operations that do not violate constraints. Learning both the goal and the constraints comes with a number of challenges:
- Skill Composition: Given a library of skills, can we label and learn demonstrated tasks as compositions of known skills?
- Limited Labels: Getting large amounts of labelled data for every new task can be expensive. How can we leverage unlabelled data collected from multiple tasks that share sub-tasks in common?
- Learning Structure: Instead of memorizing a task as a sequence of operations, can we learn the underlying structure of the task, i.e. common sub-goals, invariances, and dependencies?
Planning Alongside Humans
For robots to work seamlessly alongside human partners, they must operate in a safe and legible manner while matching human cadence. The fundamental challenge is uncertainty. Robots are uncertain about the intent of their human partners, and about how this intent changes based on the robot’s actions. Modelling this uncertainty and planning with it in real time presents a set of challenges:
- Discrete modes: Theoretically, planning under uncertainty over continuous spaces is intractable. But humans do this every day by chunking the continuous space of actions into discrete modes. What are these modes, and how do we learn them?
- Hedging vs. Asserting: In much of driving, humans fluidly trade off between hedging against uncertain outcomes and taking assertive actions that collapse uncertainty. How do we learn this trade-off, and can we generalize it broadly across human-robot interactions?
- Hierarchy of Abstractions: Robots must deal with uncertainty at multiple time scales. How can we learn a hierarchy of plannable abstractions to continually build and refine an estimate of the value function?
Bayesian Residual Policy Optimization: Scalable Bayesian Reinforcement Learning with Clairvoyant Experts
Gilwoo Lee, Brian Hou, Sanjiban Choudhury, and Siddhartha S. Srinivasa
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021
paper / talk
In Bayesian RL, solving the belief MDP is hard, but solving each individual latent MDP is easy. Combine the value functions from each latent MDP with a learned residual belief policy.
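The ensemble-of-clairvoyant-experts step can be illustrated in a toy (this is a sketch of the idea, not the BRPO algorithm): each latent MDP's expert is summarized by its Q-values, and the combined recommendation weighs those Q-values by the current belief.

```python
import numpy as np

# Toy: two latent MDPs, e.g. a door that swings left or right. Each
# latent MDP is easy to solve alone; its clairvoyant expert is summarized
# by Q-values over three actions: push-left, push-right, peek.
q_latent = np.array([
    [1.0, 0.0, 0.6],   # expert Q-values if the latent world is MDP 0
    [0.0, 1.0, 0.6],   # expert Q-values if the latent world is MDP 1
])

def ensemble_action(belief):
    """Weigh each latent expert's Q-values by the current belief and act
    greedily on the combination. (BRPO additionally learns a residual
    belief policy that corrects this recommendation; omitted here.)"""
    return int(np.argmax(np.asarray(belief) @ q_latent))
```

A confident belief defers to the matching expert, while an uncertain belief picks the safe "peek" action; the learned residual policy then handles what this naive combination gets wrong, such as actively gathering information.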
Posterior Sampling for Anytime Motion Planning on Graphs with Expensive-to-Evaluate Edges
Brian Hou, Sanjiban Choudhury, Gilwoo Lee, Aditya Mandalika, and Siddhartha Srinivasa
IEEE International Conference on Robotics and Automation (ICRA), 2020
paper / video
Anytime motion planning can be viewed through a Bayesian lens: we are initially uncertain about the shortest path and must probe the environment to progressively yield shorter and shorter paths.
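The Bayesian lens suggests a Thompson-sampling-style loop, sketched below on a tiny hypothetical roadmap (the graph, prior, and costs are invented for illustration): sample a world from the posterior over edge costs, plan in it, then pay to evaluate the chosen path's edges, which collapses the posterior there and improves the incumbent solution over time.

```python
import random

random.seed(0)

# Toy roadmap: two s->g paths whose edge costs are expensive to
# evaluate (think collision checking).
true_cost = {('s', 'a'): 2.0, ('a', 'g'): 2.0, ('s', 'b'): 1.0, ('b', 'g'): 4.0}
paths = [[('s', 'a'), ('a', 'g')], [('s', 'b'), ('b', 'g')]]

observed = {}                        # edges we have paid to evaluate

def sampled_cost(edge):
    """Posterior over an edge's cost: a point mass once evaluated,
    otherwise a broad prior (uniform on [0, 5] here, for illustration)."""
    return observed.get(edge, random.uniform(0.0, 5.0))

incumbent = float('inf')
for _ in range(20):                  # anytime loop: keep improving the incumbent
    # Thompson step: sample a world from the posterior and plan in it...
    candidate = min(paths, key=lambda p: sum(sampled_cost(e) for e in p))
    # ...then evaluate the candidate's edges for real, collapsing the posterior.
    for e in candidate:
        observed[e] = true_cost[e]
    incumbent = min(incumbent, sum(true_cost[e] for e in candidate))
```

Each iteration either confirms the current best path or probes a plausibly shorter one, so the incumbent path length is monotonically non-increasing, which is exactly the anytime behaviour we want.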
Generalized Lazy Search for Robot Motion Planning: Interleaving Search and Edge Evaluation via Event-based Toggles
Aditya Mandalika, Sanjiban Choudhury, Oren Salzman, and Siddhartha Srinivasa
International Conference on Automated Planning and Scheduling (ICAPS), 2019
Best Student Paper Award
paper / long paper
A unified framework for interleaving search and edge evaluation that provably minimizes total planning time.
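The simplest point in this framework's design space is lazy shortest-path search, sketched below on a toy graph (illustrative names; a hedged sketch of the lazy idea, not the paper's event-based toggles): plan on the optimistic graph assuming every edge is valid, evaluate only the edges on the candidate path, remove any invalid ones, and replan.

```python
import heapq

def dijkstra(adj, src, dst):
    """Standard shortest path over the currently-assumed-valid edges."""
    dist, prev, pq = {src: 0.0}, {}, [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:                             # first pop of dst is optimal
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return list(reversed(path))
        if d > dist.get(u, float('inf')):
            continue                              # stale queue entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    return None

def lazy_shortest_path(adj, src, dst, edge_valid):
    """Lazy search: plan optimistically, then evaluate only the edges on
    the candidate path; drop invalid edges (mutating adj) and replan."""
    evaluated = 0
    while True:
        path = dijkstra(adj, src, dst)
        if path is None:
            return None, evaluated
        blocked = False
        for u, v in zip(path, path[1:]):
            evaluated += 1                        # expensive collision check
            if not edge_valid(u, v):
                adj[u] = [(x, w) for x, w in adj[u] if x != v]
                blocked = True
                break
        if not blocked:
            return path, evaluated

# Toy graph: the short route s->m->g turns out to be blocked at (m, g).
adj = {'s': [('m', 1.0), ('g', 5.0)], 'm': [('g', 1.0)]}
path, n_checks = lazy_shortest_path(adj, 's', 'g',
                                    lambda u, v: (u, v) != ('m', 'g'))
```

Only three edges are ever collision-checked here; the point of the general framework is choosing *when* to toggle between searching and evaluating so that this total effort is minimized.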
The Blindfolded Robot: A Bayesian Approach to Planning with Contact Feedback
Brad Saund, Sanjiban Choudhury, Siddhartha Srinivasa, and Dmitry Berenson
International Symposium on Robotics Research (ISRR), 2019
paper / video
Casts manipulation under occlusion as search on a graph where the feasibility of an edge is revealed only when the agent attempts to traverse it. A Bayesian prior guides the trade-off between exploration and exploitation.