APRICOT (1) converts user visual demonstrations into language-based demonstrations; (2) given demonstrations, determines the preference that best approximates the ground-truth user preference by minimally querying the user; (3) generates and refines a plan based on world-model feedback to satisfy preferences and respect constraints; (4) executes the plan on a real robot system.
Home robots performing personalized tasks must adeptly balance user preferences with environmental affordances. We focus on organization tasks within constrained spaces, such as arranging items into a refrigerator, where preferences for placement collide with physical limitations. The robot must infer user preferences based on a small set of demonstrations, which is easier for users to provide than extensively defining all their requirements. While recent works use Large Language Models (LLMs) to learn preferences from user demonstrations, they encounter two fundamental challenges. First, there is inherent ambiguity in interpreting user actions, as multiple preferences can often explain a single observed behavior. Second, not all user preferences are practically feasible due to geometric constraints in the environment. To address these challenges, we introduce APRICOT, a novel approach that merges LLM-based Bayesian active preference learning with constraint-aware task planning. APRICOT refines its generated preferences by actively querying the user and dynamically adapts its plan to respect environmental constraints. We evaluate APRICOT on a dataset of diverse organization tasks and demonstrate its effectiveness in real-world scenarios, showing significant improvements in both preference satisfaction and plan feasibility.
APRICOT tackles personalized tasks in constrained settings, such as arranging groceries in a refrigerator after observing a user's behavior. When the user provides a visual demonstration to define the task, two key challenges arise:
(1) User demonstrations alone cannot resolve ambiguity about preferences, since multiple preferences can explain the same observed behavior.
(2) User preferences can often conflict with environmental affordances.
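Challenge (2) can be made concrete with a toy geometric feasibility check. This is a minimal sketch, not the paper's planner; the shelf names, dimensions, and the `feasible_shelves` helper are invented for illustration.

```python
def feasible_shelves(item_width_cm, free_span_by_shelf):
    """Return shelves whose remaining free span can still hold the item."""
    return [s for s, free in free_span_by_shelf.items() if free >= item_width_cm]

# Hypothetical fridge state: remaining free span per shelf, in centimeters.
free_span = {"top": 6.0, "middle": 14.0, "door": 9.0}

# The user prefers milk on the top shelf, but a 10 cm carton no longer fits there.
preferred = "top"
options = feasible_shelves(10.0, free_span)
placement = preferred if preferred in options else options[0]
print(placement)  # the preferred shelf is too full, so the plan falls back
```

A constraint-aware planner must detect this kind of conflict and adapt the plan rather than fail or violate the environment's geometry.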
To refine preferences learned from user demonstrations, APRICOT combines the generative ability of LLMs with Bayesian active preference learning. Concretely, APRICOT
(1) proposes candidate preferences and corresponding candidate plans via an LLM,
(2) evaluates whether to terminate, based on whether the current belief over preferences is sufficiently certain, and
(3) if not, selects the question that maximizes expected information gain.
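The query-selection step above can be sketched with a standard Bayesian active-learning criterion: score each candidate question by its expected reduction in entropy over the belief, and ask the highest-scoring one. This is a minimal sketch under invented numbers, not the paper's implementation; the candidate preferences and question likelihoods below are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

def bayes_update(prior, likelihoods):
    """Posterior over preferences given per-preference answer likelihoods."""
    post = [p * l for p, l in zip(prior, likelihoods)]
    z = sum(post)
    return [x / z for x in post]

def expected_info_gain(prior, answer_likelihoods):
    """answer_likelihoods[a][i] = P(answer a | preference i)."""
    h0 = entropy(prior)
    gain = 0.0
    for lik in answer_likelihoods:
        p_answer = sum(p * l for p, l in zip(prior, lik))
        if p_answer > 0:
            gain += p_answer * (h0 - entropy(bayes_update(prior, lik)))
    return gain

# Belief over three hypothetical candidate preferences proposed by the LLM.
prior = [0.5, 0.3, 0.2]

# Two binary questions: q1 separates preference 0 from the rest; q2 tells us nothing.
q1 = [[0.9, 0.1, 0.1], [0.1, 0.9, 0.9]]
q2 = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]

best = max([q1, q2], key=lambda q: expected_info_gain(prior, q))
print(best is q1)  # the informative question wins
```

The termination check in step (2) corresponds to stopping once `max(prior)` exceeds a confidence threshold, so the robot only asks questions while the belief remains uncertain.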
Active Preference Learning Results on Benchmark Dataset. APRICOT achieves the highest preference accuracy (58%), i.e., the percentage of output preferences that are equivalent to the ground-truth preference, while asking the user the fewest questions (2.15 on average).
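The two reported metrics can be computed straightforwardly from per-task outcomes. The results table below is hypothetical, purely to illustrate how the metrics are defined; it does not reproduce the benchmark numbers.

```python
# Invented per-task outcomes: whether the final preference matched the
# ground truth, and how many questions the user was asked.
results = [
    {"matches_ground_truth": True,  "num_questions": 2},
    {"matches_ground_truth": False, "num_questions": 3},
    {"matches_ground_truth": True,  "num_questions": 1},
    {"matches_ground_truth": True,  "num_questions": 2},
]

# Preference accuracy: fraction of tasks whose output preference is
# equivalent to the ground-truth preference.
accuracy = sum(r["matches_ground_truth"] for r in results) / len(results)

# Query burden: average number of questions asked per task.
avg_questions = sum(r["num_questions"] for r in results) / len(results)

print(accuracy, avg_questions)  # 0.75 2.0
```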
Task Planner Results on Real-Robot Scenarios. Evaluated on 9 scenarios across 3 difficulty levels. The qualitative example below shows APRICOT generating a plan that satisfies preferences and respects constraints.