APRICOT (1) converts user visual demonstrations into language-based demonstrations; (2) given demonstrations, determines the preference that best approximates the ground-truth user preference by minimally querying the user; (3) generates and refines a plan based on world-model feedback to satisfy preferences and respect constraints; (4) executes the plan on a real robot system.
Home robots performing personalized tasks must adeptly balance user preferences with environmental affordances. We focus on organization tasks within constrained spaces, such as arranging items into a refrigerator, where preferences for placement collide with physical limitations. The robot must infer user preferences based on a small set of demonstrations, which is easier for users to provide than extensively defining all their requirements. While recent works use Large Language Models (LLMs) to learn preferences from user demonstrations, they encounter two fundamental challenges. First, there is inherent ambiguity in interpreting user actions, as multiple preferences can often explain a single observed behavior. Second, not all user preferences are practically feasible due to geometric constraints in the environment. To address these challenges, we introduce APRICOT, a novel approach that merges LLM-based Bayesian active preference learning with constraint-aware task planning. APRICOT refines its generated preferences by actively querying the user and dynamically adapts its plan to respect environmental constraints. We evaluate APRICOT on a dataset of diverse organization tasks and demonstrate its effectiveness in real-world scenarios, showing significant improvements in both preference satisfaction and plan feasibility.
APRICOT tackles personalized tasks in constrained settings, such as arranging groceries in a refrigerator after observing a user's behavior. When the user provides a visual demonstration to define the task, two key challenges arise:
(1) User demonstrations alone cannot resolve ambiguity about preferences, since multiple preferences can explain the same observed behavior.
(2) User preferences can often conflict with environmental affordances.
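Challenge (2) can be made concrete with a toy geometric feasibility check. This is a minimal sketch, not the paper's planner; the shelf names, dimensions, and the `feasible_shelves` helper are invented for illustration.

```python
def feasible_shelves(item_width_cm, free_span_by_shelf):
    """Return shelves whose remaining free span can still hold the item."""
    return [s for s, free in free_span_by_shelf.items() if free >= item_width_cm]

# Hypothetical fridge state: remaining free span per shelf, in centimeters.
free_span = {"top": 6.0, "middle": 14.0, "door": 9.0}

# The user prefers milk on the top shelf, but a 10 cm carton no longer fits there.
preferred = "top"
options = feasible_shelves(10.0, free_span)
placement = preferred if preferred in options else options[0]
print(placement)  # the preferred shelf is too full, so the plan falls back
```

A constraint-aware planner must detect this kind of conflict and adapt the plan rather than fail or violate the environment's geometry.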
To refine preferences learned from user demonstrations, APRICOT combines the generative ability of LLMs with Bayesian active preference learning. Concretely, APRICOT
(1) proposes candidate preferences and corresponding candidate plans via an LLM,
(2) evaluates whether to terminate, based on whether the current belief over preferences is sufficiently certain, and
(3) if not, selects the question that maximizes expected information gain.
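The query-selection step above can be sketched with a standard Bayesian active-learning criterion: score each candidate question by its expected reduction in entropy over the belief, and ask the highest-scoring one. This is a minimal sketch under invented numbers, not the paper's implementation; the candidate preferences and question likelihoods below are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

def bayes_update(prior, likelihoods):
    """Posterior over preferences given per-preference answer likelihoods."""
    post = [p * l for p, l in zip(prior, likelihoods)]
    z = sum(post)
    return [x / z for x in post]

def expected_info_gain(prior, answer_likelihoods):
    """answer_likelihoods[a][i] = P(answer a | preference i)."""
    h0 = entropy(prior)
    gain = 0.0
    for lik in answer_likelihoods:
        p_answer = sum(p * l for p, l in zip(prior, lik))
        if p_answer > 0:
            gain += p_answer * (h0 - entropy(bayes_update(prior, lik)))
    return gain

# Belief over three hypothetical candidate preferences proposed by the LLM.
prior = [0.5, 0.3, 0.2]

# Two binary questions: q1 separates preference 0 from the rest; q2 tells us nothing.
q1 = [[0.9, 0.1, 0.1], [0.1, 0.9, 0.9]]
q2 = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]

best = max([q1, q2], key=lambda q: expected_info_gain(prior, q))
print(best is q1)  # the informative question wins
```

The termination check in step (2) corresponds to stopping once `max(prior)` exceeds a confidence threshold, so the robot only asks questions while the belief remains uncertain.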
Active Preference Learning Results on Benchmark Dataset. APRICOT achieves the highest preference accuracy (58%), i.e., the percentage of output preferences that are equivalent to the ground-truth preference, while asking the user the fewest questions (2.15 on average).
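The two reported metrics can be computed straightforwardly from per-task outcomes. The results table below is hypothetical, purely to illustrate how the metrics are defined; it does not reproduce the benchmark numbers.

```python
# Invented per-task outcomes: whether the final preference matched the
# ground truth, and how many questions the user was asked.
results = [
    {"matches_ground_truth": True,  "num_questions": 2},
    {"matches_ground_truth": False, "num_questions": 3},
    {"matches_ground_truth": True,  "num_questions": 1},
    {"matches_ground_truth": True,  "num_questions": 2},
]

# Preference accuracy: fraction of tasks whose output preference is
# equivalent to the ground-truth preference.
accuracy = sum(r["matches_ground_truth"] for r in results) / len(results)

# Query burden: average number of questions asked per task.
avg_questions = sum(r["num_questions"] for r in results) / len(results)

print(accuracy, avg_questions)  # 0.75 2.0
```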
Task Planner Results on Real-Robot Scenarios. Evaluated on 9 scenarios across 3 difficulty levels. The qualitative example below shows APRICOT generating a plan that satisfies preferences and respects constraints.