Figure panels: Boomerang · ReAct · Tree of Interaction BFS (ToI-BFS)


An example run of LLM planners on Blocksworld, where a stack of blocks must be rearranged into a new stack. Boomerang succeeds while remaining query-efficient. ReAct is query-efficient but cycles over visited states, failing once it hits the step limit. Tree of Interaction BFS is query-inefficient and cycles excessively.

Abstract

Planning in complex environments requires an agent to efficiently query a world model to find a feasible sequence of actions from start to goal. Recent work has shown that Large Language Models (LLMs), with their rich prior knowledge and reasoning capabilities, can potentially help with planning by searching over promising states and adapting to feedback from the world.

In this paper, we propose and study two fundamentally competing frameworks that leverage LLMs for query-efficient planning. The first uses LLMs as a heuristic within a search-based planner to select promising nodes to expand and propose promising actions. The second uses LLMs as a generative planner to propose an entire sequence of actions, query the world model, and adapt based on feedback.

We show that while both approaches improve upon comparable baselines, using an LLM as a generative planner results in significantly fewer world model queries. Our key finding is that an LLM used as a planner can adapt its planning strategy to immediate feedback more rapidly than an LLM used as a heuristic. We present evaluations and ablations on Robotouille and PDDL planning benchmarks and discuss connections to existing theory on query-efficient planning algorithms.

Overview

ReAct prompting -- where a policy takes as input the current observation and the past history of interactions, then outputs an action -- serves as a simple recipe for query-efficient planning with LLMs; however, ReAct policies are susceptible to getting trapped in local optima. A more flexible variant, ReAct-Select, outputs both a state and an action, which provides the ability to escape local optima but enlarges the decision space, making it intractable in practice.
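As a concrete illustration, below is a minimal sketch of a ReAct-style planning loop. The interfaces query_llm and world_model are hypothetical stand-ins for an LLM call and the environment's transition function, not the paper's implementation.

def react_plan(start, goal, query_llm, world_model, max_steps=20):
    # Minimal ReAct-style loop: the policy conditions on the current
    # observation plus the full interaction history and emits one action.
    state, history = start, []
    for _ in range(max_steps):
        if state == goal:
            return [a for a, _ in history]        # actions so far form the plan
        action = query_llm(observation=state, history=history, goal=goal)
        state = world_model(state, action)        # one world-model query per step
        history.append((action, state))           # record action and resulting state
    # Step limit reached without the goal: this is how ReAct fails when it cycles.
    return [a for a, _ in history] if state == goal else None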



Tree of Interaction utilizes LLMs as a heuristic within an external planner. The planner maintains a search tree and invokes the LLM to choose which states to expand and which actions to propose. By judiciously choosing which states to expand, the LLM minimizes unnecessary queries to the world model while guiding the search tree toward the goal. We implement a best-first beam search planner (ToI-BFS) and a depth-first search planner (ToI-DFS).
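Concretely, the loop below is a schematic best-first search in which the LLM supplies both the node-ranking heuristic and the action proposals. llm_score and llm_propose_actions are hypothetical interfaces assumed for illustration, not the paper's exact API.

import heapq
import itertools

def toi_best_first(start, goal, llm_score, llm_propose_actions, world_model,
                   max_queries=20):
    # Schematic LLM-as-heuristic search: the LLM scores frontier nodes
    # (lower = more promising) and proposes candidate actions to expand.
    counter = itertools.count()   # tie-breaker so heapq never compares states
    frontier = [(llm_score(start, goal), next(counter), start, [])]
    queries = 0
    while frontier and queries < max_queries:
        _, _, state, plan = heapq.heappop(frontier)   # LLM-chosen best node
        if state == goal:
            return plan
        for action in llm_propose_actions(state, goal):
            if queries >= max_queries:
                break
            next_state = world_model(state, action)   # one world-model query
            queries += 1
            heapq.heappush(frontier,
                           (llm_score(next_state, goal), next(counter),
                            next_state, plan + [action]))
    return None   # query budget exhausted before reaching the goal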


Instead of restricting the LLM to a role within an external planner, Boomerang uses the LLM as a generative planner that adapts to feedback from the world model. The LLM planner proposes a sequence of actions from the start to the goal state based on its internal world model's understanding of the problem domain. It then receives feedback from the true world model, which it uses to update that internal world model.
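A minimal sketch of this propose-execute-refine loop is below. llm_propose_plan and the feedback format are illustrative assumptions rather than the paper's exact interface; here, world_model is assumed to return None for an infeasible action.

def boomerang_plan(start, goal, llm_propose_plan, world_model, max_queries=20):
    # Sketch of an LLM-as-generative-planner loop: propose a full plan,
    # execute it against the true world model, and replan from the feedback.
    feedback = []   # accumulated (plan, failed step, state) records that refine
                    # the LLM's internal world model across iterations
    queries = 0
    while queries < max_queries:
        plan = llm_propose_plan(start, goal, feedback)   # full start-to-goal plan
        state = start
        for step, action in enumerate(plan):
            if queries >= max_queries:
                return None
            result = world_model(state, action)          # one world-model query
            queries += 1
            if result is None:                           # action was infeasible
                feedback.append((plan, step, state))     # tell the LLM where it broke
                break
            state = result
        else:
            if state == goal:
                return plan
            feedback.append((plan, None, state))   # plan executed but missed goal
    return None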

Results

We evaluate non-interactive approaches (I/O, I/O + CoT), LLM-as-heuristic approaches (ToI-BFS, ToI-DFS), classical planners (Classical), and LLM-as-generative-planner approaches (ReAct, Boomerang) on 600 Blocksworld problems from PlanBench. Success is measured as the fraction of runs that reach the goal within 20 world model queries. Boomerang achieves the highest success rate and plan optimality among query-efficient runs.

We evaluate the best LLM-as-heuristic, classical, and LLM-as-generative-planner approaches on 100 problems each in Logistics, Grippers, and Robotouille, and plot cumulative success rates under varying world-model query limits. Classical approaches using FastDownward operate only on PDDL representations, which Robotouille lacks. Boomerang achieves the highest cumulative success across all domains.
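For concreteness, a cumulative-success curve of this kind can be computed as sketched below; queries_to_success is a hypothetical per-run record of how many world-model queries a run used (None if it failed), not data from the paper.

def cumulative_success(queries_to_success, budgets):
    # Fraction of runs that reach the goal within each query budget.
    # queries_to_success[i] is the query count for run i, or None on failure.
    n = len(queries_to_success)
    return {b: sum(q is not None and q <= b for q in queries_to_success) / n
            for b in budgets}

# e.g. cumulative_success([3, 12, None, 7], budgets=range(1, 21))
# yields the curve plotted against world-model query limits.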

Figure panels: ReAct Failure · ReAct-Select Failure · Boomerang Success

We also qualitatively observe that despite keeping all interactions in its history, ReAct exhibits a recency bias toward feedback and cycles endlessly. We note the same phenomenon in ReAct-Select, which exhibits a recency bias toward recent states and degenerates to ReAct. Boomerang, on the other hand, reasons over entire trajectories and can therefore adapt to feedback more effectively.

Paper

BibTex

@misc{gonzalezpumariega2024queryefficientplanninglanguagemodels,
  title={Query-Efficient Planning with Language Models},
  author={Gonzalo Gonzalez-Pumariega and Wayne Chen and Kushal Kedia and Sanjiban Choudhury},
  year={2024},
  eprint={2412.06162},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2412.06162},
}