Figure panels: Boomerang · ReAct · Tree of Interaction BFS (ToI-BFS)


An example run of LLM planners on Blocksworld, where a stack of blocks must be rearranged into a new stack. Boomerang succeeds while remaining query-efficient. ReAct is query-efficient but cycles over visited states, failing once it hits the step limit. Tree of Interaction BFS is query-inefficient and cycles excessively.

Abstract

Planning in complex environments requires an agent to efficiently query a world model to find a feasible sequence of actions from start to goal. Recent work has shown that Large Language Models (LLMs), with their rich prior knowledge and reasoning capabilities, can potentially help with planning by searching over promising states and adapting to feedback from the world.

In this paper, we propose and study two fundamentally competing frameworks that leverage LLMs for query-efficient planning. The first uses LLMs as a heuristic within a search-based planner to select promising nodes to expand and propose promising actions. The second uses LLMs as a generative planner to propose an entire sequence of actions, query the world model, and adapt based on feedback.

We show that while both approaches improve upon comparable baselines, using an LLM as a generative planner results in significantly fewer world model queries. Our key finding is that an LLM used as a planner can adapt its planning strategy to immediate feedback more rapidly than an LLM used as a heuristic. We present evaluations and ablations on Robotouille and PDDL planning benchmarks and discuss connections to existing theory on query-efficient planning algorithms.

Overview

ReAct prompting -- where a policy takes as input the current observation and the past history of interactions, then outputs an action -- serves as a simple recipe for query-efficient planning with LLMs; however, ReAct policies are susceptible to getting trapped in local optima. A more flexible variant, ReAct-Select, outputs both a state and an action, which provides the ability to escape local optima but enlarges the decision space, making it intractable in practice.
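As a concrete illustration, below is a minimal sketch of a ReAct-style planning loop. The interfaces query_llm and world_model are hypothetical stand-ins for an LLM call and the environment's transition function, not the paper's implementation.

def react_plan(start, goal, query_llm, world_model, max_steps=20):
    # Minimal ReAct-style loop: the policy conditions on the current
    # observation plus the full interaction history and emits one action.
    state, history = start, []
    for _ in range(max_steps):
        if state == goal:
            return [a for a, _ in history]        # actions so far form the plan
        action = query_llm(observation=state, history=history, goal=goal)
        state = world_model(state, action)        # one world-model query per step
        history.append((action, state))           # record action and resulting state
    # Step limit reached without the goal: this is how ReAct fails when it cycles.
    return [a for a, _ in history] if state == goal else None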



Tree of Interaction utilizes LLMs as a heuristic within an external planner. The planner maintains a search tree and invokes the LLM to choose which states to expand and which actions to propose. By judiciously choosing which states to expand, the LLM minimizes unnecessary queries to the world model while guiding the search tree toward the goal. We implement a best-first beam search planner (ToI-BFS) and a depth-first search planner (ToI-DFS).
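Concretely, the loop below is a schematic best-first search in which the LLM supplies both the node-ranking heuristic and the action proposals. llm_score and llm_propose_actions are hypothetical interfaces assumed for illustration, not the paper's exact API.

import heapq
import itertools

def toi_best_first(start, goal, llm_score, llm_propose_actions, world_model,
                   max_queries=20):
    # Schematic LLM-as-heuristic search: the LLM scores frontier nodes
    # (lower = more promising) and proposes candidate actions to expand.
    counter = itertools.count()   # tie-breaker so heapq never compares states
    frontier = [(llm_score(start, goal), next(counter), start, [])]
    queries = 0
    while frontier and queries < max_queries:
        _, _, state, plan = heapq.heappop(frontier)   # LLM-chosen best node
        if state == goal:
            return plan
        for action in llm_propose_actions(state, goal):
            if queries >= max_queries:
                break
            next_state = world_model(state, action)   # one world-model query
            queries += 1
            heapq.heappush(frontier,
                           (llm_score(next_state, goal), next(counter),
                            next_state, plan + [action]))
    return None   # query budget exhausted before reaching the goal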


Instead of restricting the LLM to a role within an external planner, Boomerang uses the LLM as a generative planner that adapts to feedback from the world model. The LLM planner proposes a sequence of actions from the start to the goal state based on its internal world model's understanding of the problem domain. It then receives feedback from the true world model, which it uses to update that internal world model.
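A minimal sketch of this propose-execute-refine loop is below. llm_propose_plan and the feedback format are illustrative assumptions rather than the paper's exact interface; here, world_model is assumed to return None for an infeasible action.

def boomerang_plan(start, goal, llm_propose_plan, world_model, max_queries=20):
    # Sketch of an LLM-as-generative-planner loop: propose a full plan,
    # execute it against the true world model, and replan from the feedback.
    feedback = []   # accumulated (plan, failed step, state) records that refine
                    # the LLM's internal world model across iterations
    queries = 0
    while queries < max_queries:
        plan = llm_propose_plan(start, goal, feedback)   # full start-to-goal plan
        state = start
        for step, action in enumerate(plan):
            if queries >= max_queries:
                return None
            result = world_model(state, action)          # one world-model query
            queries += 1
            if result is None:                           # action was infeasible
                feedback.append((plan, step, state))     # tell the LLM where it broke
                break
            state = result
        else:
            if state == goal:
                return plan
            feedback.append((plan, None, state))   # plan executed but missed goal
    return None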

Results

We evaluate non-interactive approaches (I/O, I/O + CoT), LLM-as-heuristic approaches (ToI-BFS, ToI-DFS), classical planners (Classical), and LLM-as-generative-planner approaches (ReAct, Boomerang) on 600 Blocksworld problems from PlanBench. Success is measured as the fraction of runs that reach the goal within 20 world model queries. Boomerang achieves the highest success rate and plan optimality among query-efficient runs.

We evaluate the best LLM-as-heuristic, classical, and LLM-as-generative-planner approaches on 100 problems each in Logistics, Grippers, and Robotouille, and plot cumulative success rates under varying world-model query limits. Classical approaches using FastDownward operate only on PDDL representations, which Robotouille lacks. Boomerang achieves the highest cumulative success across all domains.
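For concreteness, a cumulative-success curve of this kind can be computed as sketched below; queries_to_success is a hypothetical per-run record of how many world-model queries a run used (None if it failed), not data from the paper.

def cumulative_success(queries_to_success, budgets):
    # Fraction of runs that reach the goal within each query budget.
    # queries_to_success[i] is the query count for run i, or None on failure.
    n = len(queries_to_success)
    return {b: sum(q is not None and q <= b for q in queries_to_success) / n
            for b in budgets}

# e.g. cumulative_success([3, 12, None, 7], budgets=range(1, 21))
# yields the curve plotted against world-model query limits.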

Figure panels: ReAct Failure · ReAct-Select Failure · Boomerang Success

We also qualitatively observe that despite keeping all interactions in its history, ReAct exhibits a recency bias toward feedback and cycles endlessly. We note the same phenomenon in ReAct-Select, which exhibits a recency bias toward recent states and degenerates to ReAct. Boomerang, on the other hand, reasons over entire trajectories and can therefore adapt to feedback more effectively.

Paper

BibTex

@misc{gonzalezpumariega2024queryefficientplanninglanguagemodels,
  title={Query-Efficient Planning with Language Models},
  author={Gonzalo Gonzalez-Pumariega and Wayne Chen and Kushal Kedia and Sanjiban Choudhury},
  year={2024},
  eprint={2412.06162},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2412.06162},
}