The Entity-Deduction Arena: A Playground for Probing the Conversational Reasoning and Planning Capabilities of LLMs
Authors: Yizhe Zhang, Jiarui Lu, Navdeep Jaitly
LLMs are currently effective at answering questions that are clearly asked. However, they may encounter difficulties when faced with ambiguous queries. This emphasizes the need for the development of intelligent agents capable of asking clarification questions, which require complex understanding, state tracking, and planning in multi-turn conversations. In this paper, we study a surrogate problem by employing entity-deducing games as evaluation metrics to assess the conversational planning capabilities of different models. We systematically evaluate various LLMs and discover significant performance discrepancies in conversational planning capabilities. Drawing inspiration from Reinforcement Learning from Human Feedback (RLHF), we utilize Reinforcement Learning from Self-Playing (RLSP) on vanilla Vicuna models to enhance planning capacity through self-play in the game. This research offers insights into potential advancements in achieving more intelligent and autonomous agents.
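To make the evaluation setting concrete, the following is a minimal sketch of an entity-deducing game loop. It is an illustration under stated assumptions, not the authors' implementation: the toy knowledge base `ENTITIES`, the attribute-based yes/no `judge`, the greedy splitting heuristic in `pick_question`, and the 20-turn default budget are all hypothetical stand-ins. In the setting the abstract describes, the guesser would be an LLM asking free-form questions rather than a lookup over fixed attributes.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy knowledge base: entity -> attributes that hold for it.
ENTITIES: Dict[str, Set[str]] = {
    "cat": {"alive", "animal", "pet"},
    "tiger": {"alive", "animal"},
    "oak": {"alive"},
    "car": {"vehicle"},
    "bicycle": {"vehicle", "human_powered"},
}


def judge(secret: str, attribute: str) -> bool:
    """Answer the question "does the secret entity have this attribute?"."""
    return attribute in ENTITIES[secret]


def pick_question(candidates: List[str]) -> Optional[str]:
    """Pick the attribute that splits the remaining candidates most evenly."""
    attributes = sorted({a for e in candidates for a in ENTITIES[e]})
    best, best_gap = None, None
    for attribute in attributes:
        yes = sum(attribute in ENTITIES[e] for e in candidates)
        if yes == 0 or yes == len(candidates):
            continue  # this question would not narrow anything down
        gap = abs(2 * yes - len(candidates))
        if best_gap is None or gap < best_gap:
            best, best_gap = attribute, gap
    return best


def play(secret: str, max_turns: int = 20) -> Tuple[bool, int]:
    """Run one game; return (success, turns used). The final guess costs a turn."""
    candidates = list(ENTITIES)  # state tracking: entities still consistent
    for turn in range(1, max_turns + 1):
        if len(candidates) == 1:
            return candidates[0] == secret, turn
        question = pick_question(candidates)
        if question is None:
            return False, turn  # no informative question is left
        answer = judge(secret, question)
        candidates = [e for e in candidates if (question in ENTITIES[e]) == answer]
    return False, max_turns


if __name__ == "__main__":
    for entity in ENTITIES:
        print(entity, play(entity))
```

The sketch keeps the two ingredients the abstract names: state tracking (the shrinking `candidates` list) and planning (choosing the question expected to narrow the candidates most). Success rate and turns used are the kind of quantities such a game could report for each guesser.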
COMPASS: A Multi-Turn Benchmark for Tool-Mediated Planning & Preference Optimization
December 11, 2025; research areas: Human-Computer Interaction, Speech and Natural Language Processing
Real-world large language model (LLM) agents must master strategic tool use and user preference optimization through multi-turn interactions to assist users with complex planning tasks. We introduce COMPASS (Constrained Optimization through Multi-turn Planning and Strategic Solutions), a benchmark that evaluates agents on realistic travel-planning scenarios. We cast travel planning as a constrained preference optimization problem, where agents…
Towards Learning Multi-Agent Negotiations via Self-Play
January 28, 2019; research area: Computer Vision; Workshop at ICCV
Making sophisticated, robust, and safe sequential decisions is at the heart of intelligent systems. This is especially critical for planning in complex multi-agent environments, where agents need to anticipate other agents’ intentions and possible future actions. Traditional methods formulate the problem as a Markov Decision Process, but the solutions often rely on various assumptions and become brittle when presented with corner cases. In…