Agent-Driver

Abstract

Human-level driving is an essential goal of autonomous driving. Conventional approaches formulate autonomous driving as a perception-prediction-planning framework, yet these systems do not capitalize on the inherent reasoning ability and experiential knowledge of humans. In this paper, we propose a fundamental paradigm shift from current pipelines, using Large Language Models (LLMs) as a cognitive agent to integrate human-like intelligence into autonomous driving systems. Our system, termed Agent-Driver, transforms the traditional autonomous driving pipeline by introducing a versatile tool library accessible via function calls, a cognitive memory of common sense and experiential knowledge for decision-making, and a reasoning engine capable of chain-of-thought reasoning, task planning, motion planning, and self-reflection. Powered by LLMs, Agent-Driver is endowed with intuitive common sense and robust reasoning capabilities, enabling a more nuanced, human-like approach to autonomous driving. We evaluate our system on the large-scale nuScenes benchmark, and extensive experiments show that Agent-Driver outperforms state-of-the-art driving methods by a large margin. Our approach also demonstrates better interpretability and few-shot learning ability than these methods.

Method

We present Agent-Driver, an LLM-powered agent that revolutionizes the traditional perception-prediction-planning framework, establishing a powerful yet flexible paradigm for human-like autonomous driving.

Agent-Driver integrates a tool library for dynamic perception and prediction, a cognitive memory for human knowledge, and a reasoning engine that emulates human decision-making, all orchestrated by LLMs to enable a more anthropomorphic autonomous driving process.
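As a rough illustration of how these three components might fit together, the following Python sketch wires a tool library, a memory, and a reasoning step into a single decision loop. It is not the Agent-Driver implementation: every class, function, field, and threshold here (`ToolLibrary`, `CognitiveMemory`, `reasoning_engine`, `drive_step`, the 10 m braking distance) is a hypothetical placeholder, and the LLM is replaced by a hand-written rule so that the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ToolLibrary:
    """Hypothetical registry of perception/prediction tools, invoked by name."""
    tools: Dict[str, Callable[[dict], dict]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[[dict], dict]) -> None:
        self.tools[name] = fn

    def call(self, name: str, scene: dict) -> dict:
        # A "function call": look the tool up by name and run it on the scene.
        return self.tools[name](scene)


@dataclass
class CognitiveMemory:
    """Hypothetical memory holding common-sense rules and past experiences."""
    commonsense: List[str]
    experiences: List[Tuple[float, str]]  # (ego speed in m/s, past decision)

    def retrieve(self, ego_speed: float) -> str:
        # Toy memory search: return the decision of the closest past speed.
        return min(self.experiences, key=lambda e: abs(e[0] - ego_speed))[1]


def reasoning_engine(observations: dict, commonsense: List[str], experience: str) -> str:
    # Stand-in for the LLM: a fixed rule instead of generated reasoning.
    # The 10 m threshold is an arbitrary value chosen for this sketch.
    if observations["detect_objects"]["min_distance_m"] < 10.0:
        return "brake"
    return experience


def drive_step(scene: dict, tools: ToolLibrary, memory: CognitiveMemory) -> str:
    # 1) gather observations through tool calls, 2) search memory, 3) reason.
    observations = {name: tools.call(name, scene) for name in tools.tools}
    experience = memory.retrieve(scene["ego_speed"])
    return reasoning_engine(observations, memory.commonsense, experience)


tools = ToolLibrary()
tools.register(
    "detect_objects",
    lambda scene: {"min_distance_m": min(scene["object_distances_m"])},
)
memory = CognitiveMemory(
    commonsense=["Keep a safe distance from the vehicle ahead."],
    experiences=[(5.0, "keep_speed"), (15.0, "slow_down")],
)

clear_road = {"ego_speed": 6.0, "object_distances_m": [25.0, 40.0]}
close_object = {"ego_speed": 14.0, "object_distances_m": [8.0]}
print(drive_step(clear_road, tools, memory))
print(drive_step(close_object, tools, memory))
```

In this toy setup the first scene falls back on the retrieved experience, while the second triggers the braking rule because an object is closer than the assumed threshold. In a real system the rule inside `reasoning_engine` would be replaced by prompting an LLM with the tool outputs and retrieved memories.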

Agent-Driver outperforms state-of-the-art autonomous driving systems by a large margin, with over 30% improvement in collision rate for motion planning. Our approach also demonstrates strong few-shot learning ability and interpretability on the nuScenes benchmark.

We provide a range of ablation studies that dissect the proposed architecture and assess the efficacy of each module, to facilitate future research in this direction.

Illustration of function calls in the tool library.

Illustration of memory search.

Illustration of reasoning engine.

Demos

BibTeX

@article{mao2023agentdriver,
  author = {Mao, Jiageng and Ye, Junjie and Qian, Yuxi and Pavone, Marco and Wang, Yue},
  title = {A Language Agent for Autonomous Driving},
  year = {2023},
}

A Language Agent for Autonomous Driving

Agent-Driver transforms the conventional perception-prediction-planning framework by introducing LLMs as an agent for autonomous driving.