Deep Reinforcement and IL for Autonomous Driving: A Review in the CARLA Simulation Environment
Autonomous driving is a complex and fast-evolving domain at the intersection of robotics, machine learning, and control systems. This paper provides a systematic review of recent developments in reinforcement learning and imitation learning approaches for autonomous vehicle control, with a dedicated focus on the CARLA simulator, an open-source, high-fidelity platform that has become a standard for learning-based autonomous vehicle research. We analyze reinforcement learning-based and imitation learning-based studies, extracting and comparing their formulations of state, action, and reward spaces. Special attention is given to the design of reward functions, control architectures, and integration pipelines. Comparative graphs and diagrams illustrate performance trade-offs. We further highlight gaps in generalization to real-world driving scenarios, robustness under dynamic environments, and scalability of agent architectures. Despite rapid progress, existing autonomous driving systems exhibit significant limitations. For instance, studies show that end-to-end reinforcement learning models can suffer from performance degradation of up to thirty-five percent when exposed to unseen weather or town conditions, and imitation learning agents trained solely on expert demonstrations exhibit up to forty percent higher collision rates in novel environments. Furthermore, reward misspecification remains a critical issue: over twenty percent of reported failures in simulated environments stem from poorly calibrated reward signals. Generalization gaps, especially in reinforcement learning, also manifest in task-specific overfitting, with agents failing up to sixty percent of the time when faced with dynamic obstacles not encountered during training. These persistent shortcomings underscore the need for more robust and sample-efficient learning strategies.
Finally, we discuss hybrid paradigms that integrate imitation learning and reinforcement learning, such as Generative Adversarial Imitation Learning, and propose future research directions.
One. Introduction
In recent years, autonomous driving has evolved into a complex and rapidly progressing research area. As an inherently multidisciplinary challenge, it brings together elements of robotics, computer vision, control theory, embedded systems, and artificial intelligence. While many architectural decompositions are possible, for clarity, we distinguish five core functional domains in autonomous vehicles: perception, localization, planning, control, and system-level management. These components do not operate in isolation; they are highly interdependent and must be tightly integrated to achieve safe and reliable autonomous behavior. For instance, accurate localization relies on perception, planning depends on both localization and perception, and control must faithfully execute decisions made by the planning module. This functional breakdown serves as a foundation for understanding where and how reinforcement learning and imitation learning techniques can be applied effectively.
Perception in autonomous vehicles encompasses the capabilities that allow the system to interpret its surroundings using raw sensor data from cameras, LiDAR, radar, and ultrasonic sensors. It processes this input to detect and classify objects, recognize lanes and traffic signals, and assess environmental conditions such as lighting and weather. The output is a structured, machine-readable representation of the environment, which serves as a vital input to the localization, planning, and decision-making modules. To be effective, perception must operate in real time and remain robust under diverse and challenging conditions, including occlusions, sensor noise, and adverse weather.
Localization enables the vehicle to determine its precise position and orientation within a known or partially known environment by fusing data from GPS, inertial measurement units, LiDAR, and cameras, often using techniques such as simultaneous localization and mapping or map-matching algorithms. Accurate localization is essential for safe navigation and must remain reliable even in GPS-denied areas, during sensor drift, or when landmarks are temporarily obscured.
Planning is responsible for determining the vehicle's future actions and is typically divided into two levels: behavior planning, which selects high-level maneuvers such as lane changes or yielding, and motion planning, which computes detailed, dynamically feasible trajectories. These plans depend heavily on accurate inputs from perception and localization to ensure obstacle avoidance, compliance with traffic laws, and passenger comfort, while adapting continuously to a changing environment.
Control translates these planned trajectories into low-level commands for steering, throttle, and braking, aiming to follow the desired path with precision, stability, and responsiveness. It commonly employs methods such as PID controllers, model predictive control, or learning-based strategies, and must operate at high frequency while being resilient to delays, disturbances, and variations in vehicle dynamics.
Overseeing all these components is the system management layer, which ensures reliable and safe operation by coordinating tasks, allocating computational resources, monitoring system health, detecting faults, and managing communication between modules. It also provides fallback strategies in case of failures, making it a critical backbone for production-grade autonomous driving systems.
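As an illustration of the control layer described above, the following is a minimal sketch of a textbook discrete PID controller that tracks a planned target speed. It is not taken from any of the reviewed studies. The class name `PID`, the helper `speed_command`, and the gain values are hypothetical, and the clipping of throttle and brake to [0, 1] is an assumption chosen to match the usual simulator convention for normalized pedal commands.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Textbook discrete PID controller (illustrative gains, not tuned)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error: float, dt: float) -> float:
        # Accumulate the integral term and estimate the error derivative
        # with a backward difference. The first call uses prev_error = 0,
        # so a non-zero kd produces a derivative "kick" on that step.
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def speed_command(pid: PID, target_speed: float, current_speed: float,
                  dt: float) -> tuple[float, float]:
    """Map a speed error (m/s) to (throttle, brake), each clipped to [0, 1]."""
    u = pid.step(target_speed - current_speed, dt)
    throttle = min(max(u, 0.0), 1.0)   # positive effort -> accelerate
    brake = min(max(-u, 0.0), 1.0)     # negative effort -> brake
    return throttle, brake


if __name__ == "__main__":
    controller = PID(kp=0.1, ki=0.0, kd=0.0)  # hypothetical gains
    print(speed_command(controller, target_speed=10.0, current_speed=8.0, dt=0.05))
```

A learning-based controller would replace `speed_command` with a policy that outputs the same kind of normalized commands, which is why the action spaces of the reviewed agents are usually described in terms of steering, throttle, and brake.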
This work focuses primarily on the planning and control components of autonomous driving, where reinforcement learning and imitation learning have demonstrated the greatest practical relevance. In these domains, learning-based agents are trained to make tactical decisions and execute low-level maneuvers based on sensor-derived representations of the environment. Approaches vary from end-to-end policies that directly map observations to actions, to modular designs where planning and control are learned separately. In contrast, components such as perception and localization are typically treated as fixed inputs (often provided by the simulator in the form of semantic maps, LiDAR projections, or ground-truth poses) and are not the target of learning. Similarly, system-level management is generally handled using conventional software architectures and falls outside the scope of learning.
Among the range of machine learning techniques applied in this domain, reinforcement learning, exemplified by algorithms such as Proximal Policy Optimization and Deep Q-Networks, has emerged as one of the most promising approaches. The field continues to advance at a remarkable pace, with increasingly sophisticated methods improving performance across a growing spectrum of applications.
This review addresses reinforcement learning and imitation learning strategies implemented in the CARLA simulation platform. CARLA is an open-source, high-fidelity simulator for autonomous driving research that offers full-stack simulation capabilities, strong support for AI integration, and extensive support for sensor realism, weather variability, and urban driving scenarios. These properties have made it one of the most widely used platforms for academic research on learning-based autonomous vehicle agents. Reviewing the state-of-the-art literature helps identify prevailing agent architectures, design decisions, and gaps that may inform the development of improved driving agents.
The technical analysis in this paper is grounded in representative reinforcement learning-based studies. For each work, we extract the definitions of the agent's state and action spaces and examine the structure and effectiveness of the associated reward function. The studies, though diverse in their technical formulations, share a common foundation: the use of reinforcement learning in the CARLA environment for autonomous navigation and control. We synthesize the insights from these papers to compare strategies, identify trade-offs, and understand how varying model architectures impact policy learning and performance.
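To make the three descriptors concrete, the sketch below shows one hypothetical way a state, an action, and a shaped reward could be written down for a lane-following agent. It is an illustrative assumption only and does not reproduce the formulation of any reviewed study: the field names, the weights `w_speed` and `w_lane`, the target speed, and the collision penalty are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """Hypothetical low-dimensional state (all fields are assumptions)."""
    speed: float          # longitudinal speed, m/s
    lane_offset: float    # lateral distance from the lane centre, m
    heading_error: float  # angle between vehicle heading and lane, rad
    collided: bool        # True if a collision occurred on this step


@dataclass(frozen=True)
class Action:
    """Hypothetical continuous action with normalized commands."""
    steer: float     # in [-1, 1]
    throttle: float  # in [0, 1]
    brake: float     # in [0, 1]


def reward(state: State, target_speed: float = 8.0, w_speed: float = 1.0,
           w_lane: float = 0.5, collision_penalty: float = 100.0) -> float:
    """Weighted sum of a speed-tracking term and a lane-keeping term,
    overridden by a large negative penalty on collision."""
    if state.collided:
        return -collision_penalty
    # 1.0 at the target speed, decreasing linearly with the speed error.
    speed_term = 1.0 - abs(state.speed - target_speed) / target_speed
    # 0.0 on the lane centre, increasingly negative with lateral offset.
    lane_term = -abs(state.lane_offset)
    return w_speed * speed_term + w_lane * lane_term


if __name__ == "__main__":
    print(reward(State(speed=8.0, lane_offset=0.0, heading_error=0.0, collided=False)))
    print(reward(State(speed=8.0, lane_offset=1.0, heading_error=0.0, collided=False)))
```

Even in this toy form, the relative size of the weights and the collision penalty determines which behavior the agent is pushed toward, which is why the comparison of reward structures across studies is informative.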
Additionally, three studies present imitation learning-based approaches that do not rely on environmental rewards but instead learn behaviors by mimicking expert demonstrations. These imitation learning methods are assessed in parallel to the reinforcement learning models to identify complementary strengths and limitations. Their inclusion is critical, as imitation learning is increasingly integrated with reinforcement learning to enhance learning efficiency, policy generalization, and safety, especially in data-constrained or high-risk environments.
Beyond these core studies, recent research has pushed the boundaries of reinforcement learning (RL) in autonomous driving further. For example, Sakhai and Wielgosz proposed an end-to-end escape framework for complex urban settings using RL, while Kołomański et al. extended the paradigm to pursuit-based driving, emphasizing policy adaptability in adversarial contexts. Furthermore, Sakhai et al. explored biologically inspired neural models for real-time pedestrian detection using spiking neural networks and dynamic vision sensors in simulated adverse weather conditions. Recent studies have also explored the robustness of autonomous vehicle (AV) sensor systems against cyber threats. Notably, Sakhai et al. conducted a comprehensive evaluation of RGB cameras and dynamic vision sensors within the CARLA simulator, demonstrating that dynamic vision sensors exhibit enhanced resilience to various cyberattack vectors compared to traditional RGB sensors. These contributions showcase the evolving versatility of RL architectures and their capacity to address safety-critical tasks under uncertainty.
Despite the rapid progress in learning-based autonomous driving, existing systems face several persistent challenges. Studies have shown that end-to-end RL agents often suffer from significant performance degradation (up to thirty-five percent) when deployed in conditions that differ from their training environments, such as novel towns or adverse weather. Transfer RL has been explored as a mitigation for such weather-induced distribution shifts. Similarly, imitation learning agents trained solely on expert demonstrations have demonstrated up to forty percent higher collision rates when exposed to previously unseen driving contexts. Moreover, a key difficulty in RL lies in the careful tuning of reward functions; poorly calibrated rewards have been linked to over twenty percent of policy failures in simulation settings. Finally, RL agents often overfit to narrow task domains, with generalization failures reaching up to sixty percent when encountering dynamic obstacles not seen during training. These limitations underscore the need for more robust, transferable, and sample-efficient learning frameworks that integrate the strengths of both RL and imitation learning paradigms.
Overall, this review aims to provide a comprehensive synthesis of RL and imitation learning research in the CARLA simulator, highlighting how these learning paradigms contribute to the development of intelligent autonomous driving systems. Compared to widely cited reviews, our work provides a more detailed and structured analysis of reinforcement learning methods specifically in the context of autonomous driving control within simulation environments. While previous surveys focus largely on algorithmic taxonomies or general architectural roles, we examine RL approaches through a set of concrete implementation descriptors: state representation, action space, and reward design. This framing enables direct comparison of control strategies and learning objectives across studies, which is largely missing from earlier work. We also include visual summaries of reward functions and highlight differences in how these functions guide policy learning. Furthermore, our focus on the CARLA simulator as a common experimental platform allows for a more consistent and grounded discussion of evaluation strategies and training setups, bridging the gap between theory and practical deployment.