A Comparative Study of Deep Reinforcement Learning Algorithms for Urban Autonomous Driving: Addressing the Geographic and Regulatory Challenges in CARLA


5ewi-2026-02-08_01_54_47-applsci-15-06838.pdf

Abstract: To enable autonomous driving in real-world environments that involve diverse geographic variations and complex traffic regulations, it is essential to investigate Deep Reinforcement Learning algorithms capable of learning policies in high-dimensional environments characterized by intricate state-action interactions. In particular, closed-loop experiments, in which an agent continuously interacts with its driving environment, serve as a critical framework for improving the practical applicability of Deep Reinforcement Learning algorithms in autonomous driving systems. This study empirically analyzes how well several representative Deep Reinforcement Learning algorithms, namely DDPG, SAC, TD3, PPO, TQC, and CrossQ, handle various urban driving scenarios in the CARLA simulator within a closed-loop framework. To evaluate the adaptability of each algorithm to geographic variability and complex traffic laws, scenario-specific reward and penalty functions were carefully designed and incorporated. For a comprehensive performance assessment of the Deep Reinforcement Learning algorithms, we defined several driving performance metrics, including Route Completion, Centerline Deviation Mean, Episode Reward Mean, and Success Rate, which collectively reflect driving quality in terms of completeness, stability, efficiency, and comfort. Experimental results demonstrate that TQC and SAC, both of which adopt off-policy learning and stochastic policies, achieve superior sample efficiency and learning performance. Notably, the presence of geographically variant features (such as traffic lights, intersections, and roundabouts) and their associated traffic rules within a given town poses significant challenges to driving performance, particularly in terms of Route Completion, Success Rate, and lane-keeping stability.
In these challenging scenarios, the TQC algorithm achieved a Route Completion rate of 0.91, substantially outperforming the rate of 0.23 observed with DDPG. This performance gap highlights the advantage of approaches such as TQC and SAC, which address Q-value overestimation through statistical methods, in improving the robustness and effectiveness of autonomous driving in diverse urban environments.
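To make the four metrics more concrete, here is a minimal sketch of how they could be aggregated over evaluation episodes. The paper defines the metrics precisely in its evaluation section, which is not reproduced here, so this sketch assumes simple forms: Route Completion as the fraction of planned route waypoints reached, Centerline Deviation Mean as the mean absolute lateral offset from the lane center, Episode Reward Mean as the average undiscounted episode return, and Success Rate as the fraction of episodes that reach the destination. The `EpisodeLog` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class EpisodeLog:
    """Hypothetical per-episode record; field names are illustrative, not the paper's."""
    waypoints_total: int            # waypoints on the planned route
    waypoints_reached: int          # waypoints passed before the episode ended
    lateral_offsets: list[float]    # signed distance (m) from the lane centerline per step
    rewards: list[float]            # per-step rewards
    reached_goal: bool              # True if the destination was reached


def summarize(episodes):
    """Aggregate assumed forms of RC, CDM, ERM, and SR over evaluation episodes."""
    return {
        # Mean fraction of the planned route that was completed.
        "route_completion": mean(e.waypoints_reached / e.waypoints_total for e in episodes),
        # Per-episode mean absolute lateral offset, then averaged over episodes.
        "centerline_deviation_mean": mean(
            mean(abs(d) for d in e.lateral_offsets) for e in episodes
        ),
        # Average undiscounted return per episode.
        "episode_reward_mean": mean(sum(e.rewards) for e in episodes),
        # Fraction of episodes that reached the destination.
        "success_rate": sum(e.reached_goal for e in episodes) / len(episodes),
    }


logs = [
    EpisodeLog(100, 100, [0.1, -0.2, 0.3], [1.0, 1.0, 1.0], True),
    EpisodeLog(100, 40, [0.5, -0.5], [1.0, -10.0], False),
]
print(summarize(logs))
```

Averaging per episode first and then across episodes weights every episode equally, regardless of its length; other conventions (for example, averaging over all steps) are equally plausible.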

1. Introduction

Autonomous driving technology enables vehicles to perceive their surroundings using a variety of sensors, such as cameras and LiDAR, and to navigate to a designated destination without human intervention by executing real-time control operations, including steering, braking, and lane keeping. Traditional rule-based decision-making systems have shown limitations in adapting to novel or unpredictable scenarios. However, with advances in artificial intelligence techniques and the accumulation of large-scale datasets, autonomous driving systems have demonstrated increasingly robust driving performance across diverse environments and are progressing steadily toward full autonomy. In particular, perception modules such as Bird's Eye View segmentation, lane detection, 3D object detection, depth estimation, and multi-task learning frameworks have been significantly enhanced by recent advances in deep learning. Moreover, there is growing interest in integrating large language models and multi-modal large language models to enrich scene understanding and enable high-level reasoning in autonomous systems. Despite this remarkable progress in perception, achieving full autonomy still requires moving beyond rule-based strategies in the decision-making domain, toward adaptive and flexible learning mechanisms grounded in artificial intelligence that can cope with the complexity and uncertainty of real-world driving environments.

To address the challenges of decision-making in autonomous driving systems, recent research has increasingly turned to learning-based techniques. In particular, imitation learning and Deep Reinforcement Learning have emerged as core paradigms for learning decision policies. Imitation learning learns a driving policy directly from human-expert demonstrations. It enables the rapid acquisition of competent initial performance and has been widely adopted for its simplicity, as it often bypasses technically demanding components such as explicit reward function design and stabilization mechanisms during policy training. However, collecting comprehensive driving data from human experts for every possible scenario is practically infeasible. As a result, imitation learning-based approaches often suffer from limited generalization and scalability due to issues such as covariate shift, error compounding, distributional drift, and recovery failure. In contrast, Deep Reinforcement Learning offers a more flexible learning framework in which agents learn policies autonomously by interacting with the environment to maximize cumulative reward. This approach eliminates the need for explicitly labeled expert data and allows the agent to learn from diverse scenarios in both simulation and real-world settings, thereby addressing several key limitations of imitation learning. Deep Reinforcement Learning therefore represents a promising direction for advancing the decision-making capabilities of autonomous driving systems.
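The "cumulative reward" that a Deep Reinforcement Learning agent maximizes is usually the discounted return G_t = r_t + γ·r_{t+1} + γ²·r_{t+2} + …, where r_t is the reward received after the action at step t and γ in [0, 1) is a discount factor. The minimal sketch below computes G_t for every step of a finished episode using the backward recursion G_t = r_t + γ·G_{t+1}. The default γ = 0.99 is a common choice assumed here for illustration, not a value taken from this study.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step t of one finished episode."""
    returns = [0.0] * len(rewards)
    running = 0.0  # the return after the final step is zero
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


print(discounted_returns([1.0, 0.0, 2.0], gamma=0.5))  # [1.5, 1.0, 2.0]
```

Iterating backwards computes all returns in a single pass, instead of re-summing the remaining rewards at every step.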

Efforts to integrate Deep Reinforcement Learning into autonomous driving have evolved in parallel with the emergence and advancement of Deep Reinforcement Learning techniques. In the early stages, research was largely driven by the attention that AlphaGo's success brought to the field, and value-based methods, particularly Deep Q-Networks (DQN), received the most interest. However, because DQN is inherently designed for discrete action spaces, its applicability to real-world driving, which typically involves continuous control, was limited. This led to the adoption of Deep Reinforcement Learning algorithms that operate in continuous action spaces. Notably, several studies adapted policy-optimization algorithms, such as Asynchronous Advantage Actor-Critic (A3C), Proximal Policy Optimization (PPO), and Trust Region Policy Optimization (TRPO), to autonomous driving. Building on these advances, off-policy actor-critic architectures that facilitate more stable policy learning were introduced, including Deep Deterministic Policy Gradient (DDPG), Twin Delayed Deep Deterministic Policy Gradient (TD3), and Soft Actor-Critic (SAC), which have since been widely applied in autonomous driving research. More recently, algorithms that further mitigate Q-value overestimation have been proposed, including CrossQ and Truncated Quantile Critics (TQC), which represent the latest advancements in Deep Reinforcement Learning for high-dimensional continuous control environments such as autonomous driving.
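To illustrate how these actor-critic methods differ in their handling of Q-value overestimation, the following NumPy sketch compares how three of them form the critic's bootstrap target for a single transition. DDPG uses one target critic. TD3 uses clipped double Q-learning, taking the smaller of two critic estimates. TQC pools the quantile estimates of several distributional critics, sorts them, and drops the largest ones before bootstrapping. The numbers, the number of critics and quantiles, and the number of dropped quantiles are illustrative assumptions. The entropy term used by SAC and TQC is omitted, CrossQ is not shown, and real implementations operate on batches produced by neural networks.

```python
import numpy as np


def ddpg_target(r, done, gamma, q_next):
    """DDPG: bootstrap from a single target critic, y = r + gamma * (1 - done) * Q'(s', a')."""
    return r + gamma * (1.0 - done) * q_next


def td3_target(r, done, gamma, q1_next, q2_next):
    """TD3 (clipped double Q-learning): bootstrap from the smaller of two critic estimates."""
    return r + gamma * (1.0 - done) * min(q1_next, q2_next)


def tqc_target_atoms(r, done, gamma, quantiles_next, drop_per_critic):
    """TQC-style target: pool every critic's quantiles, sort them, and drop the largest.

    quantiles_next: array of shape (n_critics, n_quantiles) for the next state-action pair.
    Returns the retained target atoms (entropy term omitted).
    """
    n_critics, n_quantiles = quantiles_next.shape
    pooled = np.sort(quantiles_next.reshape(-1))           # ascending order
    keep = n_critics * (n_quantiles - drop_per_critic)    # total atoms retained
    return r + gamma * (1.0 - done) * pooled[:keep]


r, done, gamma = 1.0, 0.0, 0.9
quantiles = np.array([[5.0, 8.0, 12.0], [6.0, 9.0, 20.0]])  # two critics, three quantiles each
print(ddpg_target(r, done, gamma, quantiles.mean()))
print(td3_target(r, done, gamma, quantiles[0].mean(), quantiles[1].mean()))
print(tqc_target_atoms(r, done, gamma, quantiles, drop_per_critic=1).mean())
```

With these toy values, the three targets come out at roughly 10.0, 8.5, and 7.3. Each successive method bootstraps from a more conservative estimate, which is how such methods counteract the upward bias introduced by maximizing over noisy Q-estimates.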

Despite notable advances in Deep Reinforcement Learning algorithms and their growing training stability, several challenges remain in deploying these methods directly in real-world autonomous driving systems. Because reinforcement learning is inherently a trial-and-error process, training agents in real vehicles raises critical safety concerns. Moreover, collecting large-scale datasets of diverse driving scenarios requires substantial time and cost, posing a major obstacle to practical deployment. To overcome these limitations, numerous studies have explored a progressive deployment strategy in which decision-making algorithms are first validated in simulation-based closed-loop environments such as CARLA and then fine-tuned in real-world scenarios. This approach provides a critical technical foundation for real-time decision-making in diverse geographic environments (such as intersections, roundabouts, pedestrian crossings, highways, and tunnels) while adhering to traffic regulations, including traffic lights, stop signs, and speed limits. In particular, the CARLA simulation environment offers a rich suite of maps representative of urban driving, an extensive collection of supporting libraries, customizable vehicles, and flexible sensor deployment. Most notably, it benefits from a well-developed technological ecosystem that facilitates comprehensive experimentation in autonomous driving research. The ability to simulate diverse driving scenarios effectively and to compare autonomous agents systematically in simulation is an essential step toward ensuring the practicality and safety of Deep Reinforcement Learning-based systems in real-world deployment. In line with this need, recent research has undertaken comparative and analytical evaluations of Deep Reinforcement Learning algorithms for autonomous driving in simulation environments.
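What distinguishes closed-loop evaluation is that the agent's own actions determine its next observation, so small errors can accumulate or be corrected over time. The toy sketch below illustrates this loop structure with a hypothetical one-dimensional lane-keeping model in place of CARLA: steering changes the heading, and the heading shifts the vehicle's lateral offset. The dynamics, gains, noise level, and function names are all assumptions chosen only to show the closed loop. This is not CARLA's API and not this study's setup, and a hand-tuned controller stands in for a learned policy.

```python
import random


def toy_lane_step(offset, heading, steer, dt=0.1, speed=5.0):
    """Hypothetical kinematics: steering changes heading; heading moves the car sideways."""
    heading += steer * dt
    offset += speed * heading * dt
    return offset, heading


def run_closed_loop(policy, steps=200, noise=0.01, seed=0):
    """Roll out a policy whose actions feed back into its own next observation.

    Returns the mean absolute lateral offset over the episode.
    """
    rng = random.Random(seed)
    offset, heading = 0.5, 0.0            # start 0.5 m off the centerline
    total_deviation = 0.0
    for _ in range(steps):
        steer = policy(offset, heading)   # the action is computed from the current state
        offset, heading = toy_lane_step(offset, heading, steer)
        offset += rng.gauss(0.0, noise)   # small lateral disturbance at each step
        total_deviation += abs(offset)
    return total_deviation / steps


def pd_policy(offset, heading):
    """Hand-tuned proportional-derivative steering, standing in for a learned policy."""
    return -2.0 * offset - 3.0 * heading


def no_steer_policy(offset, heading):
    """Baseline that never corrects, so the initial offset and disturbances persist."""
    return 0.0


print(run_closed_loop(pd_policy), run_closed_loop(no_steer_policy))
```

In a full simulator, the same loop structure holds: the policy observes the sensor state, issues a control command, and the simulator advances. Only the environment step is far richer.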

However, existing studies have not incorporated a sufficiently wide range of state-of-the-art Deep Reinforcement Learning (DRL) algorithms into these assessments, and many lack realistic urban elements such as traffic lights and road infrastructure. In particular, prior research often applies DRL algorithms in isolation, without a comprehensive comparative analysis under unified evaluation criteria. For example, one study compares DQN and DDPG using the CARLA simulator, another evaluates DQN and PPO, and a third focuses solely on a comparison between DQN and DDPG. These fragmented efforts limit the generalizability and depth of insight into the relative performance of modern DRL algorithms for autonomous driving. To address this gap, the present study applies six representative DRL algorithms (DDPG, PPO, SAC, TD3, CrossQ, and TQC) to urban autonomous driving scenarios in the CARLA simulator. These scenarios include complex geographic environments such as intersections, roundabouts, crosswalks, highways, and tunnels, and incorporate real-world traffic rules, including compliance with traffic lights, stop signs, and speed limits. This study presents an in-depth evaluation of each algorithm's driving performance, the effectiveness of the associated reward functions, and a comparative analysis of Success Rates across diverse urban driving conditions.
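Traffic rules such as traffic lights, stop signs, and speed limits are commonly incorporated into a DRL reward as penalty terms, some of which also end the episode. This study's own reward and penalty functions are defined later in the paper and are not reproduced here. The sketch below is only a hypothetical illustration of that general pattern. The `StepInfo` fields, the weights, and the choice of which violations terminate an episode are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class StepInfo:
    """Hypothetical per-step rule-compliance signals (names are illustrative)."""
    speed_kmh: float
    speed_limit_kmh: float
    ran_red_light: bool = False
    ran_stop_sign: bool = False
    collided: bool = False


# Illustrative weights only; they are not the values used in this study.
OVERSPEED_WEIGHT = 0.1    # penalty per km/h above the posted limit
RED_LIGHT_PENALTY = 10.0
STOP_SIGN_PENALTY = 5.0
COLLISION_PENALTY = 50.0


def rule_penalty(info):
    """Return (penalty, terminate) for one step; the penalty is subtracted from the reward."""
    penalty = OVERSPEED_WEIGHT * max(0.0, info.speed_kmh - info.speed_limit_kmh)
    terminate = False
    if info.ran_red_light:
        penalty += RED_LIGHT_PENALTY
        terminate = True      # assumed: a red-light violation ends the episode
    if info.ran_stop_sign:
        penalty += STOP_SIGN_PENALTY
    if info.collided:
        penalty += COLLISION_PENALTY
        terminate = True      # assumed: a collision ends the episode
    return penalty, terminate


print(rule_penalty(StepInfo(speed_kmh=60.0, speed_limit_kmh=50.0)))
print(rule_penalty(StepInfo(speed_kmh=30.0, speed_limit_kmh=50.0, ran_red_light=True)))
```

Separating a graded penalty (overspeed, which scales with the violation) from discrete, terminating penalties (red lights, collisions) is one common design. How strongly each violation is weighted directly shapes which behaviors the agent learns to avoid.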
