RESEARCH ARTICLE Deep reinforcement learning-based multi-lane mixed traffic ramp merging strategy
Abstract
Because conflicts are concentrated there, on-ramp merging is an important scenario in the study of mixed traffic control. Current research mainly focuses on optimizing the passage sequence of ramp vehicles merging with mainline vehicles in single-lane scenarios, neglecting the coordination of vehicles across multiple mainline lanes. Therefore, an Improved Dueling Double DQN On-ramp Merging Strategy combined with a sine function is proposed, and a Vehicle Coordination System (VCS) is established to guide the merging of vehicles in multi-lane mainline traffic. The strategy uses the improved D3QN algorithm, together with the smoothness of the sine function, to evaluate driving safety and help vehicles find suitable gaps in the traffic flow. An action masking mechanism is deployed during the strategy exploration phase to prevent unsafe actions. The proposed VCS plus IDS strategy was tested in SUMO simulations of on-ramp merging under different traffic flow densities. Under a traffic flow of one thousand two hundred vehicles per lane per hour, the on-ramp merging completion rate of VCS plus IDS reached ninety-eight point six two percent and the task completion rate reached ninety-eight point one one percent, increases of eleven point zero eight percent and ten point seven nine percent, respectively, over traditional D3QN, which validates the effectiveness of the method.
One. Introduction
On-ramp merging is a high-risk road traffic scene: because the merge is compulsory, the vehicle must merge at a safe speed within a limited distance. The difficulty autonomous vehicles have in responding in a timely manner to dynamic driving environments, especially in mixed traffic with human-driven vehicles, makes it one of the most challenging decision-making scenarios for autonomous vehicles. The ramp merging task for autonomous vehicles is currently addressed mainly by two kinds of methods, mathematical models and deep reinforcement learning, with mathematical model methods consisting of optimization-based and rule-based approaches.
Optimization-based ramp merging control methods mainly use optimization algorithms to determine the optimal control strategy for autonomous vehicles under high traffic loads. A key step is to create a mathematical model that accurately describes the motion state of vehicles. For example, Cao et al. proposed a method for generating cooperative merging paths for autonomous vehicles in ramp merging scenes using model predictive control. The results showed that cooperative merging paths could be successfully generated in various traffic scenes without readjusting the optimization parameters. Rios-Torres and Malikopoulos proposed an optimization framework and provided a closed-form analytical solution that enables smooth, uninterrupted traffic flow in the merging area. Zhou et al. proposed a vehicle trajectory planning method for automated ramp merging, formulated as two related optimal control problems. Pontryagin's Maximum Principle was used to obtain optimal trajectories for the vehicles merging from the ramp and for those on the mainline.
Rule-based on-ramp merging methods rely on a set of pre-determined rules to address the issue. These methods are easy to implement and widely adopted, but defining the rules can be challenging. For instance, Ding proposed a rule-based cooperative merging adjustment algorithm for connected vehicles in the ramp area. The algorithm includes a central controller that assigns vehicle arrival times and determines the optimal merging sequence for mainline and ramp vehicles, reducing travel time in the merging area. In the Vehicle-to-Vehicle environment, Shi proposed a rule-based vehicle cooperative merging model to complete ramp merging. By selecting suitable gaps and establishing a linear time-discrete model, vehicle trajectories are optimized to achieve merging. Hu and Sun proposed an online system control algorithm suitable for multi-lane highway merging areas. The algorithm coordinates traffic flow within the merging area by optimizing vehicle lane-changing and following trajectories. It adjusts lane traffic upstream of the merging point using rule-based lane-changing decisions, thereby balancing the traffic distribution in the downstream lanes.
Optimization-based on-ramp merging methods aim to improve road traffic efficiency and safety by optimizing vehicle control strategies. However, the optimization algorithms must be customized for specific traffic environments and vehicle densities, so their generalizability may be limited. Rule-based models are limited in terms of agent control because the rules are difficult to define and the actions are constrained to be discrete. In contrast, learning-based methods can automatically learn the relationship between inputs and continuous control outputs, addressing the limitations of rule-based methods. Therefore, in recent years, learning-based methods have garnered increasing attention and development.
With the advancement of machine learning technologies in recent years, sophisticated machine learning algorithms have been widely applied in the development of autonomous vehicles. However, these methods are mostly data-driven and require a large amount of offline driving data (usually labeled) to cover all possible scenarios. This heavy reliance on extensive training data limits the application of machine learning methods in autonomous driving, because it is challenging to collect large-scale datasets containing a variety of scenes, and labeling is time-consuming. In contrast, deep reinforcement learning (DRL) offers a solution to these limitations: the data for DRL training are obtained from the interaction between the agent and the environment, and the training process does not require labels. This not only reduces the dependence on large-scale labeled datasets but also allows for trial-and-error learning in various driving scenarios. In DRL algorithms, if an agent takes an incorrect action, it is penalized to reduce the likelihood of repeating that action in the same state. For on-ramp merging, Li et al. proposed a new safety indicator, time difference to merging, and combined it with the classic time to collision indicator to evaluate driving safety, assisting merging vehicles in finding suitable gaps in the traffic flow and thereby improving driving safety. Chen et al. developed an effective reinforcement learning method integrating action masks, curriculum learning, and parameter sharing. Experimental results show that the proposed method outperforms existing methods in both training efficiency and collision rate. Zhang et al. proposed an IPPO method based on the proximal policy optimization algorithm, which uses autonomous learning and a parameter sharing strategy to establish an autonomous driving behavior decision model.
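As a rough illustration of how a safety indicator can feed the penalty signal in DRL, the sketch below computes the classic time to collision (remaining gap divided by closing speed) and maps it to a negative reward term. It is not the formulation of Li et al.; the function names, the three-second threshold, and the linear penalty shape are assumptions made only for this example.

```python
import math


def time_to_collision(gap_m: float, follower_speed: float, leader_speed: float) -> float:
    """Classic TTC in seconds: remaining gap divided by closing speed.

    Returns math.inf when the follower is not closing in on the leader.
    Speeds are in m/s, the gap in metres.
    """
    closing_speed = follower_speed - leader_speed
    if closing_speed <= 0.0:
        return math.inf
    return max(gap_m, 0.0) / closing_speed


def ttc_penalty(ttc_s: float, threshold_s: float = 3.0) -> float:
    """Hypothetical reward term (assumption, not from the paper).

    Zero when TTC is at or above the threshold, decreasing linearly
    to -1 as TTC approaches zero.
    """
    if ttc_s >= threshold_s:
        return 0.0
    return -(1.0 - ttc_s / threshold_s)


if __name__ == "__main__":
    ttc = time_to_collision(gap_m=30.0, follower_speed=25.0, leader_speed=20.0)
    print(ttc, ttc_penalty(ttc))
```

A term of this kind penalizes the agent only when the situation becomes critical, which is one simple way to discourage unsafe gap choices without suppressing ordinary car following.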
The aforementioned centralized control methods mainly consider the case where there is only one lane on the mainline, or assume that vehicles on the mainline do not change lanes. However, in real on-ramp merging traffic scenarios, the mainline usually has two or more lanes, and if the flow in other lanes is smaller, vehicles on the mainline often choose to enter the lane with less traffic in advance to avoid conflicts with ramp vehicles. Some scholars have studied the ramp merging problem in multi-lane scenarios. Hou et al. proposed a hierarchical model of collaborative on-ramp merging control for mixed traffic, where the upper layer uses a predictive position search algorithm to predict the merging location, and the lower layer uses a cooperative merging control model to ensure the safety and smooth execution of the merging vehicle. This algorithm only considers the elimination of conflicts between a single ramp vehicle and mainline vehicles in multi-lane scenarios, without considering the unbalanced lane traffic caused by a large number of ramp vehicles merging into the mainline. Most current research focuses on the assumption that fully autonomous driving vehicles will be widely adopted, which is overly idealistic. In reality, full-scale adoption may still take some time. During this transitional period, the coexistence of Connected and Autonomous Vehicles and Human-Driven Vehicles in traffic flows will become the norm, increasing the complexity of the traffic environment. However, research on this mixed traffic condition is still relatively scarce.
Therefore, this paper designs a deep reinforcement learning-based on-ramp merging strategy for multi-lane mixed traffic. The main contributions are as follows.
One. A novel Vehicle Coordination System is established, which is embedded in roadside units and receives real-time information through Vehicle-to-Vehicle communication. It balances the traffic flow between different lanes in real time, optimizes the use of multiple lanes, and addresses the issue of uneven traffic flow across lanes.
Two. A novel D3QN algorithm improved with sine functions is proposed for on-ramp merging strategies, utilizing the smoothness of sine functions for lateral control in lane-changing strategies. Longitudinal control employs the Intelligent Driver Model, a car-following model, assisting vehicles in finding suitable gaps within the traffic flow.
Three. An action shielding mechanism is proposed to ensure that post-merge actions are safe. Since agents based on DRL learn through trial and error, they may take actions that threaten traffic safety during the strategy exploration phase. To address this issue, an action shielding mechanism is incorporated after the ramp vehicle reaches the merging point, ensuring driving safety in various on-ramp merging scenarios.
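The three building blocks named in the contributions can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this paper: the Intelligent Driver Model is written in its standard textbook form with placeholder parameter values, the lateral profile is one common sine-based lane-change curve (the paper's exact curve may differ), and the masking helper, its name, and its interface are hypothetical.

```python
import math
from typing import Sequence


def idm_acceleration(v: float, v_lead: float, gap: float,
                     v0: float = 30.0, T: float = 1.5, a_max: float = 1.5,
                     b: float = 2.0, s0: float = 2.0, delta: float = 4.0) -> float:
    """Standard IDM longitudinal acceleration (parameter values are placeholders).

    v, v_lead: follower and leader speed (m/s); gap: bumper-to-bumper gap (m).
    """
    dv = v - v_lead  # approach rate
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)


def sine_lateral_offset(t: float, duration: float, lane_width: float) -> float:
    """One common sine-based lane-change profile (assumed form).

    y(t) = W * (t/T - sin(2*pi*t/T) / (2*pi)); lateral speed is zero
    at the start and at the end of the manoeuvre.
    """
    tau = min(max(t / duration, 0.0), 1.0)  # clamp normalised time to [0, 1]
    return lane_width * (tau - math.sin(2.0 * math.pi * tau) / (2.0 * math.pi))


def masked_greedy_action(q_values: Sequence[float], safe: Sequence[bool]) -> int:
    """Hypothetical action mask: pick the best Q-value among actions flagged safe."""
    allowed = [i for i, ok in enumerate(safe) if ok]
    if not allowed:
        raise ValueError("no safe action available")
    return max(allowed, key=lambda i: q_values[i])


if __name__ == "__main__":
    print(idm_acceleration(v=20.0, v_lead=10.0, gap=10.0))  # strong braking expected
    print(sine_lateral_offset(t=2.0, duration=4.0, lane_width=3.5))
    print(masked_greedy_action([1.0, 5.0, 3.0], [True, False, True]))
```

In this sketch the mask simply removes unsafe actions before the greedy choice, so the highest-valued action is taken only if it has been flagged safe; how safety is judged for each action is left outside the example.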
The rest of the paper is organized as follows. Section Two introduces the division of the on-ramp merging scenario area and the operating mechanism of the Vehicle Coordination System. Section Three introduces the proposed IDS Ramp Merging Strategy. Section Four analyzes and discusses the simulation results. Section Five discusses the proposed method on the basis of the results and concludes the study.