### Abstract

Autonomous driving in dynamic, multi-agent traffic is challenging: the behaviors of other road agents are uncertain and hard to model explicitly, and the ego vehicle must negotiate with them to drive both safely and efficiently in settings such as giving way, merging, and taking turns. Traditional planning methods are largely rule-based and scale poorly in these complex dynamic scenarios, often leading to reactive or even overly conservative behaviors, and they require tedious human effort to keep workable. Recently, deep learning-based methods have shown promising results, with better generalization capability and less hand-engineering effort. However, they are either implemented with supervised imitation learning (IL), which suffers from dataset bias and distribution mismatch, or trained with deep reinforcement learning (DRL) but focused on one specific traffic scenario. In this work, we propose DQ-GAT to achieve scalable and proactive autonomous driving, where graph attention-based networks implicitly model interactions, and asynchronous deep Q-learning trains the network end-to-end in an unsupervised manner. Extensive experiments in a high-fidelity driving simulation show that our method trades off safety and efficiency better than the baselines in both seen and unseen scenarios, achieving higher goal success rates (up to 4.7×) with comparable task completion time.

### Comparative Results

$$\uparrow$$ means larger numbers are better. $$\downarrow$$ means smaller numbers are better.
The bold font highlights the best results in each column.