Autonomous driving is a multi-agent setting where the host vehicle must apply
sophisticated negotiation skills with other road users when overtaking, giving
way, merging, taking left and right turns, and pushing ahead in
unstructured urban roadways. Since there are many possible scenarios, manually
tackling all possible cases will likely yield an overly simplistic policy.
Moreover, one must balance caution toward the unexpected behavior of other
drivers and pedestrians against the need not to be overly defensive, so that
normal traffic flow is maintained.
In this paper we apply deep reinforcement learning to the problem of forming
long-term driving strategies. We note that there are two major challenges that
make autonomous driving different from other robotic tasks. The first is the
necessity of ensuring functional safety, something that machine learning has
difficulty with, given that performance is optimized at the level of an
expectation over many instances. The second is that the Markov Decision
Process model often used in robotics is problematic in our case because of the
unpredictable behavior of other agents in this multi-agent scenario. We make three
contributions in our work. First, we show how policy gradient iterations can be
used without Markovian assumptions. Second, we decompose the problem into a
composition of a Policy for Desires (which is to be learned) and trajectory
planning with hard constraints (which is not learned). The goal of Desires is
to enable comfortable driving, while the hard constraints guarantee the safety
of driving. Third, we introduce a hierarchical temporal abstraction, which we
call an "Option Graph", with a gating mechanism that significantly reduces the
effective horizon and thereby reduces the variance of the gradient estimation
even further.
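
As a minimal illustrative sketch, and not the implementation used in this work, the decomposition into a learned Policy for Desires and a non-learned planner with hard constraints can be pictured as follows. Everything here is an assumption introduced for illustration: the names `Scene`, `Desires`, `desires_policy`, `plan_with_hard_constraints`, and `drive_step`, the placeholder policy that always requests a cruise speed, the stopping-distance rule used as the hard constraint, and all numeric constants.

```python
import math
from dataclasses import dataclass


@dataclass
class Scene:
    """Hypothetical, heavily simplified view of the road ahead."""
    gap_to_obstacle_m: float  # free distance ahead of the host vehicle


@dataclass
class Desires:
    """Hypothetical output of the learned policy (comfort-oriented)."""
    target_speed_mps: float


def desires_policy(scene: Scene, cruise_speed_mps: float = 30.0) -> Desires:
    # Stand-in for the learned component. A real system would evaluate a
    # trained model here; this placeholder always asks for cruise speed.
    return Desires(target_speed_mps=cruise_speed_mps)


def plan_with_hard_constraints(
    scene: Scene,
    desires: Desires,
    max_decel_mps2: float = 5.0,  # assumed braking capability
    margin_m: float = 2.0,        # assumed safety margin
) -> float:
    # Non-learned component: cap the speed so the vehicle can stop within
    # the free gap minus the margin, using v^2 = 2 * a * d.
    usable_gap = max(scene.gap_to_obstacle_m - margin_m, 0.0)
    safe_speed = math.sqrt(2.0 * max_decel_mps2 * usable_gap)
    # The desire is honored only as far as the hard constraint allows.
    return min(desires.target_speed_mps, safe_speed)


def drive_step(scene: Scene) -> float:
    # Compose the two stages: learned desires, then constrained planning.
    return plan_with_hard_constraints(scene, desires_policy(scene))
```

In this toy setting, a 12 m gap caps the commanded speed at 10 m/s regardless of what the policy requests, while a 200 m gap leaves the requested 30 m/s unchanged. The point of the design is that safety does not depend on what the learned component outputs.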