Research published in Nature reports that an agent trained with a combination of deep reinforcement learning and specific scenario training defeated champion-level drivers at the video game Gran Turismo.
Areas of the track where Igor lost time with respect to GT Sophy. Corner 20, highlighted in yellow, shows an interesting effect common to the other corners in that Igor seems to catch up a little by braking later, but then loses time because he has to brake longer and comes out of the corner slower. Igor’s steering controls and Igor’s throttle and braking compared with GT Sophy on corner 20.
In the experiment, the policies were first filtered to select the subset on the Pareto frontier of a simple evaluation criterion that trades off lap time against off-course and collision metrics. The selected policies were then run through a series of tests evaluating their overall racing performance against a common set of opponents and their performance on a variety of hand-crafted skill tests. The results were ranked, and human judgement was applied to select a small number of candidate policies.
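To make the first filtering step concrete, here is a minimal sketch of Pareto-frontier selection in Python. It is not the authors' code: the `PolicyStats` record, its field names, and the sample numbers are all hypothetical, and the sketch assumes each policy is summarized by three metrics for which lower is better (lap time, off-course measure, collision measure). A policy stays on the frontier if no other policy is at least as good on every metric and strictly better on at least one.

```python
from typing import List, NamedTuple


class PolicyStats(NamedTuple):
    """Hypothetical per-policy summary; all metrics are 'lower is better'."""
    name: str
    lap_time: float    # seconds
    off_course: float  # e.g. time or count spent off the track
    collisions: float  # e.g. number of contacts with other cars


def dominates(a: PolicyStats, b: PolicyStats) -> bool:
    """True if a is no worse than b on every metric and strictly better on one."""
    a_metrics, b_metrics = a[1:], b[1:]  # skip the name field
    no_worse = all(x <= y for x, y in zip(a_metrics, b_metrics))
    strictly_better = any(x < y for x, y in zip(a_metrics, b_metrics))
    return no_worse and strictly_better


def pareto_frontier(policies: List[PolicyStats]) -> List[PolicyStats]:
    """Keep only the policies that no other policy dominates."""
    return [p for p in policies if not any(dominates(q, p) for q in policies)]


# Illustrative, made-up numbers.
candidates = [
    PolicyStats("policy_a", lap_time=100.0, off_course=2.0, collisions=1.0),
    PolicyStats("policy_b", lap_time=101.0, off_course=1.0, collisions=0.0),
    PolicyStats("policy_c", lap_time=102.0, off_course=3.0, collisions=2.0),
]
frontier = pareto_frontier(candidates)
frontier_names = [p.name for p in frontier]
```

In this made-up example, `policy_c` is slower and less clean than both alternatives, so it is dropped, while `policy_a` (faster) and `policy_b` (cleaner) both survive because neither dominates the other. The later steps described above (racing against common opponents, skill tests, ranking, and human judgement) would then operate only on the surviving policies.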