Modular Reinforcement Learning for Playing the Game of Tron

Mingi Jeon, Jay Lee, Sang Ki Ko

Research output: Contribution to journal, Article, peer-review

1 Scopus citation

Abstract

Tron is a simultaneous-move two-player game in which each agent leaves a wall along the path it travels, and the agent that first crashes into a wall is defeated. Because the same action may result in different outcomes depending on the opponent's simultaneous move (non-stationarity), it is difficult to apply standard reinforcement learning approaches directly. In this paper, we present a modular reinforcement learning (MRL) approach to the game of Tron that decomposes the game into two phases: a non-stationary first phase and a stationary second phase. We train two separate models. The first model handles the non-stationary environment, in which the two agents move simultaneously and affect each other; the second model handles the stationary environment, in which the two agents have been separated by the walls they created and can no longer affect each other. We show that the latter model can be effectively pre-trained using randomly generated stationary environments. We evaluate our algorithm on different grid sizes by comparing it with previous algorithms, including the state-of-the-art algorithm for the game of Tron, called a1k0n. The results show that the proposed MRL-based algorithm outperforms all previous algorithms on 6 × 6 and 8 × 8 grids. Although our algorithm performs slightly worse than the strongest baseline, a1k0n, on the 10 × 10 grid, we show that its time complexity scales better with grid size than that of search-based heuristics, including a1k0n.
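The abstract does not say how the switch from the first (non-stationary) model to the second (stationary) model is detected. The sketch below is one plausible way to do it, not the authors' implementation: it assumes a grid encoding where trails and heads are walls, and treats the agents as "separated by walls" once no empty cell is reachable by both (checked with breadth-first search). The names `reachable`, `is_separated`, `select_module`, `step` and `random_stationary_env` are hypothetical, as is the random-wall scheme used to generate stationary pre-training boards.

```python
import random
from collections import deque

# Assumed board encoding (not taken from the paper): every cell is either
# EMPTY or WALL, and both agents' trails and current heads count as WALL.
EMPTY, WALL = 0, 1
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def reachable(grid, head):
    """Empty cells reachable from `head` by 4-connected moves (BFS)."""
    rows, cols = len(grid), len(grid[0])
    seen, queue = set(), deque([head])
    while queue:
        r, c = queue.popleft()
        for dr, dc in MOVES.values():
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == EMPTY and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return seen


def is_separated(grid, heads):
    """True once no empty cell is reachable by both agents.

    From then on neither agent can influence the other, so the rest of the
    game is a single-agent (stationary) space-filling problem.
    """
    return reachable(grid, heads[0]).isdisjoint(reachable(grid, heads[1]))


def select_module(grid, heads):
    """Route the state to one of two hypothetical learned modules."""
    return "stationary" if is_separated(grid, heads) else "non_stationary"


def step(grid, heads, actions):
    """Apply both agents' actions simultaneously; return (new_heads, crashed)."""
    rows, cols = len(grid), len(grid[0])
    targets = [(r + MOVES[a][0], c + MOVES[a][1])
               for (r, c), a in zip(heads, actions)]
    # Short-circuit keeps out-of-bounds targets from indexing the grid.
    crashed = [not (0 <= r < rows and 0 <= c < cols) or grid[r][c] == WALL
               for r, c in targets]
    if targets[0] == targets[1]:
        crashed = [True, True]  # both agents enter the same cell
    for (r, c), dead in zip(targets, crashed):
        if not dead:
            grid[r][c] = WALL  # the new head extends the agent's trail
    return targets, crashed


def random_stationary_env(rows, cols, wall_prob, rng):
    """Random single-agent board, e.g. for pre-training the stationary module."""
    grid = [[WALL if rng.random() < wall_prob else EMPTY for _ in range(cols)]
            for _ in range(rows)]
    head = (rng.randrange(rows), rng.randrange(cols))
    grid[head[0]][head[1]] = WALL  # the agent's own cell is occupied
    return grid, head


if __name__ == "__main__":
    # 5 x 5 board split by a full wall in column 2; heads in opposite corners.
    board = [[EMPTY] * 5 for _ in range(5)]
    for r in range(5):
        board[r][2] = WALL
    board[0][0] = board[4][4] = WALL
    print(select_module(board, [(0, 0), (4, 4)]))  # stationary
```

A reachability test is cheap (linear in the number of cells) and guarantees that, once it holds, no future action of the opponent can change the agent's outcome, which is exactly the condition under which a single-agent model suffices.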

Original language: English
Pages (from-to): 63394-63402
Number of pages: 9
Journal: IEEE Access
Volume: 10
State: Published - 2022

Keywords

  • Modular learning
  • Tron
  • non-stationary environment
  • reinforcement learning
