Coverage Path Planning for Maritime Search and Rescue Using Maskable Proximal Policy Optimization

Research output: Contribution to journal › Article › peer-review

Abstract

Maritime search and rescue (SAR) missions require rapid and resource-efficient exploration of wide search areas under severe uncertainty in target location, while actual operational time is highly constrained. To support effective decision-making in such conditions, search planning is commonly formulated as a grid-based coverage path planning problem. However, traditional heuristic-based search patterns used in practice, such as parallel track search and expanding square search, offer limited adaptability because they do not explicitly reflect probability distributions that evolve due to environmental factors such as wind and ocean currents. To address these limitations, this study proposes a maskable Proximal Policy Optimization (PPO) framework for efficient grid-based search, motivated by the operational characteristics of maritime SAR missions. The proposed approach integrates domain-specific action masking rules into the policy optimization process to restrict invalid or inefficient actions during learning and execution. By guiding the agent toward feasible movements and high-priority regions of the search space, the framework promotes structured exploration behavior and stable policy learning in large-scale grid environments with sparse and unevenly distributed targets. The learning environment is constructed to reflect realistic maritime search conditions by incorporating probability distributions derived from drift particle simulations, and the search problem is modeled as a Markov decision process. The proposed method is evaluated under multiple target distribution scenarios and compared with representative heuristic search strategies, including parallel track search, nearest neighbor, expanding square search, and 2-optimization. Search performance is primarily assessed in terms of total search cost, measured by the cumulative movement steps required to complete the search task. 
Experimental results demonstrate that the maskable PPO consistently achieves the lowest total search cost across all tested environments. In particular, it requires substantially fewer movement steps than parallel track search and nearest neighbor methods, while maintaining comparable or superior efficiency relative to expanding square search and 2-optimization, depending on the target distribution. These results indicate that the proposed approach effectively learns coverage paths that reflect the underlying probability distribution of the environment. Overall, this study demonstrates that action-masking-based reinforcement learning provides a practical and scalable alternative to conventional heuristic search strategies for maritime SAR missions and establishes a foundation for adaptive search planning under realistic environmental uncertainty.
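The core mechanism the abstract describes, restricting invalid or inefficient moves by masking them out of the policy's action distribution, can be sketched in a few lines. This is a minimal illustrative sketch, not the authors' implementation: the 4-connected move set, the `action_mask` and `masked_policy` names, and the rule of masking off-grid and already-searched cells are assumptions for illustration.

```python
import numpy as np

# Hypothetical 4-connected grid actions: 0=up, 1=down, 2=left, 3=right.
MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}

def action_mask(pos, covered, grid_shape):
    """Mark as valid only the moves that stay on the grid and do not
    revisit an already-searched cell (an assumed masking rule)."""
    mask = np.zeros(4, dtype=bool)
    for a, (dr, dc) in MOVES.items():
        r, c = pos[0] + dr, pos[1] + dc
        if 0 <= r < grid_shape[0] and 0 <= c < grid_shape[1] and (r, c) not in covered:
            mask[a] = True
    return mask

def masked_policy(logits, mask):
    """Set masked logits to -inf before the softmax, so invalid actions
    receive exactly zero probability during sampling and learning."""
    masked = np.where(mask, logits, -np.inf)
    probs = np.exp(masked - masked.max())
    return probs / probs.sum()

# Example: agent at the top-left corner of a 3x3 grid; cell (0, 1) is
# already covered, so only moving down (action 1) remains valid.
mask = action_mask((0, 0), covered={(0, 1)}, grid_shape=(3, 3))
probs = masked_policy(np.array([0.5, 1.0, -0.2, 0.3]), mask)
```

In a maskable PPO setup, the environment returns such a mask at every step, and the same mask is applied both when sampling actions during rollouts and when computing the policy loss, which is what keeps learning confined to feasible movements.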

Original language: English
Pages (from-to): 1161-1178
Number of pages: 18
Journal: Korean Journal of Remote Sensing
Volume: 41
Issue number: 6
DOIs
State: Published - Dec 2025

Keywords

  • Coverage path planning
  • Leeway
  • Maskable proximal policy optimization
  • Search and rescue
