Proportionality Assessment in Military Operations based on SARSA and PPO Models

Authors

  • Clara Maathuis, Open University

DOI:

https://doi.org/10.34190/iccws.21.1.4453

Keywords:

military operations, proportionality, artificial intelligence, reinforcement learning, SARSA, PPO

Abstract

This research proposes a novel decision-support framework for proportionality assessment in military operations that integrates on-policy and policy-gradient reinforcement-learning methods to encode expert rules and automatically classify engagement scenarios. To this end, two modelling perspectives are considered, implementing the SARSA (State-Action-Reward-State-Action) and PPO (Proximal Policy Optimization) algorithms, and two approaches are adopted: one that includes psychological effects or harm as part of collateral damage, and one that does not. Various optimization methods and simulation scenarios are further considered in order to assess the effectiveness and robustness of the modelling techniques developed. The results show that SARSA achieves rapid reward stabilization but exhibits limited accuracy, owing to potential bootstrapping bias and insufficient exploration in larger state spaces, whereas PPO’s clipped surrogate updates yield robust, monotonic improvement and consistently high classification accuracy across both cases, albeit over longer training horizons. A comparative analysis based on simulation results, learning curves, Q-value/policy distributions, and confusion matrices illustrates each algorithm’s strengths and limitations. Hence, this research demonstrates the viability of reinforcement learning models as transparent, adaptable tools for proportionality assessment in support of real-time operational decisions.
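For readers less familiar with the two algorithms named in the abstract, the following is a minimal sketch of their textbook update rules, and not the implementation used in this research. It assumes a tabular Q-function for SARSA and a single-sample form of PPO's clipped surrogate objective. The function names, state and action labels, and hyperparameter values (alpha, gamma, eps) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """Textbook SARSA step: bootstrap from the action actually taken next.

    q is a mapping from (state, action) pairs to value estimates.
    alpha (learning rate) and gamma (discount) are illustrative defaults.
    """
    td_target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (td_target - q[(s, a)])
    return q[(s, a)]


def ppo_clipped_objective(ratio, advantage, eps=0.2):
    """Textbook PPO clipped surrogate for a single sample.

    ratio is pi_new(a|s) / pi_old(a|s); eps bounds how far the new
    policy may move from the old one in a single update.
    """
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped_ratio * advantage)


# Hypothetical labels; the paper's actual state and action encoding is not given here.
q_table = defaultdict(float)
new_value = sarsa_update(
    q_table, "scenario_0", "proportional", 1.0,
    "scenario_1", "disproportional", alpha=0.5, gamma=0.9,
)

# A ratio above 1 + eps with a positive advantage is clipped.
objective_positive = ppo_clipped_objective(1.5, 2.0)
# A ratio below 1 - eps with a negative advantage is clipped as well.
objective_negative = ppo_clipped_objective(0.5, -1.0)
```

The SARSA target depends on the next action chosen by the current policy, which is the bootstrapping step the abstract refers to, while the clipping in the PPO objective limits the size of each policy change.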

Published

19-02-2026