An Approximate Dynamic Programming Approach to Future Navy Fleet Investment Assessments

Date

2022

Abstract

Navy decision makers and program sponsors must decide on future investments in the context of forecasted threats and operating environments. These investment assessments are difficult because forecasted costs and utilities are often based on resources and technologies that do not yet exist. Forecast model inputs are informed by current data that reflect similar or "as close as possible" technologies and are limited to the scope of the scenarios considered. That is, the common assessment approach of placing representative agents in a scenario-based simulation to estimate future investment utilities is limited by the scenario and by design capabilities.

The research objective is to overcome the limitations of scenario-specific analyses by modeling the operational lifespan of future Navy destroyer (DDG) fleet configurations as Markov decision processes (MDPs) evaluated with dynamic programming (DP) value iteration to calculate the maximum DDG-configuration utility. All MDP parameters are informed by existing models and subject matter expert (SME) inputs. The transition probability matrices (TPMs) capture the probabilities that a DDG transitions between states as a chance function of future configuration capabilities and sequential actions, which is more representative of the operational lifetime of a configured DDG than a single scenario. Likert-type values are assigned to each decision-state pair so that Bellman's optimality equation solves for the maximum expected value via non-discounted value iteration. These maximum expected values become the decision-variable coefficients of an integer programming configuration-to-destroyer assignment model that maximizes the sum of destroyer-configuration values subject to budgetary, logistic, and requirement-based constraints.

DP value iteration is appropriate for this problem because the algorithm does not require a time-value discount parameter and the objective is the maximum expected value, and I compare the DP results to the approximate dynamic programming (ADP) method of Q-learning. Modeling with ADP removes the need for TPMs in large problem instances, providing a framework for near-optimal decisions, and this research highlights the similarity between the ADP and DP solutions. The ADP results align with the DP results because appropriate ADP parameter settings enable the learning and exploration that guarantee near-optimal ADP solutions, opening the door to computationally scalable algorithms. (Illustrative sketches of the value-iteration, assignment, and Q-learning steps follow the abstract.)

This work contributes to SME and decision maker (DM) insight by mitigating bias toward technologically superior configurations: it reveals utility values that make seemingly less capable configurations more competitive in terms of long-term value. This insight is due to DP optimal policies and ADP near-optimal policies that are driven by the values of the states, and it would not have been possible without this research. The study demonstrates that less advanced technologies can be deployed in ways that maximize their long-term utility, making them more valuable than expected in future operational environments. OPNAV's desire for modeling methods that complement existing campaign models is evidenced by this method being briefed to incoming OPNAV analysts as a "best practice" for evaluating complex decisions. This study's contribution to the high-visibility R3B study received high-level recognition in a Presidential Meritorious Service Medal citation.
This research contributes AI-enabled decision-making to a culture that relies on familiar, anecdotal, or experience-based approaches. AI-enabled decision-making is necessary to compete with near-peer adversaries in dynamic decision-making environments.
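
For readers who want a concrete picture of the non-discounted value iteration step, the following is a minimal sketch in Python. It assumes a small, made-up MDP whose TPMs are stacked in a NumPy array and whose Likert-type rewards are placeholders, with a finite horizon standing in for the DDG operational lifespan; none of the numbers come from the study.

import numpy as np

def value_iteration(P, R, horizon):
    """Non-discounted Bellman optimality backups over a finite lifespan.

    P -- TPMs, shape (A, S, S): P[a, s, s'] = Pr(next state s' | state s, action a)
    R -- rewards, shape (A, S): Likert-type value of taking action a in state s
    horizon -- number of decision epochs (stand-in for the operational lifetime)
    """
    num_actions, num_states, _ = P.shape
    V = np.zeros(num_states)              # terminal values assumed to be zero
    policy = np.zeros(num_states, dtype=int)
    for _ in range(horizon):
        Q = R + P @ V                     # Bellman backup, shape (A, S)
        V = Q.max(axis=0)                 # maximum expected value of each state
        policy = Q.argmax(axis=0)         # greedy action in each state
    return V, policy

# Illustrative two-state, two-action instance (all numbers are placeholders)
P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[3.0, 1.0],
              [2.0, 4.0]])
V, policy = value_iteration(P, R, horizon=30)
print(V, policy)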
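
The maximum expected values then feed an integer programming assignment model. The snippet below is a minimal sketch of that kind of configuration-to-destroyer assignment using the open-source PuLP library; the library choice, the destroyer and configuration names, the values, costs, and the single budget constraint are illustrative assumptions, and the study's logistic and requirement-based constraints would add further rows.

from pulp import LpProblem, LpMaximize, LpVariable, lpSum, LpBinary

# Hypothetical data: value[d][c] would come from DP/ADP, costs and budget from SMEs
destroyers = ["DDG_A", "DDG_B", "DDG_C"]
configs = ["baseline", "upgrade_1", "upgrade_2"]
value = {"DDG_A": {"baseline": 4.1, "upgrade_1": 5.0, "upgrade_2": 5.6},
         "DDG_B": {"baseline": 3.8, "upgrade_1": 4.9, "upgrade_2": 5.2},
         "DDG_C": {"baseline": 4.4, "upgrade_1": 4.6, "upgrade_2": 5.9}}
cost = {"baseline": 1.0, "upgrade_1": 2.5, "upgrade_2": 4.0}
budget = 7.0

prob = LpProblem("ddg_configuration_assignment", LpMaximize)
x = LpVariable.dicts("assign", (destroyers, configs), cat=LpBinary)

# Objective: maximize the sum of destroyer-configuration values
prob += lpSum(value[d][c] * x[d][c] for d in destroyers for c in configs)

# Each destroyer receives exactly one configuration
for d in destroyers:
    prob += lpSum(x[d][c] for c in configs) == 1

# Budgetary constraint (logistic and requirement constraints would be added similarly)
prob += lpSum(cost[c] * x[d][c] for d in destroyers for c in configs) <= budget

prob.solve()
print({d: c for d in destroyers for c in configs if x[d][c].value() == 1})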
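
Finally, the ADP comparison uses Q-learning, which learns state-action values from sampled transitions and therefore needs no explicit TPMs. The sketch below is a generic tabular, undiscounted Q-learning loop with epsilon-greedy exploration; the toy simulator, parameter values, and episode structure are placeholders rather than the study's actual settings.

import numpy as np

def q_learning(sample_step, num_states, num_actions,
               episodes=5000, horizon=30, alpha=0.1, epsilon=0.1):
    """Tabular Q-learning: model-free, so no TPMs are required.

    sample_step(s, a) -> (next_state, reward) plays the role of the
    operational-environment simulator.
    """
    rng = np.random.default_rng(1)
    Q = np.zeros((num_states, num_actions))
    for _ in range(episodes):
        s = int(rng.integers(num_states))       # random start encourages exploration
        for _ in range(horizon):
            # Epsilon-greedy: explore with probability epsilon, otherwise exploit
            if rng.random() < epsilon:
                a = int(rng.integers(num_actions))
            else:
                a = int(Q[s].argmax())
            s_next, r = sample_step(s, a)
            # Undiscounted update toward the sampled Bellman optimality target
            Q[s, a] += alpha * (r + Q[s_next].max() - Q[s, a])
            s = s_next
    return Q                                    # greedy policy: Q.argmax(axis=1)

# Toy simulator standing in for the operational environment (placeholder dynamics)
def sample_step(s, a, rng=np.random.default_rng(2)):
    P = np.array([[[0.8, 0.2], [0.3, 0.7]],
                  [[0.5, 0.5], [0.1, 0.9]]])
    R = np.array([[3.0, 1.0],
                  [2.0, 4.0]])
    return int(rng.choice(2, p=P[a, s])), R[a, s]

print(q_learning(sample_step, num_states=2, num_actions=2))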

Keywords

Approximate Dynamic Programming, Artificial intelligence, Correspondence analysis, Decision making, Optimization
