Optimization of Fluid Solvers with Respect to Fault Tolerance and Memory Latency



The constant advancement of computational systems keeps raising the theoretical limits of what numerical simulations can achieve. To fully utilize advanced computational resources, codes must be adapted accordingly. One major challenge that comes with petascale and exascale computing is fault tolerance: the larger the number of nodes used to execute a code, the shorter the expected time between hardware failures. Available research data indicate that massively parallel applications can experience several failures per day. Several fault-tolerance techniques have been analyzed and proposed in past years; however, there are currently no fault-tolerant computational fluid dynamics (CFD) solvers that can efficiently execute an application at the petascale or exascale level. The aim of this PhD dissertation is to analyze and implement available resilience techniques in order to develop a fault-tolerant CFD solver. The second challenge addressed in this work is the memory latency problem for CFD codes. Many CFD codes with low computational intensity (flops per RAM access) ‘saturate’ the memory bandwidth of modern chips after only a few cores, so the benefit of using more of the available cores is minimal. Whereas CPU speed previously determined how fast a code could be executed, memory access speed now sets the upper limit on solver performance. This is why some fluid solvers achieve only 10-15 percent of the peak performance of the floating-point pipelines on recent CPU cores. This has led to the development of minimal memory access loop (MMAL) options for finite difference solvers; several such loops are described and analyzed. Finally, another approach to the memory latency problem for CFD codes is investigated: intrinsic instructions in C++ are used to code the subroutine that computes the right-hand side (RHS) of the finite difference approximation. Intrinsic instructions take advantage of the full vector length and maximize the number of operations that can be performed simultaneously.