Robust Reinforcement Learning with Diffusion Wavelets






Reinforcement Learning (RL) is a method of learning from an environment by continually observing it and evaluating its response to a set of actions. Long-term interaction with a dynamic system whose constituents evolve over time accumulates valuable knowledge about the system's history and the actions taken. RL exploits this gathered knowledge to identify an optimal policy, which dictates which decision(s) should be made in each state in order to guide the system toward the best objective value in the long run. Since the system's objective value is simply the accumulated discounted reward over the visited states, it is crucial to define and retrieve rewards from the environment as faithfully as possible. In practice, however, state perceptions may be noisy, corrupt, or even intentionally perturbed, carrying distorted rewards. In robotics, for example, it is common to receive noisy feedback through faulty sensors or from adverse conditions such as rain, wind, or poor lighting. The focus of this research is to study and apply methods for properly handling intentionally corrupted discrete-state perceptions, and their associated reward channel, crafted so as to increase the likelihood that the model takes actions aligned not with its genuine objectives but with those of an unknown adversary. Such perturbations can seriously compromise an RL model's functionality, and addressing them is of utmost importance in the Secure Machine Learning domain, as has been the case with adversarial attacks on Artificial Neural Network models. A value-based RL model must assess all possible decisions based on the quality of the upcoming states, and these quality values (known as Q-values) should be obtained through a robust process.
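The two quantities above, the discounted return that defines the objective value and the Q-values that drive a value-based agent's decisions, can be sketched in a few lines. This is a minimal illustration, not code from the dissertation; the function names and reward sequences are hypothetical, chosen only to show how a corrupted reward channel distorts the return the agent optimizes.

```python
def discounted_return(rewards, gamma=0.9):
    """Accumulated discounted reward over a trajectory: G = sum_t gamma^t * r_t."""
    g = 0.0
    for t, r in enumerate(rewards):
        g += (gamma ** t) * r
    return g

def greedy_action(q_values):
    """Value-based decision: pick the action with the highest Q-value in this state."""
    return max(range(len(q_values)), key=lambda a: q_values[a])

# A perturbed reward channel changes the objective the agent actually pursues:
clean = discounted_return([1.0, 1.0, 1.0])       # 1 + 0.9 + 0.81 = 2.71
perturbed = discounted_return([1.0, -1.0, 1.0])  # 1 - 0.9 + 0.81 = 0.91
```

With distorted rewards the greedy choice over the resulting Q-values can flip, which is precisely the failure mode an adversary exploits and a robust value function approximator must resist.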
While many state-of-the-art RL models use Artificial Neural Networks to provide state values to the RL agent, this research shows that an alternative value function approximator, namely the Diffusion Wavelet method, a potent denoising tool from the Digital Signal Processing domain, provides state-value approximations that are arguably more robust to adversarial perturbations in discrete environments. The contributions of this dissertation are: (1) Evaluating the robustness of RL models that use Diffusion Wavelets (DWs) as their Value Function Approximation (VFA) module against RL models with Deep Neural Networks (DNNs) as their VFA. The experiments showed a 145% robustness improvement in Testbed-1 (discrete benchmark) when a DW-VFA was used instead of a DNN-VFA, and a modest robustness improvement of 5.1% in Testbed-2 (continuous benchmark) for the same comparison. (2) Measuring the computational cost of the DW versus the DNN value function approximator in the chosen testbeds, to verify whether the gain in robustness comes at a higher cost. While the discrete benchmark (Testbed-1) showed an 80.7% decrease in computational cost for DW compared to DNN, in the continuous benchmark (Testbed-2) the DW-VFA incurred a 16.8% increase in time per iteration (the computational-cost metric) and a 30.9% increase in average runtime. Thus, in the continuous environment, DW carries a higher computational cost of learning while the robustness gain is not significant.



Approximate Dynamic Programming, Diffusion Wavelets, Operations Research, Reinforcement Learning, Robust Learning