A Multidimensional Array Database Engine for Gridded Climate Data and a Precipitation Downscaling Study




Global Climate Models (GCMs) are essential tools for simulating future climate indicators and are widely used in global climate change studies. GCM outputs are one of the largest sources of multidimensional gridded climate data. Data repositories are widely used to store these large volumes of climate data for climate studies. However, efficiently managing and querying multidimensional gridded climate data is still beyond the capabilities of most databases. The mismatch between the array data model and the relational data model limits the performance of multidimensional data queries in a traditional database once the data volume grows large. Even a trivial retrieval from a large multidimensional dataset in a relational database is time-consuming and requires enormous storage space. Given the scientific interest in, and application demand for, time-sensitive spatiotemporal data query and analysis, there is an urgent need for efficient storage and fast retrieval solutions for large multidimensional climate datasets. To address this challenge, I introduce a method for storing and accessing multidimensional data that includes a new hash function algorithm, a unified data storage structure, and memory-mapping technology. A prototype database engine, LotDB, was developed as an implementation of the method and shows promising results on multidimensional gridded climate data queries compared with SciDB, MongoDB, and PostgreSQL. Meanwhile, climate and weather indicators such as precipitation, derived from GCMs and satellite observations, are important for global and local hydrological assessment. However, most popular precipitation products (with spatial resolutions coarser than 10 km) are too coarse for local impact studies and require “downscaling” to obtain higher resolution.
Traditional precipitation downscaling methods such as statistical and dynamic downscaling require additional meteorological variables as input, and very few are applicable to downscaling hourly precipitation to higher spatial resolution. To address this challenge, I used dynamic dictionary learning to develop a new downscaling method, PreciPatch, which produces spatially distributed higher-resolution precipitation fields using only precipitation input from GCMs, at hourly temporal resolution and over a large geographical scope. The second part of my dissertation details downscaling case studies conducted to compare the performance of PreciPatch with bicubic interpolation, RainFARM (a stochastic downscaling method), and DeepSD (a downscaling method based on a super-resolution convolutional neural network, SRCNN). PreciPatch performs better than the other methods at simulating short-duration precipitation events in both the aggregated IMERG data downscaling case study and the MERRA-2 precipitation downscaling case study.
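
The storage idea in the first part of the abstract (a function that maps grid coordinates to a storage position, combined with memory mapping) can be illustrated with a minimal sketch. This is not LotDB's implementation: the abstract does not specify the hash function or file layout, so a plain row-major linear offset is used here as a stand-in, and the grid shape, the float32 cell format, and all function names are hypothetical.

```python
import mmap
import os
import struct
import tempfile

SHAPE = (4, 3, 5)            # hypothetical (time, lat, lon) grid
CELL = struct.Struct("<f")   # assumed cell format: one little-endian float32


def linear_offset(index, shape=SHAPE):
    """Byte offset of a cell, using row-major order as a stand-in
    for a coordinate-to-address hash function."""
    if len(index) != len(shape):
        raise ValueError("index rank does not match grid rank")
    pos = 0
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError("coordinate out of range")
        pos = pos * n + i
    return pos * CELL.size


def create_store(path, shape=SHAPE):
    """Create a zero-filled file large enough to hold every cell."""
    total = 1
    for n in shape:
        total *= n
    with open(path, "wb") as f:
        f.truncate(total * CELL.size)


def write_cell(mm, index, value):
    # Write directly into the mapped file at the computed offset.
    CELL.pack_into(mm, linear_offset(index), value)


def read_cell(mm, index):
    # Read one cell without loading the rest of the file.
    return CELL.unpack_from(mm, linear_offset(index))[0]


def demo():
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        create_store(path)
        with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
            write_cell(mm, (2, 1, 3), 7.5)
            return read_cell(mm, (2, 1, 3)), read_cell(mm, (0, 0, 0))
    finally:
        os.remove(path)
```

Because every cell has a fixed size and a computable position, a point query touches only the pages that contain the requested cell, which is the general property that makes memory mapping attractive for gridded data.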