Fine-scaled IoT Temperature Filling and Urban Heat Predictions with Deep Learning






Rising temperature is a major concern for urban livelihoods and has become more severe with rapid urbanization. The complexity of built-up urban fabrics and the unevenly distributed anthropogenic heat release have led to urban heat variation. In response to the increasing greenhouse effect in recent years, the demand for understanding heat variation in the U.S. has risen dramatically. The global warming trend exacerbates this variation by raising the already high temperatures in heated areas. Many concerns related to urban heat variability have been raised, primarily in the energy and health fields. To address these concerns, many studies have been conducted on urban temperature observation and prediction. Missing data in observations are inevitable, which makes continuous high-resolution measurements challenging to acquire. The discriminative and generative models developed for filling missing sensor data often show limitations (e.g., in accuracy, stability, and efficiency) when applied to different datasets. Existing methods for temperature prediction are mainly divided into deterministic methods and statistical methods. Deterministic methods require highly informative observations that are difficult to obtain in practice. In addition, various types of parameters need to be determined, and because these parameters are usually estimated from experience, the accuracy is limited. Statistical methods, on the other hand, often fail to effectively integrate and analyze multi-source heterogeneous data, which has a considerable impact on temperature. The machine learning (ML) and deep learning (DL) methods proposed in recent years can learn effective feature representations from large amounts of input data. However, full-coverage high-resolution forecasting places high demands on integrating surface weather data and air temperature observations. Data scarcity also limits many current ML/DL methods that otherwise perform well.
Another challenge to be solved is transferring and reapplying patterns learned in one city to another, as models do not naturally perform well across different regions. Regarding the missing data challenge, different algorithms (i.e., Kriging, MissForest, GAIN) were selected for comparison. All models built upon these algorithms were tested on filling missing data at missing rates of less than 10%, 20%, 40%, 60%, and 80%. Testing data were selected either from different seasons or by random draws from the entire dataset, to measure the stability of these models. Experiments were conducted to show their data filling accuracy and consistency across different missing data settings. Computational efficiency was also considered, with a view to providing a complete dataset in real time. Results demonstrated that each model has its strengths and limitations. Ensemble models are expected to integrate their respective strengths in computational speed, imputation accuracy, and adaptability to different data missing situations. Regarding fine-scaled temperature prediction and data scarcity, a framework was proposed to: 1) provide a fast data fusion technique, integrating Internet of Things (IoT) measurements of high spatiotemporal resolution with observations from weather stations; 2) utilize a Long Short-Term Memory network to predict surface temperature from the fusion dataset for four major cities in the U.S.; 3) adopt transfer learning, leveraging the pre-trained model from regions with a higher number of observation stations to predict for regions with data scarcity. With the proposed framework, multi-step predictions with low RMSEs were achieved.
The transferable model also greatly improved the prediction accuracy for regions with data scarcity, by up to 26%. This dissertation makes an innovative contribution for the following reasons: 1) The comparison of data filling methods suggests an optimal way to complete hourly IoT temperature measurements in Los Angeles by testing from different angles (i.e., computational speed, imputation accuracy, and adaptability to different data missing situations). 2) The DL-based prediction framework provides high-resolution results with up to a 39.6% decrease in MAE. It provides data to support near-future heat-related decision-making in study areas including Los Angeles, New York City, and Atlanta. 3) The transfer learning utilizes well-established models trained by the DL-based prediction framework to minimize the prediction error for regions with data scarcity problems. It improves the prediction MAE by up to 25.7%.
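
The missing-data comparison described above (hide a known fraction of the measurements, fill them, and score the filled values against the hidden truth) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the hourly temperatures are synthetic, the five sensors are hypothetical, the rates are applied as exact random-masking probabilities, and plain linear interpolation stands in for the imputers actually compared in the dissertation (Kriging, MissForest, GAIN), which are not reproduced here.

```python
import numpy as np

def mask_at_random(values, rate, rng):
    """Hide a random fraction `rate` of entries; return the masked copy and the mask."""
    mask = rng.random(values.shape) < rate
    masked = values.copy()
    masked[mask] = np.nan
    return masked, mask

def fill_linear(series):
    """Stand-in imputer (assumption): linear interpolation along time for one sensor."""
    t = np.arange(series.size)
    known = ~np.isnan(series)
    return np.interp(t, t[known], series[known])

def rmse(truth, filled, mask):
    """Root-mean-square error computed only on the entries that were hidden."""
    return float(np.sqrt(np.mean((truth[mask] - filled[mask]) ** 2)))

rng = np.random.default_rng(0)
hours = np.arange(24 * 14)  # two weeks of hourly readings
# Synthetic temperatures (deg C) for 5 hypothetical sensors: daily cycle plus noise.
daily_cycle = 20 + 5 * np.sin(2 * np.pi * hours / 24)
truth = daily_cycle[:, None] + rng.normal(0.0, 0.5, (hours.size, 5))

results = {}
for rate in (0.1, 0.2, 0.4, 0.6, 0.8):
    masked, mask = mask_at_random(truth, rate, rng)
    filled = np.column_stack(
        [fill_linear(masked[:, j]) for j in range(masked.shape[1])]
    )
    results[rate] = rmse(truth, filled, mask)

for rate, err in results.items():
    print(f"missing rate {rate:.0%}: RMSE = {err:.2f} deg C")
```

Swapping `fill_linear` for another imputer and timing each call would extend the same loop to the accuracy, stability, and efficiency comparison the abstract describes; seasonal test splits would replace the random mask with a mask over a contiguous block of hours.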