EFFICIENT AND EFFECTIVE MINING OF TIME SERIES

Date

2020

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

The mining of time series data has attracted much attention in the past two decades due to the ubiquity of time series in our daily lives. With the advance of technologies, the volume of available time series data becomes huge and the content is changing rapidly. This requires time series data mining methods to have low computational complexities. However, there is little work focuses on providing linear time complexity solutions for the time series data mining tasks. In this dissertation, we present three linear time methods for the classification, motif discovery, and clustering of time series respectively.Time series classification can be formulated as: given a set of labeled time series, predict the class labels of previously unknown instances. The goal of motif discovery is to find the most similar non-overlapping pair of subsequences in a given time series. Time series clustering can be formulated as to place the given unlabeled time series instances into homogeneous and separated groups. For time series classification, we propose to transform the subsequences in time series into symbolic patterns, extract symbolic pattern counts in time series as feature values by an incremental process, and use these features for efficient classification. For motif discovery, we design a special data structure called Grids. The subsequences of the time series are used to update the Grids in a random order to discover the motif. We prove that this process can guarantee to find the optimal solution and the whole process takes linear time. For time series clustering, we propose to partitions the data by checking some randomly selected symbolic patterns in the time series. Theoretical analysis is provided to show that group structures in the data can be revealed from this process. Besides these linear time complexity methods, this dissertation also presents a non-linear time method for time series classification that uses different idea from the state-of-the-arts. The idea behind the state-of-the-art time series classification techniques is to identify and extract patterns or characteristics from the labeled time series and use these patterns to classify new unlabeled data. We find that sequences of values that are very different from the patterns in the labeled time series can be used as references to classify time series effectively. We propose an evolution process to generate these sequences of values, which we call separating references, from the training data. The proposed method is robust to over-fitting and is especially suitable for the situation where little labeled data is available. We evaluate the proposed methods on time series data mining extensively with various datasets, and compare with the state-of-the-art approaches with statistical analysis. The results demonstrate the effectiveness and efficiency of the proposed methods. The research in the dissertation shows that we can have linear time complexity methods for the time series data mining tasks under study with effectiveness comparable or superior to the state-of-the-arts. These methods can facilitate the handling of large size data and fast processing of time series in a wide variety of disciplines.

Description

Keywords

Citation