Detection of Outliers in Spatial-temporal Data

Date

2011-05-13

Authors

Rogers, James P.

Abstract

Outlier detection is an important data mining task focused on discovering objects that deviate significantly from a set of observations considered typical. It can reveal objects that behave anomalously with respect to other observations, and these objects may highlight current or future problems. Previous outlier detection methods have focused primarily on a single non-spatial numerical attribute and have not successfully dealt with multiple dimensions. Many previous methods assume a Gaussian distribution of the data, an assumption that is probably a major fallacy when determining outliers in spatial-temporal data. Most previous efforts did not provide a statistical confidence measure, although including one should improve the detection of outliers. Outlier detection is often complicated by noise, so a good methodology should identify outliers successfully in noisy data. Global outlier methods calculate a single outlier statistic that summarizes the outliers for the entire geographic area and temporal duration, while local outlier methods calculate an outlier statistic for each feature based on its similarity to its neighbors. Previous methods have not been able to determine outliers as the vector of attributes, location, and time change. The objective of my research is to devise a methodology that addresses these problems and challenges: to develop a robust method of diagnosing outliers and to extend it to detecting outliers in spatial-temporal data. A spatial-temporal outlier is an observation whose values are significantly different from those of other spatially and temporally referenced objects in its spatial-temporal neighborhood. Geographic phenomena are difficult to analyze using traditional data mining methods.
Determining relationships among phenomena as they move and change over time is not possible by means of human analysis of spatial-temporal data streams. In addition, the volume of spatial-temporal data being collected is increasing steadily with the use of cameras, sensors, and mobile devices (e.g., cell phones), and is too large for a human to analyze. My method, unlike many detection methods found in the literature, does not require the user to enter the number or percentage of outliers to be found and does not assume any distribution of the data (e.g., Gaussian). It requires only two input parameters, the statistical confidence level and the number of nearest neighbors, and only the statistical confidence level is significant. It allows for different ways to measure the degree of non-conformity and works for high-dimensional data, noisy data, and data with or without clustering information. The basic outlier detection method was extended to spatial-temporal data by using kernels for the vector of attributes, space, and time, which provide a capability to focus outlier detection on local neighborhoods; the user is able to input weights for each of the kernels. Local spatial-temporal outliers are outliers determined within a specific spatial area and time frame that is a subset of the entire spatial area and total temporal duration. Empirical evaluation was conducted on several datasets of increasing complexity and dimensionality, with very good results: the experiments using my method produced a high True Positive percentage and a low False Positive percentage.
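
The abstract names the ingredients of the approach (a non-conformity measure, a number of nearest neighbors, a statistical confidence level, and user-weighted components for attributes, space, and time) but does not give formulas. The Python sketch below is an illustration under stated assumptions, not the dissertation's algorithm: it assumes a non-conformity score equal to the mean distance to the k nearest neighbors, a rank-based p-value, and a simple weighted sum of distances standing in for the kernels. All function names, the record layout (`attrs`, `xy`, `t`), and the default weights are hypothetical.

```python
import math


def combined_distance(a, b, w_attr=1.0, w_space=1.0, w_time=1.0):
    """Weighted sum of attribute, spatial, and temporal distances.

    Assumption: a plain weighted sum replaces the kernels mentioned in
    the abstract, whose exact form is not stated there.
    """
    d_attr = math.dist(a["attrs"], b["attrs"])
    d_space = math.dist(a["xy"], b["xy"])
    d_time = abs(a["t"] - b["t"])
    return w_attr * d_attr + w_space * d_space + w_time * d_time


def knn_nonconformity(i, points, k, dist):
    """Assumed non-conformity: mean distance to the k nearest neighbors."""
    ds = sorted(dist(points[i], p) for j, p in enumerate(points) if j != i)
    return sum(ds[:k]) / k


def outlier_p_values(points, k, dist=combined_distance):
    """Rank-based p-value: share of points at least as non-conforming."""
    scores = [knn_nonconformity(i, points, k, dist) for i in range(len(points))]
    n = len(scores)
    return [sum(1 for s in scores if s >= si) / n for si in scores]


def detect_outliers(points, k=3, confidence=0.95, dist=combined_distance):
    """Return indices whose p-value is at most 1 - confidence."""
    eps = 1.0 - confidence
    p_values = outlier_p_values(points, k, dist)
    return [i for i, pv in enumerate(p_values) if pv <= eps]


def demo_points():
    """Synthetic 5 x 5 grid of readings with one anomalous attribute value."""
    points = []
    for i in range(5):
        for j in range(5):
            idx = i * 5 + j
            points.append({
                "attrs": [10.0 + 0.1 * (idx % 4)],
                "xy": (float(i), float(j)),
                "t": float((i + j) % 3),
            })
    points[12]["attrs"] = [50.0]  # inject an anomaly at the grid centre
    return points


if __name__ == "__main__":
    print(detect_outliers(demo_points(), k=3, confidence=0.95))
```

In this sketch the caller supplies only `k` and `confidence`, mirroring the two parameters named in the abstract; no count or percentage of outliers is given and no distribution is assumed. Note that with a rank-based p-value the smallest attainable value is 1/n, so a 95% confidence level can flag a point only when at least 20 observations are available.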

Keywords

Outliers, Transduction, Spatial, Probability, Temporal, Confidence
