A Probabilistic Methodology to Identify Top Causal Factors for High Complexity Events from Data




Bati, Firdu




Complex systems are composed of subsystems and usually involve a large number of observable and unobservable variables. They can be modeled by determining the relationship between the observable factors (features), or their abstractions, and the target (class) variables in the domain. The relationship is defined by a hierarchy of factors, categories, sub-categories, and the target variables. The factors, categories, and sub-categories are groupings of all the variables and are represented as artifacts in the model, which can be generated from real-world data. In safety risk modeling based on incident/accident data, the target variables are the various events and the features represent the causal factors. The National Airspace System (NAS) is an example of a safety-critical domain. The NAS is characterized by several datasets. One of these, the Air Traffic Safety Action Program (ATSAP), is designed to capture Air Traffic Control (ATC) related safety data about the NAS. The ATSAP defines the ATC domain with more than 300 observable factors and 21 undesired events, along with other miscellaneous variables. There are more than 70,000 ATSAP incident reports between 2008 and 2013. Developing a useful model of safety for the NAS using the ATSAP dataset is prohibitively complex due to the large number of observable factors, the complex relationships between observed variables and events, and the probabilistic nature of the events. Probabilistic Graphical Models (PGMs) provide one approach to developing practical models that can overcome these difficulties and be used for safety analysis and decision-making. This dissertation describes an approach that uses a class of PGM called Bayesian Networks to develop a safety model of the NAS using ATSAP data.
The modeling technique includes approaches to abstract hierarchical relationships from observable variables and events in the domain data by: (1) creating categorical variables from the data dictionary to abstract sub-categories and the lowest-level observable variables in the hierarchy, and (2) calculating the probability distribution of the categories from their lower-level categories or observable variables. The models can be used to identify the top causal factors leading to undesired events, and to determine how changes in the likelihood of those causal factors affect the undesired events. Specifically, the model can be used for:
-- Identifying top causal factors according to the largest changes in probability that result from partial evidence on the target events
-- Measuring the significance of the relationships between aggregated top causal factors and the undesired event over time, using time series and regression analysis
-- Determining correlations between top causal factors to identify sub-factors for those factors that are difficult to act upon
-- Identifying top generic issues in the NAS by applying a ranking algorithm that uses the frequency of each top causal factor and the severity index of events
This tool can be used to supplement the existing decision-making process, which relies on the expert judgment that governments use to determine the initiatives that address safety concerns. The methodology is applied to identify significant causal factors and top issues in the NAS using 41,000 records from the ATSAP database from 2008 to 2013. As demonstrated in one of the top causal factor models in this dissertation, the probabilistic causal models can be extended into decision networks by formulating a unified utility function over control and mitigation variables and maximizing that utility function.
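As a rough illustration of the first use listed above (ranking factors by how much their probability changes once evidence is set on a target event), the following Python sketch may help. It is not the dissertation's Bayesian network: it substitutes simple frequency estimates of P(factor) and P(factor | event) on a toy dataset, and the function name, factor labels, and event label are all hypothetical.

```python
from collections import Counter


def rank_factors_by_probability_shift(records, event, top_k=5):
    """Rank factors by P(factor | event) - P(factor).

    records: list of (factors, events) pairs, each a set of labels.
    This is a frequency-based stand-in for Bayesian network inference
    (an assumption made for illustration only).
    """
    total = len(records)
    with_event = [factors for factors, events in records if event in events]
    if total == 0 or not with_event:
        return []

    # Count how often each factor appears overall and among reports
    # that mention the target event.
    prior_counts = Counter(f for factors, _ in records for f in factors)
    posterior_counts = Counter(f for factors in with_event for f in factors)

    shifts = {
        f: posterior_counts[f] / len(with_event) - prior_counts[f] / total
        for f in prior_counts
    }
    return sorted(shifts.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Toy data with hypothetical labels; not taken from ATSAP.
toy_records = [
    ({"distraction", "ambient_noise"}, {"loss_of_separation"}),
    ({"distraction"}, {"loss_of_separation"}),
    ({"clearance_heading"}, {"loss_of_separation"}),
    ({"clearance_heading"}, set()),
    ({"ambient_noise"}, set()),
    ({"aircraft_performance"}, set()),
]

for factor, shift in rank_factors_by_probability_shift(
    toy_records, "loss_of_separation"
):
    print(f"{factor}: {shift:+.3f}")
```

In a full model the prior and posterior would come from inference over the network rather than raw counts, so this sketch ignores dependencies between factors.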
A decision network is used to help make optimum data-driven decisions, thereby reducing the likelihood of the undesired events and their consequences. This is achieved by combining graphical, computational, and statistical software libraries and packages. The model outputs of the top causal factors for individual events show that 15 of the 21 undesired events can be explained with a small number of causal factors, typically ranging from 5 to 8 sub-factors. This is in line with the annual “Top-5 Safety Hazard” report published by the Office of Safety at the Federal Aviation Administration (FAA). The top issues identified in the analysis are ranked by weighting the relative frequency of each factor and the severity of the events the factor is involved in. The top five issues in this research's output are individual factors (controller actions), outside influences (distractions), clearance problems, aircraft performance, and procedure deficiencies. Only procedure deficiencies (conflicting) and clearance problems (heading) were reported in the Top-5 report for the year 2013. The other issues either were not identified in the manual process or were considered non-actionable (e.g. distractions). The analysis identified actionable sub-factors contributing to the non-actionable causal factors (e.g. ambient noise is one of the highly correlated actionable sub-factors identified for distractions). The analysis also identified the relationships between top factors and the relative frequencies of the undesired events, as well as trends over time, emphasizing the need for more careful analysis within annual time frames as well as year-over-year analysis.
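The frequency-and-severity ranking of top issues described above can be sketched as follows. The abstract does not give the exact formula, so this Python example assumes one plausible form: each factor's score is the sum, over the events it is involved in, of its relative frequency times that event's severity index. The function name, factor labels, event labels, counts, and severity values are all hypothetical.

```python
def rank_issues(factor_event_counts, severity_index):
    """Rank factors by severity-weighted relative frequency.

    factor_event_counts: {factor: {event: count}}
    severity_index: {event: severity weight}
    The scoring formula is an assumption, not the dissertation's
    published algorithm.
    """
    total = sum(
        count
        for events in factor_event_counts.values()
        for count in events.values()
    )
    if total == 0:
        return []

    scores = {}
    for factor, events in factor_event_counts.items():
        # Relative frequency of (factor, event) times the event's severity.
        scores[factor] = sum(
            (count / total) * severity_index[event]
            for event, count in events.items()
        )
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy inputs with hypothetical values; not taken from ATSAP.
toy_counts = {
    "controller_actions": {"event_a": 4, "event_b": 1},
    "distractions": {"event_a": 1, "event_b": 3},
    "procedure_deficiencies": {"event_b": 1},
}
toy_severity = {"event_a": 1.0, "event_b": 3.0}

for factor, score in rank_issues(toy_counts, toy_severity):
    print(f"{factor}: {score:.2f}")
```

With these toy numbers a less frequent factor can outrank a more frequent one when it is tied to more severe events, which is the point of weighting by severity rather than ranking on frequency alone.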



Bayesian Factor Selection, Causal Bayesian Networks, Causal Factor Association, Causal Factors of events, Causes of Aviation Incidents, Probabilistic Causal Model