Discretization and Learning of Bayesian Networks using Stochastic Search, with Application to Base Realignment and Closure (BRAC)




Hoyt, Pamela J.




The need for automated Bayesian Network (BN) construction from data has increased for a variety of reasons. Eliciting networks from experts can be time consuming and expensive. For large, complex problems, it can be difficult to construct an accurate network that ‘best’ describes the probability distribution of the training data. The learning process is further complicated by missing or incomplete data and by mixed data. In light of these issues, BN construction cannot rely on experts alone; rather, experts can be used to enhance the network after the automated process. The closer technology comes to building models that reflect the real world, the more powerful those models become as inference tools.

This research is an empirical study of how well a stochastic search discretizes continuous variables and learns structure, with the added complexity of missing data. Our approach interleaves discretization with a stochastic search process to find a population of solutions for learning BNs. We compared our process to other methods that discretize as a preprocessing step.

Learning BN structure and parameters becomes more difficult when the variables are continuous or mixed (both continuous and discrete). Real-world datasets are generally not discrete, and missing data is common. Continuous variables are often discretized so that one of the established, well-known learning algorithms can be used. The two common approaches to handling continuous variables are to apply a discretization method or to use one of the families of parametric distributions. The novel approach we developed is a dynamic process that interleaves partitioning of continuous variables with learning, using a stochastic search method called PopMCMC.

We applied our new methodology to data from the U.S. Army’s recent Base Realignment and Closure (BRAC) study from 1988 to 1995, which consists primarily of continuous and discrete variables and is complicated by missing data. The desired outcome was a method for modeling the BRAC data as a BN. A BN offers a natural way to represent the uncertainties associated with the data and permits queries relevant to BRAC decision-making issues.



Bayesian Networks, Discretization, Continuous Variables, Stochastic Search