Handling Missing Data in Randomization-based Inference




Tan, Xiao




Randomized controlled trials (RCTs) serve as the gold standard in researching and developing new therapeutics. In an RCT, a new treatment’s effectiveness is evaluated by comparing it to an existing or standard treatment. An imbalance in participants’ characteristics between groups would undermine such a comparison; randomizing patients mitigates the bias that this imbalance would introduce into the evaluation of treatment effects. Randomization-based inference was first introduced by Sir R.A. Fisher as an approach to evaluating treatment effects in an RCT. Limited computing power slowed its development in the past, but the tremendous growth of computing technology now makes randomization tests easy to compute. Randomization-based inference is a natural way to analyze data from a clinical trial. The presence of missing outcome data is problematic, however: if observations with missing outcomes are removed, the randomization distribution is destroyed and randomization tests lose their validity. There are currently no randomization-based methods for handling missing data. In this thesis, the unconditional reference set method, the conditional reference set method, and randomization-based multiple imputation are described as ways to handle missingness while preserving the randomization distribution. These randomization-based missing data methods are compared to population-based and parametric imputation approaches in terms of type I error rates and power under both homogeneous and heterogeneous population models. Randomization-based analogs of the standard missing data mechanisms are described, and a randomization-based procedure is proposed to determine whether data are missing completely at random. A large simulation study is carried out, and its results indicate that the unconditional reference set method, the conditional reference set method, and randomization-based multiple imputation are reasonable approaches to handling missing outcome data in a two-armed RCT.
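For readers unfamiliar with randomization tests, the following is a minimal sketch of a Monte Carlo randomization test for a two-armed trial with complete outcome data. It is not the thesis's method for missing data, and it does not implement the reference set or multiple imputation procedures named above. The function name `randomization_test`, the difference-in-means statistic, and the fixed-group-size allocation (re-randomizing by shuffling the observed labels) are all assumptions made for illustration.

```python
import random


def randomization_test(outcomes, assignments, n_draws=5000, seed=0):
    """Two-sided Monte Carlo randomization test (illustrative sketch).

    outcomes    -- observed responses, one per patient (no missing values)
    assignments -- 1 for treatment, 0 for control, same order as outcomes
    Assumes both arms are non-empty and that group sizes are fixed by design.
    """
    rng = random.Random(seed)

    def diff_in_means(labels):
        treated = [y for y, t in zip(outcomes, labels) if t == 1]
        control = [y for y, t in zip(outcomes, labels) if t == 0]
        return sum(treated) / len(treated) - sum(control) / len(control)

    observed = abs(diff_in_means(assignments))

    # Re-randomize the treatment labels while holding the outcomes fixed,
    # which is what the sharp null hypothesis of no treatment effect implies.
    labels = list(assignments)
    extreme = 0
    for _ in range(n_draws):
        rng.shuffle(labels)
        # Small tolerance guards against floating-point ties.
        if abs(diff_in_means(labels)) >= observed - 1e-12:
            extreme += 1

    # Count the observed allocation itself so the p-value is never zero.
    return (extreme + 1) / (n_draws + 1)
```

Calling `randomization_test(outcomes, assignments)` returns an estimated p-value. The sketch also shows why missing outcomes are a problem: dropping patients changes which allocations are possible, so shuffling the remaining labels no longer reproduces the design's randomization distribution.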