INTELLIGENT PATIENT DATA GENERATOR

Mojtaba Zare


Files

Zare_gmu_0883E_12391.pdf (3.3 MB)

Date

2020

Authors

Mojtaba Zare

Abstract

Patient data are regarded as highly sensitive and are protected by federal, state, and local policies that make them available only to those who have been granted access to protected health information. Synthetic data generation offers one possible solution to this limited access; at the same time, generating application-specific datasets is a key challenge in big data benchmarking. This dissertation first presents a comprehensive literature review on synthetic data generation, which helps readers and practitioners adopt data generation approaches effectively and provides insight into the state of the art. Next, it proposes a Machine Learning (ML)-based algorithm, Intelligent Patient Data Generator (IntPDG), to generate scalable patient claims data. To construct a model that generates high-quality patient data, two main elements are investigated: the back window size and the hyperparameters of different ML algorithms. In addition, a data evaluation measure, Weighted Itemset Error (WIE), is introduced and used to evaluate the quality of the generated data during hyperparameter optimization. To generate claim-level data from patient-level data, patterns and data structures of actual patient claims data are gathered and used in probabilistic models. Once the data generator is constructed, it is tested by simulating Medicare carrier claims data, which consist of three datasets: a patient demographic table, a patient claim table, and a patient line table. To add another layer of validation, summary statistics of the generated datasets are compared with those of the Medicare data, and the results confirm the consistency and validity of the simulated claims data. The developed method can generate claims data of any size and type, such as inpatient and outpatient claims, and can be extended to generate other medical data such as Electronic Health Records (EHR).

URI

https://hdl.handle.net/1920/12499

Collections

College of Public Health
