Intelligent Patient Data Generator

dc.contributor.advisor: Wojtusiak, Janusz
dc.contributor.author: Zare, Mojtaba
dc.creator: Zare, Mojtaba
dc.date: 2020-08-10
dc.date.accessioned: 2020-10-13T17:25:32Z
dc.date.available: 2020-10-13T17:25:32Z
dc.description.abstract: Patient data are regarded as highly sensitive and are protected by federal, state, and local policies that make them available only to those who have been granted access to protected health information. Synthetic data generation provides one possible solution to the issue of limited access, but at the same time, it is a key challenge in big data benchmarking, which aims to generate application-specific datasets. In this dissertation, first, a comprehensive literature review on synthetic data generation is presented, which helps readers and practitioners effectively adopt data generation approaches and provides insight into the state of the art. Next, a Machine Learning (ML)-based algorithm, Intelligent Patient Data Generator (IntPDG), is proposed to generate scalable patient claims data. To construct a model for generating high-quality patient data, two main elements are investigated: the back window size and the hyperparameters of different ML algorithms. In addition, a data evaluation measure, Weighted Itemset Error (WIE), is presented and used to evaluate the quality of the generated data during hyperparameter optimization. To generate claim-level data from patient-level data, patterns and data structures of actual patient claims data are gathered and used in probabilistic models. Once the data generator method is constructed, it is tested on simulating Medicare carrier claims data, consisting of three datasets: a patient demographic table, a patient claim table, and a patient line table. To add another layer of validation to the synthetic data, summary statistics of the generated datasets are compared with those of the Medicare data, and the results confirm the consistency and validity of the simulated claims data. The developed data generator method can be used to generate claims data of any size and type, such as inpatient and outpatient claims data, or can be extended to generate other medical data such as Electronic Health Records (EHR).
dc.identifier.uri: https://hdl.handle.net/1920/11868
dc.language.iso: en
dc.subject: Patient data
dc.subject: Claims data
dc.subject: Simulation
dc.subject: Data evaluation
dc.subject: Machine learning
dc.title: Intelligent Patient Data Generator
dc.type: Dissertation
thesis.degree.discipline: Health Services Research
thesis.degree.grantor: George Mason University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy in Health Services Research

Files

Original bundle
Name: Zare_dissertationV2_2020.pdf
Size: 3.3 MB
Format: Adobe Portable Document Format