Mason Archival Repository Service

Intelligent Patient Data Generator


dc.contributor.advisor Wojtusiak, Janusz
dc.contributor.author Zare, Mojtaba
dc.creator Zare, Mojtaba
dc.date 2020-08-10
dc.date.accessioned 2020-10-13T17:25:32Z
dc.date.available 2020-10-13T17:25:32Z
dc.identifier.uri http://hdl.handle.net/1920/11868
dc.description.abstract Patient data are regarded as highly sensitive and are protected by federal, state, and local policies that make them available only to those who have been granted access to protected health information. Synthetic data generation offers one possible solution to this limited access, but at the same time it is a key challenge in big data benchmarking, which aims to generate application-specific datasets. In this dissertation, first, a comprehensive literature review on synthetic data generation is presented, which helps readers and practitioners effectively adopt data generation approaches and provides insight into the state of the art. Next, a Machine Learning (ML)-based algorithm, Intelligent Patient Data Generator (IntPDG), is proposed to generate scalable patient claims data. To construct a model that generates high-quality patient data, two main elements are investigated: the back window size and the hyperparameters of different ML algorithms. In addition, a data evaluation measure, Weighted Itemset Error (WIE), is presented and used to evaluate the quality of the generated data during hyperparameter optimization. To generate claim-level data from patient-level data, patterns and data structures of actual patient claims data are gathered and used in probabilistic models. Once the data generator method is constructed, it is tested by simulating Medicare carrier claims data, consisting of three datasets: a patient demographic table, a patient claim table, and a patient line table. To add another layer of validation to the synthetic data, summary statistics of the generated datasets are compared with those of Medicare data, and the results confirm the consistency and validity of the simulated claims data. The developed data generator method can be used to generate claims data of any size and type, such as inpatient and outpatient claims data, or can be extended to generate other medical data such as Electronic Health Records (EHR). en_US
dc.language.iso en en_US
dc.subject patient data en_US
dc.subject claims data en_US
dc.subject simulation en_US
dc.subject data evaluation en_US
dc.subject Machine Learning en_US
dc.title Intelligent Patient Data Generator en_US
dc.type Dissertation en_US
thesis.degree.name Doctor of Philosophy in Health Services Research en_US
thesis.degree.level Doctoral en_US
thesis.degree.discipline Health Services Research en_US
thesis.degree.grantor George Mason University en_US
