A Study of Administrative Data Representation for Machine Learning

dc.contributor.advisor: Wojtusiak, Janusz
dc.contributor.author: Asadzadehzanjani, Negin
dc.creator: Asadzadehzanjani, Negin
dc.date.accessioned: 2022-08-03T20:18:38Z
dc.date.available: 2022-08-03T20:18:38Z
dc.date.issued: 2022
dc.description.abstract: Administrative data, including medical claims, are frequently used to train machine learning models that predict patient outcomes. Despite many efforts to use administrative codes (medical codes) from claims data, little systematic work has been done to understand how the codes in such data should be represented before model construction. Traditionally, the presence or absence of codes representing diagnoses or procedures over a fixed period, typically one year, is used (Binary Representation). More recently, some studies have incorporated temporal information into the data representation, for example by counting occurrences, calculating time from diagnosis, and using multiple time windows. However, these methods do not comprehensively capture the temporal information in the data: much of it, such as the exact time at which an event occurs and the exact sequence of events, is lost. This dissertation presents the development and investigation of two additional methods of administrative data representation (Temporal Min-Max and Trajectory Representation), specific to diagnoses extracted from claims data, that are applied before machine learning algorithms. It then presents a large-scale experimental evaluation that compares these methods with the traditional Binary Representation on four classification problems: prediction of one-year mortality, of high utilization of medical services, of chronic kidney disease, and of congestive heart failure. The results show that the optimal way of representing the data is problem-dependent, so optimization of the representation parameters is required as part of the modeling.
dc.format.extent: 188 pages
dc.identifier.uri: https://hdl.handle.net/1920/12961
dc.language.iso: en
dc.rights: Copyright 2022 Negin Asadzadehzanjani
dc.subject: Public health
dc.subject: Artificial intelligence
dc.subject: Health sciences
dc.subject: Data Preprocessing
dc.subject: Health Informatics
dc.subject: Medical Claims
dc.subject: Supervised Learning
dc.subject: Temporal Machine Learning
dc.title: A Study of Administrative Data Representation for Machine Learning
dc.type: Dissertation
thesis.degree.discipline: Health Services Research
thesis.degree.grantor: George Mason University
thesis.degree.level: Ph.D.
thesis.degree.name: Ph.D. in Health Services Research
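
The abstract contrasts Binary Representation with simple temporal variants such as counting and time from diagnosis. The following is a minimal sketch of those three baseline ideas only; it is not the dissertation's implementation and does not attempt Temporal Min-Max or Trajectory Representation, which the abstract does not define. The claim tuples, the function name `represent`, the diagnosis codes, and the reading of "time from diagnosis" as days from the first in-window occurrence to the end of the window are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical claims: (patient_id, diagnosis_code, service_date).
claims = [
    ("p1", "I50", date(2020, 3, 1)),
    ("p1", "I50", date(2020, 9, 15)),
    ("p1", "N18", date(2020, 11, 20)),
    ("p2", "N18", date(2019, 6, 1)),   # falls outside the window below
    ("p2", "E11", date(2020, 5, 10)),
]

def represent(claims, codes, window_start, window_end):
    """Return {patient: {code: (present, count, days_since_first)}}.

    present          -- 1 if the code occurs in the window, else 0 (binary)
    count            -- number of in-window occurrences of the code
    days_since_first -- days from the first in-window occurrence to the
                        window end, or None if the code never occurs
                        (an assumed reading of "time from diagnosis")
    """
    patients = sorted({patient for patient, _, _ in claims})
    table = {p: {c: (0, 0, None) for c in codes} for p in patients}
    for patient, code, day in claims:
        in_window = window_start <= day <= window_end
        if code not in table[patient] or not in_window:
            continue
        _, count, days = table[patient][code]
        elapsed = (window_end - day).days
        # The earliest occurrence has the largest elapsed time.
        days = elapsed if days is None else max(days, elapsed)
        table[patient][code] = (1, count + 1, days)
    return table

features = represent(
    claims,
    codes=["E11", "I50", "N18"],
    window_start=date(2020, 1, 1),
    window_end=date(2020, 12, 31),
)
for patient, row in features.items():
    print(patient, row)
```

Taking only the first element of each tuple gives the binary presence/absence view over the one-year window; the second and third elements show how much extra information even simple temporal variants retain.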

Files

Original bundle
Name: Asadzadehzanjani_gmu_0883E_12758.pdf
Size: 4.07 MB
Format: Adobe Portable Document Format