Researchers at the University of South Florida (USF), working with EMOL Health, have submitted a paper to IEEE Transactions on Biomedical Engineering, where it is currently under review. The paper outlines a process for extracting information from electronic medical records (EMRs) to develop prognostic models, focusing on palliative chemotherapy for patients with stage IV breast cancer.

Abstract: The purpose of this work is to develop a model, using data mining techniques, to predict a stage IV breast cancer patient's response to first-line chemotherapy. We discuss the process of extracting and processing electronic medical record (EMR) data from a private oncology practice and the method for developing a logistic regression model based on commonly collected laboratory data. Approximately 1200 patients from a large medical oncology practice in the Midwest were initially identified for participation in our study. A k-fold cross-validation procedure was used to train and test the model, with accuracy, specificity, and sensitivity used for evaluation. Three consensus models (CM1-3) were constructed, with accuracy as the primary measure of model performance. The accuracies, in percent, for the three models were 71.96 ± 0.22, 71.87 ± 0.22, and 71.04 ± 0.15. The accuracies differed significantly for each pair of consensus models (CM1 vs CM2, p = 0.02; CM1 vs CM3, p < 0.001; CM2 vs CM3, p < 0.001).
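The abstract names three ingredients (a logistic regression classifier, k-fold cross-validation, and accuracy/sensitivity/specificity as metrics) without describing how they are wired together. The sketch below is a minimal, hypothetical illustration of that combination in plain Python with no third-party libraries. The gradient-descent fitter, the function names, the fold count, the learning rate, the 0.5 decision threshold, and the synthetic two-feature data are all assumptions for illustration, not the authors' method; the paper's laboratory features, consensus-model construction, and significance tests are not reproduced here.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.5, epochs=200):
    # Hypothetical fitter: batch gradient descent on the mean log-loss.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, X, threshold=0.5):
    # Assumed 0.5 cut-off: label 1 = "responds" in this toy setup.
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold else 0
            for xi in X]


def evaluate(y_true, y_pred):
    # Accuracy, sensitivity (true positive rate), specificity (true negative rate).
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return accuracy, sensitivity, specificity


def k_fold_cv(X, y, k=10, seed=0):
    # Shuffle indices once, split into k folds, hold each fold out in turn.
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    results = []
    for fold in folds:
        held_out = set(fold)
        train = [j for j in idx if j not in held_out]
        w, b = fit_logistic([X[j] for j in train], [y[j] for j in train])
        y_hat = predict(w, b, [X[j] for j in fold])
        results.append(evaluate([y[j] for j in fold], y_hat))
    return results


if __name__ == "__main__":
    # Synthetic stand-in data; NOT patient data from the study.
    rng = random.Random(42)
    X, y = [], []
    for _ in range(200):
        a, c = rng.gauss(0, 1), rng.gauss(0, 1)
        X.append([a, c])
        y.append(1 if a + 0.5 * c + rng.gauss(0, 0.5) > 0 else 0)
    per_fold = k_fold_cv(X, y, k=5)
    mean_acc = sum(r[0] for r in per_fold) / len(per_fold)
    print(f"mean accuracy over 5 folds: {mean_acc:.3f}")
```

The folds here are formed by simple shuffling rather than stratification by outcome; with an imbalanced response rate, stratified folds would keep sensitivity and specificity estimates more stable across folds.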