A three-step approach for the derivation and validation of high-performing predictive models using an operational dataset: congestive heart failure readmission case study
  • Authors: Samir E AbdelRahman (1) (2)
    Mingyuan Zhang (1)
    Bruce E Bray (3)
    Kensaku Kawamoto (1)

    1. Department of Biomedical Informatics; University of Utah; 615 Arapeen Way; Suite 208; Salt Lake City; UT; 84092; USA
    2. Computer Science Department; Faculty of Computers and Information; Cairo University; Cairo; Egypt
    3. Departments of Biomedical Informatics and Internal Medicine; University of Utah; Salt Lake City; UT; 84092; USA
  • Keywords: Predictive analytics; Congestive heart failure readmission; Voting classifiers; Feature selection; Discretization method; Feature ranking strategies
  • Journal: BMC Medical Informatics and Decision Making
  • Publication year: 2014
  • Publication date: December 2014
  • Volume: 14
  • Issue: 1
  • Full text size: 1,580 KB
  • Journal subjects: Health Informatics; Information Systems and Communication Service; Management of Computing and Information Systems
  • Publisher: BioMed Central
  • ISSN: 1472-6947
Abstract
Background: The aim of this study was to propose an analytical approach for developing high-performing predictive models for congestive heart failure (CHF) readmission using an operational dataset with incomplete records and data that changed over time.

Methods: Our analytical approach involves three steps: pre-processing, systematic model development, and risk factor analysis. For pre-processing, variables that were absent in >50% of records were removed. The dataset was then divided into a validation dataset and derivation datasets, and the derivation datasets were separated into three temporal subsets based on changes to the data over time. For systematic model development, using the different temporal datasets and the remaining explanatory variables, models were developed by combining various (i) statistical analyses to explore the relationships between the validation and the derivation datasets; (ii) adjustment methods for handling missing values; (iii) classifiers; (iv) feature selection methods; and (v) discretization methods. We then selected the best derivation dataset and the models with the highest predictive performance. For risk factor analysis, factors in the highest-performing predictive models were analyzed and ranked using (i) statistical analyses of the best derivation dataset, (ii) feature rankers, and (iii) a newly developed algorithm to categorize risk factors as strong, regular, or weak.

Results: The analysis dataset consisted of 2,787 CHF hospitalizations at University of Utah Health Care from January 2003 to June 2013. In this study, we used the complete-case analysis and mean-based imputation adjustment methods; the wrapper subset feature selection method; and four ranking strategies based on information gain, gain ratio, symmetrical uncertainty, and wrapper subset feature evaluators. The best-performing models resulted from the use of a complete-case analysis derivation dataset combined with the Class-Attribute Contingency Coefficient discretization method and a voting classifier that averaged the results of multinomial logistic regression and voting feature intervals classifiers. Of 42 final model risk factors, discharge disposition, discretized age, and indicators of anemia were the most significant. This model achieved a c-statistic of 86.8%.

Conclusion: The proposed three-step analytical approach enhanced predictive model performance for CHF readmissions. It could potentially be leveraged to improve predictive model performance in other areas of clinical medicine.
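
The pre-processing and voting steps named in the abstract can be illustrated with a minimal sketch. This record does not include the study's implementation, so everything below is an assumption for illustration only: the function names, the toy records, and the example probabilities are hypothetical, and the two probability lists merely stand in for the outputs of two already-trained classifiers. The sketch shows dropping variables missing in more than 50% of records, the two missing-value adjustments (complete-case analysis and mean imputation), averaging two classifiers' predicted probabilities, and computing a c-statistic by pairwise comparison.

```python
from statistics import mean


def drop_sparse_columns(rows, threshold=0.5):
    """Drop variables that are absent (None) in more than `threshold` of records."""
    n = len(rows)
    keep = [c for c in rows[0]
            if sum(r[c] is None for r in rows) / n <= threshold]
    return [{c: r[c] for c in keep} for r in rows]


def complete_cases(rows):
    """Complete-case analysis: keep only records with no missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def mean_impute(rows):
    """Mean-based imputation: replace each missing value with the column mean.

    Assumes every column has at least one observed value, which holds
    after drop_sparse_columns with a threshold below 1.0.
    """
    means = {c: mean(r[c] for r in rows if r[c] is not None) for c in rows[0]}
    return [{c: (means[c] if v is None else v) for c, v in r.items()}
            for r in rows]


def average_vote(prob_a, prob_b):
    """Voting by averaging the predicted probabilities of two classifiers."""
    return [(a + b) / 2 for a, b in zip(prob_a, prob_b)]


def c_statistic(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    concordant = sum(1.0 if p > q else 0.5 if p == q else 0.0
                     for p in pos for q in neg)
    return concordant / (len(pos) * len(neg))


# Toy records (hypothetical values, not taken from the study).
rows = [
    {"age": 70, "hemoglobin": 11.0, "rare_lab": None},
    {"age": 82, "hemoglobin": None, "rare_lab": None},
    {"age": 65, "hemoglobin": 13.0, "rare_lab": 1.2},
    {"age": 58, "hemoglobin": 12.0, "rare_lab": None},
]
cleaned = drop_sparse_columns(rows)        # "rare_lab" is missing in 3 of 4 records
derivation_cc = complete_cases(cleaned)    # complete-case derivation set
derivation_imp = mean_impute(cleaned)      # mean-imputed derivation set

# Hypothetical readmission probabilities from two classifiers.
combined = average_vote([0.9, 0.4, 0.2, 0.6], [0.7, 0.6, 0.1, 0.3])
print(c_statistic([1, 0, 0, 1], combined))
```

Averaging probabilities is only one way to combine classifiers; it is used here because the abstract describes a voting classifier that averaged the results of its two component models.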
