Year: 2013 | Volume: 3 | Issue: 1 | Page: 40-45
Modification of Acute Physiology and Chronic Health Evaluation II score through recalibration of risk prediction model in critical care patients of a respiratory disease referral center
Ali A Velayati¹, Yadollah Mehrabi¹, Golnar Radmand², Ali A Khadem Maboudi², Hamid R Jamaati³, A Shahbazi⁴, Seyed A Mohajerani⁴, Seyed M R. Hashemian¹
1 Clinical Tuberculosis and Epidemiology Research Center, National Research Institute of Tuberculosis and Lung Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran
2 Department of Biostatistics, School of Public Health, Shahid Beheshti University of Medical Sciences, Tehran, Iran
3 Tobacco Prevention and Control Research Center, National Research Institute of Tuberculosis and Lung Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran
4 Chronic Respiratory Diseases Research Center, National Research Institute of Tuberculosis and Lung Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran
Date of Web Publication: 22-Mar-2013
Seyed M R. Hashemian
Clinical Tuberculosis and Epidemiology Research Center, National Research Institute of Tuberculosis and Lung Diseases (NRITLD), Shahid Beheshti University of Medical Sciences, Tehran
Source of Support: None, Conflict of Interest: None
Abstract
Background: Several models have been developed to measure the severity of illness in intensive care unit (ICU) patients, and it has been suggested that these models should be customized to the characteristics of different patient populations. This study aimed to assess and modify the performance of the Acute Physiology and Chronic Health Evaluation II (APACHE-II) model in a respiratory disease referral center.
Materials and Methods: A total of 730 patients admitted to an intensive care unit during one year were divided into two sets (71% training and 29% test). Our modified APACHE-II model was developed and calibrated on the training set. The integrity of the customized model was then checked on the test set and compared with the original APACHE-II. Logistic regression was used to develop the model, ROC analysis was used to assess discrimination, and the F-measure and kappa coefficient were employed to assess calibration.
Results: Both the original and our modified APACHE-II scores showed acceptable discriminative power (AUC = 0.908, 95% CI 0.861-0.954; and AUC = 0.856, 95% CI 0.789-0.923, respectively); the difference was not significant (P = 0.132). Our modified APACHE-II showed improved accuracy (87.9% vs. 84.1%) and sensitivity (56.4% vs. 16.3%) compared with the original model. The F-measure and kappa also suggested an improvement for our modified APACHE-II system.
Conclusion: The results demonstrated that a modified APACHE-II system in a local respiratory disease ICU could have discrimination similar to, and calibration comparable with, the original model.
Keywords: Acute Physiology and Chronic Health Evaluation II, calibration, intensive care unit
How to cite this article:
Velayati AA, Mehrabi Y, Radmand G, Khadem Maboudi AA, Jamaati HR, Shahbazi A, Mohajerani SA, Hashemian SM. Modification of Acute Physiology and Chronic Health Evaluation II score through recalibration of risk prediction model in critical care patients of a respiratory disease referral center. Int J Crit Illn Inj Sci 2013;3:40-5
How to cite this URL:
Velayati AA, Mehrabi Y, Radmand G, Khadem Maboudi AA, Jamaati HR, Shahbazi A, Mohajerani SA, Hashemian SM. Modification of Acute Physiology and Chronic Health Evaluation II score through recalibration of risk prediction model in critical care patients of a respiratory disease referral center. Int J Crit Illn Inj Sci [serial online] 2013 [cited 2020 Jul 14];3:40-5. Available from: http://www.ijciis.org/text.asp?2013/3/1/40/109419
Introduction
Patients admitted to intensive care units (ICUs) usually show a wide range of physiological characteristics, so comparing their outcomes without considering individual physiological differences is not meaningful. Measures that represent the physiological condition of ICU patients are therefore needed to control and adjust for these differences. Over the past decades, various models for predicting ICU patient outcomes have been applied and compared. Such risk classification systems can also be used to evaluate the performance of ICUs. In fact, these systems are statistical models with which the probability of an event such as death can be estimated from the characteristics and information of each individual patient.
Systems such as the Simplified Acute Physiology Score (SAPS), the Mortality Probability Model (MPM), and Acute Physiology and Chronic Health Evaluation II (APACHE-II) have been developed in various versions for estimating the probability of mortality in ICUs. APACHE-II is derived from 11 physiological variables, the Glasgow Coma Score (GCS), the patient's age, and chronic health status, giving a score that ranges from 0 in normal patients to 71 in the worst case. The probability of mortality can be estimated with a logistic regression model whose coefficients depend on the diagnostic category responsible for ICU admission.
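To make the scoring and risk steps concrete, here is a minimal Python sketch. It assumes that the acute physiology points for the 11 physiological variables have already been read from the published APACHE-II tables (not reproduced here), and it uses the age bands, chronic health points, and risk equation as commonly cited from the original APACHE-II publication (Knaus et al., 1985). The function names and the diagnostic-category weight in the usage example are hypothetical, and the constants should be checked against the original publication before any real use.

```python
import math


def age_points(age_years: int) -> int:
    """APACHE-II age points, using the age bands commonly cited for the original model."""
    if age_years <= 44:
        return 0
    if age_years <= 54:
        return 2
    if age_years <= 64:
        return 3
    if age_years <= 74:
        return 5
    return 6


def chronic_health_points(severe_organ_insufficiency: bool, elective_postop: bool) -> int:
    """Chronic health points: awarded only with a history of severe organ insufficiency
    or immunocompromise; 2 for elective postoperative patients, otherwise 5."""
    if not severe_organ_insufficiency:
        return 0
    return 2 if elective_postop else 5


def apache2_score(physiology_points, gcs, age_years,
                  severe_organ_insufficiency=False, elective_postop=False) -> int:
    """Total APACHE-II score (0-71).

    physiology_points: points already read from the published tables for the 11
    physiological variables (0-4 each; the caller applies any doubling rule, such as
    double creatinine points in acute renal failure).
    """
    if len(physiology_points) != 11:
        raise ValueError("expected points for 11 physiological variables")
    if not 3 <= gcs <= 15:
        raise ValueError("GCS must lie between 3 and 15")
    acute_physiology = sum(physiology_points) + (15 - gcs)  # GCS contributes 15 - GCS points
    return (acute_physiology + age_points(age_years)
            + chronic_health_points(severe_organ_insufficiency, elective_postop))


def apache2_mortality(score: int, category_weight: float,
                      emergency_surgery: bool = False) -> float:
    """Predicted hospital mortality from the original APACHE-II logistic equation,
    with coefficients as commonly cited; category_weight is the diagnostic-category term."""
    logit = -3.517 + 0.146 * score + category_weight
    if emergency_surgery:
        logit += 0.603
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical patient; the category weight of 0.0 is for illustration only.
score = apache2_score([0, 2, 1, 0, 0, 3, 0, 0, 1, 0, 0], gcs=13, age_years=60,
                      severe_organ_insufficiency=True)
print(score)  # 7 physiology + 2 GCS + 3 age + 5 chronic health = 17
print(round(apache2_mortality(score, category_weight=0.0), 3))
```

The maximum of 71 is reached with 4 points on each physiological variable (8 for doubled creatinine), a GCS of 3, age 75 or over, and 5 chronic health points.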
Many studies have evaluated and compared the performance of APACHE-II in different populations, but no consistent pattern in the accuracy of its outcome predictions has emerged. In some studies observed mortality exceeded the model's predictions, and in others it was lower, although other studies have shown good model performance.
Many studies have also assessed the performance of existing models in other populations. They mainly established that intensive care risk prediction models developed in other countries require validation and recalibration before being put into practice in a new country. In this study, the performance of the APACHE-II system was evaluated and customized on ICU patients in a tertiary referral center, in order to develop a good model for evaluating the severity of illness in respiratory ICU patients.
Materials and Methods
Patient selection and data collection
Patients admitted to the NRITLD ICU between January 2009 and February 2010 were included in the study. Clinical, laboratory, and monitoring variables were collected from the patients' medical records by trained general physicians using forms designed for this purpose. To control data quality, the first 10 forms completed by each data collector were reviewed by a supervisor, and one in every twenty forms was rechecked thereafter. Finally, the quality-controlled data sets were entered into a database.
Quantitative variables are presented as mean ± SD and qualitative variables as n (%). Student's t test was applied to compare quantitative variables between groups, and qualitative variables were compared using the chi-square or Fisher's exact test. The scores and probability of death were computed according to the original APACHE-II model. The total sample was divided into two subsamples:
- Training set, for refitting the variables into a modified model.
- Test set, for evaluating the developed model.
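The text does not say how the 730 patients were allocated to the two subsamples. Assuming a simple random split (an assumption, not the authors' stated procedure), a reproducible partition into the 523 training and 207 test patients reported in the Results could be drawn as in this Python sketch; the function name and seed value are hypothetical.

```python
import random


def split_train_test(patient_ids, n_train, seed=2009):
    """Randomly partition patient IDs into a training set and a test set.

    Assumption: a simple random split; the seed is arbitrary and only makes
    the partition reproducible.
    """
    ids = list(patient_ids)
    if not 0 < n_train < len(ids):
        raise ValueError("n_train must leave both sets non-empty")
    rng = random.Random(seed)  # local generator, independent of the global random state
    rng.shuffle(ids)
    return ids[:n_train], ids[n_train:]


train, test = split_train_test(range(1, 731), n_train=523)
print(len(train), len(test))  # 523 207
```

After such a split, the two sets would still be compared on baseline characteristics (age, sex, mortality), as the Results do.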
Development of the modified APACHE-II
All APACHE-II variables were entered into a logistic regression model, and new coefficients were estimated on the training set. For each categorical variable, the normal group was taken as the reference group. The modified scoring system was built from the significant regression coefficients; variables that were not significant according to the Wald test were assigned a score of zero. The variable with the smallest coefficient was given a score of 1, and the coefficients of the other significant variables were divided by this smallest coefficient and rounded to obtain their scores. After this modified scoring system, hereafter called the modified APACHE-II, was prepared, modified scores were calculated for all patients in the study. These scores were then used as the independent variable in a logistic regression model with observed death as the dependent variable.
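The point-assignment rule just described can be sketched in Python as follows. The variable names, coefficients, and p-values in the example are hypothetical (the fitted values are in Table 3, which is not reproduced here), and the sketch assumes that the significant coefficients are positive and that "rounded" means rounding half up.

```python
import math


def points_from_coefficients(coefficients, p_values, alpha=0.05):
    """Convert logistic-regression coefficients into integer score points.

    Non-significant terms (Wald p >= alpha) get 0 points; the smallest significant
    coefficient gets 1 point; every other significant coefficient is divided by that
    smallest one and rounded half up.
    """
    significant = {name: b for name, b in coefficients.items() if p_values[name] < alpha}
    if not significant:
        return {name: 0 for name in coefficients}
    smallest = min(significant.values())
    if smallest <= 0:
        raise ValueError("this sketch assumes positive significant coefficients")
    points = {}
    for name, b in coefficients.items():
        if name in significant:
            points[name] = math.floor(b / smallest + 0.5)  # round half up
        else:
            points[name] = 0
    return points


# Hypothetical coefficients and Wald p-values, for illustration only.
coefs = {"age_65_74": 0.62, "heart_rate_high": 1.30, "temperature_high": 0.40, "ph_low": 0.95}
pvals = {"age_65_74": 0.010, "heart_rate_high": 0.001, "temperature_high": 0.300, "ph_low": 0.020}
print(points_from_coefficients(coefs, pvals))
# {'age_65_74': 1, 'heart_rate_high': 2, 'temperature_high': 0, 'ph_low': 2}
```

Explicit half-up rounding is used because Python's built-in `round` rounds halves to the nearest even integer.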
Evaluation of the discriminative powers
To assess the discriminative power of the different models in both the training and test sets, receiver operating characteristic (ROC) curve analysis and the area under the curve (AUC) were used.
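The AUC has a convenient interpretation: it equals the probability that a randomly chosen patient who died received a higher predicted risk than a randomly chosen survivor, with ties counted as one half. This Python sketch computes the AUC point estimate that way (the Mann-Whitney formulation). It does not reproduce the confidence intervals or the test comparing two AUCs reported in the Results, and the toy data are hypothetical.

```python
def auc_mann_whitney(outcomes, risks):
    """Area under the ROC curve as the probability that a randomly chosen death
    has a higher predicted risk than a randomly chosen survivor (ties count 1/2).

    outcomes: 1 = died, 0 = survived; risks: predicted probabilities or scores.
    Runs in O(deaths x survivors) time, which is fine for a few hundred patients.
    """
    deaths = [r for y, r in zip(outcomes, risks) if y == 1]
    survivors = [r for y, r in zip(outcomes, risks) if y == 0]
    if not deaths or not survivors:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for d in deaths:
        for s in survivors:
            if d > s:
                wins += 1.0
            elif d == s:
                wins += 0.5
    return wins / (len(deaths) * len(survivors))


print(auc_mann_whitney([1, 1, 0, 0, 0], [0.9, 0.4, 0.5, 0.2, 0.1]))  # 0.8333333333333334 (= 5/6)
```

Because only the ordering of predicted risks matters, the AUC is the same whether it is computed from raw scores or from the probabilities derived from them.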
Sensitivity, specificity, and accuracy were used to evaluate the calibration of the different models. The kappa coefficient was also used to evaluate the agreement between observed and expected mortality according to the models. We also used the F-measure, the harmonic mean of sensitivity and positive predictive value (PPV), as a measure of calibration:
F = (2 × Sensitivity × PPV) / (Sensitivity + PPV)
In addition, the Hosmer-Lemeshow goodness-of-fit approach was used to evaluate calibration. For this purpose, the sample was sorted in ascending order of expected probability of death and divided into 10 groups of approximately equal size, and the observed and expected mortality were plotted for each group. The Hosmer-Lemeshow test itself was not applicable here because the expected number of deaths in some groups was less than 5, so only the groups formed by this method were used.
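All of these indices come from the 2 × 2 table of predicted versus observed deaths. The following Python sketch computes them, including Cohen's kappa, from observed outcomes and predicted probabilities. The classification threshold of 0.5 is an assumption, since the text does not state which cut-off was used to label a patient as a predicted death, and the example data are hypothetical.

```python
def classification_metrics(outcomes, risks, threshold=0.5):
    """Indices computed from the 2x2 table of predicted vs. observed death.

    outcomes: 1 = died, 0 = survived; risks: predicted probabilities of death.
    A patient counts as a predicted death when risk >= threshold (assumed cut-off).
    """
    tp = fp = fn = tn = 0
    for y, r in zip(outcomes, risks):
        predicted_death = r >= threshold
        if predicted_death and y == 1:
            tp += 1
        elif predicted_death:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("no patients")
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / n
    # F-measure: harmonic mean of sensitivity and PPV.
    f_measure = 2 * sensitivity * ppv / (sensitivity + ppv) if sensitivity + ppv else 0.0
    # Cohen's kappa: observed agreement corrected for the agreement expected by chance,
    # where chance agreement is built from the row and column totals of the table.
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance < 1 else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "ppv": ppv,
            "accuracy": accuracy, "f_measure": f_measure, "kappa": kappa}


# Hypothetical toy data: 3 deaths and 5 survivors.
metrics = classification_metrics([1, 1, 1, 0, 0, 0, 0, 0],
                                 [0.8, 0.6, 0.3, 0.7, 0.2, 0.1, 0.4, 0.3])
print({name: round(value, 3) for name, value in metrics.items()})
```

For these toy data TP = 2, FP = 1, FN = 1, and TN = 4, so sensitivity, PPV, and F are all 2/3, specificity is 0.8, accuracy is 0.75, and kappa is 7/15.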
Results
Main characteristics of the patients
Among the 730 patients admitted to the ICU during the study period, 143 (19.6%) died in hospital. There were 423 (57.9%) males and 307 (42.1%) females. The mean age was 46.9 ± 9.19 years and was significantly higher in patients who died than in survivors (P < 0.001). The mean original APACHE-II score was also significantly higher in patients who died than in survivors (16.84 ± 7.2 vs. 7.37 ± 4.6; P < 0.001). The main characteristics of the patients are shown in [Table 1]. The primary diagnosis at admission was categorized according to the APACHE-II classification [Table 2].
Training and test sets
The total sample was divided into two groups: 523 (71.6%) in the training set and 207 (28.4%) in the test set. Age, sex, and mortality did not differ significantly between the training and test sets (P = 0.081, P = 0.905, and P = 0.823, respectively).
The variables of the original APACHE-II system were entered into a logistic regression model, and the modified scoring system was developed [Table 3]. According to the Wald test, temperature, mean arterial pressure, serum sodium, serum potassium, and serum creatinine were not significantly associated with death, so these variables were removed from the modified APACHE-II system. The modified APACHE-II scores were then computed for all patients, and the following model was developed for estimating the probability of death:
Logit(π) = ln[π / (1 - π)] = -4.866 + 0.331 × Modified APACHE-II score, where π is the probability of death
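Applying the fitted equation means inverting the logit. The Python sketch below does this with the coefficients reported above; it is valid only for the modified point scale (Table 3) and this patient population, and the function name is hypothetical.

```python
import math


def modified_apache2_mortality(modified_score: float) -> float:
    """Probability of death from the recalibrated model reported in the text:
    logit(pi) = -4.866 + 0.331 * modified APACHE-II score."""
    logit = -4.866 + 0.331 * modified_score
    return 1.0 / (1.0 + math.exp(-logit))  # inverse logit


for s in (5, 10, 15, 20):
    print(s, round(modified_apache2_mortality(s), 3))
```

Because the logit is zero when the score equals 4.866 / 0.331 ≈ 14.7, the predicted probability of death passes 50% at a modified score of about 14.7.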
Comparing the discriminative powers
In the training set, the area under the ROC curve was 0.860 (95% CI: 0.820-0.899) for the original APACHE-II and 0.874 (95% CI: 0.832-0.916) for the modified APACHE-II; in the test set, the AUC was 0.908 (95% CI: 0.861-0.954) for the original APACHE-II and 0.856 (95% CI: 0.789-0.923) for the modified APACHE-II [Figure 1].
Figure 1: ROC curve analysis comparing the discriminative power of the original and modified APACHE-II systems; (a) ROC curves for the test set; (b) ROC curves for the training set
Comparing the calibrations
On the test set, the original APACHE-II had 84.1% accuracy, 16.3% sensitivity, 100% specificity, an F-measure of 0.280, and a kappa of 0.228, while the modified APACHE-II system had 87.9% accuracy, 56.4% sensitivity, 96.4% specificity, an F-measure of 0.656, and a kappa of 0.593 [Table 4].
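As a quick consistency check on these figures: a specificity of 100% means the original APACHE-II made no false-positive predictions of death, so its PPV is 1 (its sensitivity is non-zero, so it made at least one true-positive prediction), and the reported F-measure follows directly from its sensitivity:

$$F = \frac{2 \times 0.163 \times 1}{0.163 + 1} = \frac{0.326}{1.163} \approx 0.280$$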
To evaluate calibration, the observed probability of death was plotted against the expected probability for the 10 Hosmer-Lemeshow groups [Figure 2]. In this plot, the solid line is the reference line for perfect calibration and the dashed line shows the relationship between expected and observed probabilities; the distance of the dashed line from the solid line shows how far the expected probabilities deviate from the observed ones. By this criterion, the modified APACHE-II system shows better calibration in both the training and test sets.
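The grouping behind this plot can be sketched in Python as follows: patients are sorted by expected probability of death and cut into 10 groups of near-equal size, and expected deaths (the sum of predicted probabilities) and observed deaths are totalled in each group. This is an illustrative sketch with hypothetical data; the Hosmer-Lemeshow chi-square statistic itself was not computed in this study, so the sketch stops at the group totals.

```python
def decile_groups(outcomes, risks, n_groups=10):
    """Sort patients by expected probability of death, split them into n_groups
    groups of near-equal size, and return (size, expected deaths, observed deaths)
    for each group.

    outcomes: 1 = died, 0 = survived; risks: predicted probabilities of death.
    """
    pairs = sorted(zip(risks, outcomes))  # ascending expected probability
    n = len(pairs)
    if n < n_groups:
        raise ValueError("need at least one patient per group")
    groups = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        expected = sum(r for r, _ in chunk)
        observed = sum(y for _, y in chunk)
        groups.append((len(chunk), expected, observed))
    return groups


# Hypothetical data: 20 patients with evenly spaced predicted risks.
example = decile_groups([i % 2 for i in range(20)], [i / 20 for i in range(20)])
for size, expected, observed in example:
    # Points for a calibration plot: expected vs. observed probability per group.
    print(f"{expected / size:.3f} {observed / size:.3f}")
```

A well-calibrated model puts these points close to the identity line, which corresponds to the solid reference line in Figure 2.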
Figure 2: Comparing the calibration of the original and modified APACHE-II using the 10 Hosmer-Lemeshow groups; (a) original APACHE-II on the training set, (b) original APACHE-II on the test set, (c) modified APACHE-II on the training set, (d) modified APACHE-II on the test set
Discussion
In this study, we customized one of the most commonly used ICU risk-adjustment models and developed a modified model with comparable performance based on our own ICU patients. The main reason for using APACHE-II instead of APACHE-IV was that APACHE-IV includes variables, such as albumin and bilirubin, that are not routinely recorded in our hospitals during the first 24 hours of ICU admission, so using APACHE-IV was not feasible. The performance of a predictive model has two main aspects: discriminative power and calibration (goodness of fit). Our results showed that the original APACHE-II system had good discriminative power in our population but poor calibration according to the goodness-of-fit measures. The modified APACHE-II score also showed good discriminative power, although its area under the ROC curve in the test set was smaller than that of the original APACHE-II. Although the difference in discriminative power was not significant and the gain in accuracy was modest (87.9% vs. 84.1%), sensitivity increased markedly; this could be due to sample selection (all patients came from a tertiary respiratory referral center). The external validity of our model should therefore be examined in further studies.
In this study, all calibration indices favored the modified APACHE-II except specificity and positive predictive value. This might be due to underestimation of the probability of death in our population by the original APACHE-II score. Several factors affect the performance of models in different populations. Some researchers believe that differences in model calibration may result from differences in case mix. Most patients in the present study had respiratory and lung diseases, so the case mix differed from that of the population in which the original APACHE-II was developed. This might be one reason for the difference in the calibration of the APACHE-II score in our population.
Most previous studies have shown good discriminative power but varying calibration, and researchers continue to try to improve the performance of these models. Murphy-Filkins et al. showed that customized models are important when a unit or patient population differs substantially from the average. They showed that increasing the frequency of patients with a given disease characteristic above its frequency in the original population may cause discrimination and calibration to deteriorate. Furthermore, the APACHE-II system is based on patient characteristics during the first 24 hours of admission, and these measures cannot be considered independent of treatment and the quality of medical care. Moreover, the starting point for this model is the time of admission, which has no standard definition and is often influenced by ICU conditions such as the number of beds and the quality of pre-hospital care. All of this could explain why models need to be modified for different populations or for specific patient groups such as respiratory patients.
Our modified model leaves mean arterial pressure and creatinine out of the system. Our patient population, due to random factors, might have had a low incidence of renal failure, so creatinine would not add much to the predictive model. The omission of mean arterial pressure (MAP) could be due to its strong collinearity with other variables such as age, pH, and white blood cell count (WBC). Because these variables were more powerful predictors than MAP and were highly correlated with it, including them all in one model could increase the variance of the regression coefficients and consequently decrease the precision of the model. This certainly does not mean that MAP is unimportant in estimating the severity of illness. Beyond customization, calibration, and discrimination of APACHE models, another consideration is that APACHE and MPM were developed primarily on US patients, whereas SAPS has a more international component and its updated versions did not include US patients; we therefore suggest that future studies be designed based on SAPS.
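One common way to quantify this kind of collinearity is the variance inflation factor (VIF). For a model with just two predictors whose correlation is r, the VIF is 1 / (1 - r²), the factor by which collinearity inflates each coefficient's sampling variance in linear regression; for logistic regression it serves only as an approximate diagnostic. The paper does not report VIFs, so this Python sketch is purely illustrative and its data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x, y):
    """Variance inflation factor for a model containing only these two predictors:
    VIF = 1 / (1 - r^2); it grows without bound as |r| approaches 1."""
    r = pearson_r(x, y)
    if abs(r) >= 1:
        return math.inf
    return 1.0 / (1.0 - r * r)


# Hypothetical predictor values with correlation r = 0.8.
print(round(vif_two_predictors([1, 2, 3, 4, 5], [1, 3, 2, 5, 4]), 2))  # 1 / 0.36 ≈ 2.78
```

With r = 0.8 each coefficient's variance is inflated nearly threefold, which illustrates why keeping a weaker, highly correlated predictor can reduce the precision of the whole model.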
Modified models would benefit most from recalibration in a larger ICU patient population. A limitation of this study is that the sample came from a single ICU. A larger sample that includes respiratory patients with different characteristics from different ICUs is expected to lead to better models and could help overcome the deficiencies of this modified model and improve it. In addition, the calibration and discriminative power of this customized model could be studied in other respiratory disease ICUs.
Conclusion
The results of this study show that recalibrating the APACHE-II model for a specific group of patients (respiratory disease) reduced the number of variables and enhanced its performance. APACHE-II appears to have its own pros and cons and could be modified to increase its accuracy, performance, and adaptability in a local ICU. As treatment methods develop and mortality patterns change across populations, scoring systems need to be changed and updated frequently. The results of this research also emphasize that fitting new models for specific groups of patients can lead to more parsimonious models with fewer variables.
References
1. Gregoire G, Russell J. Assessment of severity of illness. New York: McGraw-Hill; 1998.
2. Knaus WA, Draper EA, Wagner DP, Zimmerman JE. APACHE II: A severity of disease classification system. Crit Care Med 1985;13:818-29.
3. Le Gall JR, Loirat P, Alperovitch A, Glaser P, Granthil C, Mathieu D. A simplified acute physiology score for ICU patients. Crit Care Med 1984;12:975-7.
4. Moreno RP, Metnitz PG, Almeida E, Jordan B, Bauer P, Campos RA, et al.; SAPS 3 Investigators. SAPS 3 - From evaluation of the patient to evaluation of the intensive care unit. Part 2: Development of a prognostic model for hospital mortality at ICU admission. Intensive Care Med 2005;31:1345-55.
5. Lemeshow S, Teres D, Klar J, Avrunin J, Gehlbach S, Rapoport J. Mortality probability models (MPM II) based on an international cohort of intensive care unit patients. JAMA 1993;270:2478-86.
6. Lemeshow S, Teres D, Avrunin JS, Gage RW. Refining intensive care unit outcome prediction by using changing probabilities of mortality. Crit Care Med 1988;16:470-7.
7. Higgins TL, Kramer AA, Nathanson BH, Copes W, Stark M, Teres D. Prospective validation of the intensive care unit admission Mortality Probability Model (MPM0-III). Crit Care Med 2009;37:1619-23.
8. Capuzzo M, Valpondi V, Sgarbi A, Bortolazzi S, Pavoni V, Gilli G, et al. Validation of severity scoring systems SAPS II and APACHE II in a single-center population. Intensive Care Med 2000;26:1779-85.
9. Markgraf R, Deutschinoff G, Pientka L, Scholten T. Comparison of acute physiology and chronic health evaluations II and III and simplified acute physiology score II: A prospective cohort study evaluating these methods to predict outcome in a German interdisciplinary intensive care unit. Crit Care Med 2000;28:26-33.
10. Arabi Y, Haddad S, Goraj R, Al-Shimemeri A, Al-Malik S. Assessment of performance of four mortality prediction systems in a Saudi Arabian intensive care unit. Crit Care 2002;6:166-74.
11. Sakr Y, Krauss C, Amaral AC, Réa-Neto A, Specht M, Reinhart K, et al. Comparison of the performance of SAPS II, SAPS 3 and APACHE II and their customized prognostic models in a surgical intensive care unit. Br J Anaesth 2008;101:798-803.
12. Le Gall JR, Neumann A, Hemery F, Bleriot JP, Fulgencio JP, Garrigues B, et al. Mortality prediction using SAPS II: An update for French intensive care units. Crit Care 2005;9:R645-52.
13. Harrison DA, Brady AR, Parry GJ, Carpenter JR, Rowan K. Recalibration of risk prediction models in a large multicenter cohort of admissions to adult, general critical care units in the United Kingdom. Crit Care Med 2006;34:1378-88.
14. Render ML, Deddens J, Freyberg R, Almenoff P, Connors AF Jr, Wagner D, et al. Veterans Affairs intensive care unit risk adjustment model: Validation, updating, recalibration. Crit Care Med 2008;36:1031-42.
15. Hripcsak G, Rothschild AS. Agreement, the F-measure, and reliability in information retrieval. J Am Med Inform Assoc 2005;12:296-8.
16. Hashemian SM, Jamaati HR, Malekmohammad M, Ehteshami Afshar E, Alosh O, Radmand G, et al. Assessing the performance of two clinical severity scoring systems in the ICU of a tertiary respiratory disease center. Tanaffos 2010;9:58-64.
17. Murphy-Filkins R, Teres D, Lemeshow S, Hosmer DW. Effect of changing patient mix on the performance of an intensive care unit severity-of-illness model: How to distinguish a general from a specialty intensive care unit. Crit Care Med 1996;24:1968-73.
18. Guidelines for intensive care unit admission, discharge, and triage. Task Force of the American College of Critical Care Medicine, Society of Critical Care Medicine. Crit Care Med 1999;27:633-8.