
Predicting survival using simple clinical variables: a case study in traumatic brain injury
  C W P M Hukkelhoven, Centre for Clinical Decision Sciences, Department of Public Health, Erasmus University Rotterdam, The Netherlands
  P A Jones, University Department of Clinical Neurosciences, Bramwell-Dott Building, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK

  Correspondence to: Chantal W P M Hukkelhoven, Centre for Clinical Decision Sciences, Ee2073, Department of Public Health, Erasmus University Rotterdam, The Netherlands; hukkelhoven{at}

    Signorini et al1 developed a prognostic model to predict survival at 1 year for patients with traumatic brain injury. A strong point is that the model uses variables that are easy and cheap to measure. A thorough statistical analysis was performed, including tests of goodness of fit and checks for influential observations. The model was also validated externally in a more recent group of patients. During the external validation, however, the Hosmer-Lemeshow statistic showed a significant lack of calibration (p<0.0001).

    This implies that the model does not give accurate predictions of survival for “new” patients. The lack of calibration is due mainly to overly pessimistic predictions for patients with a poor prognosis, but also to overly optimistic predictions for patients with a better prognosis (fig 2).1 This is typical of “overfitting”: the model tends to predict probabilities that are too extreme in new patients.

    Overfitting can be limited by several procedures. One of them is the rough rule that no more than m/10 predictor degrees of freedom (df) should be examined when constructing a multiple regression model, where m is the number of events (for example, deaths).2 As 87 patients died within 1 year, 87/10=8.7 df could be examined during the course of analysis without risk of overfitting. The final multivariate prognostic model in the paper used 6 df. However, age was fitted as a piecewise linear variable after using a generalised additive model, which requires an unknown number of df, but always more than one. Furthermore, we assume that easy to obtain variables such as sex (1 df) and cause of injury (3 df) were considered but dropped during model construction. Some candidate variables also originated from combined variables when, after initial assessment, it seemed that some categories could be collapsed. Altogether, this means that probably many more than 8.7 df were examined.
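The arithmetic of the m/10 rule can be made explicit. In the sketch below the df counts for the final model, sex, and cause of injury are the letter's own figures; the 2 df charged to the piecewise fit of age is an assumption, since the true number is unknown (the letter notes only that it exceeds one).

```python
def df_budget(n_events, events_per_df=10):
    """Rough rule of thumb: the number of predictor degrees of freedom
    that can be examined without serious risk of overfitting is the
    number of events divided by ten."""
    return n_events / events_per_df

# 87 deaths within 1 year, as in the paper
budget = df_budget(87)  # 8.7 df available

# df actually at stake (illustrative counts; 2 df for age is an assumption)
examined = {
    "final model": 6,
    "age (piecewise fit)": 2,
    "sex": 1,
    "cause of injury": 3,
}
total = sum(examined.values())
print(total > budget)  # True: more df examined than the budget allows
```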

    The overfitting could have been corrected by multiplying each regression coefficient in the model by a shrinkage factor. This factor can be estimated by a heuristic formula,3 by cross validation, or by a bootstrap resampling procedure. This can be done with the Design library,4 which the authors already used. The shrinkage factor is close to unity when there is no overfitting. When the selection of predictors is unstable or the predictors have small effects, a lower shrinkage factor may be found, for example 0.8.
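The heuristic estimate is simple to apply once the model's likelihood ratio chi-squared statistic is known. The sketch below shows the mechanics under stated assumptions: the chi-squared value of 60 is hypothetical, and 9 df is simply the letter's rough count of what was examined; it is not a result from the paper.

```python
def heuristic_shrinkage(model_chi2, n_df):
    """Heuristic shrinkage factor gamma = (chi2 - df) / chi2, where chi2
    is the model's likelihood ratio statistic and df counts all predictor
    degrees of freedom examined, not just those kept in the final model."""
    return (model_chi2 - n_df) / model_chi2

def shrink(coefficients, gamma):
    """Multiply every regression coefficient (not the intercept) by the
    shrinkage factor; the intercept is then re-estimated so that the
    average predicted probability matches the observed event rate."""
    return [gamma * b for b in coefficients]

# hypothetical numbers, purely illustrative:
# a likelihood ratio chi-squared of 60 with about 9 df examined
gamma = heuristic_shrinkage(60.0, 9)       # 0.85
shrunk = shrink([1.2, -0.4, 0.05], gamma)  # coefficients pulled toward 0
```

Shrinking the coefficients pulls the predicted probabilities away from the extremes, which is exactly the direction in which the externally validated predictions erred.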

    We regret that the model is presented as giving “reasonably accurate predictions of long term survival”, especially because the external validation showed a significant lack of calibration. Correction with a shrinkage factor would have resulted in a recalibration of the probability of survival in the nomogram presented in the paper (fig 3)1 and in the formula used in a subsequent paper.5

    We hope that modern modelling techniques will increasingly be applied to clinical prediction problems such as traumatic brain injury, so that prognostic models are developed that reliably support the physician in clinical decision making.


    Signorini et al reply:

    Hukkelhoven et al give a thorough and constructive criticism of the statistical procedures used to construct the model presented in the paper. Their main points of concern regard the effective number of degrees of freedom (df), possible corrections to the apparent overfitting, and the usefulness of the model for individual predictions in specific patients.

    It is true that the 6 df present in the final model do not reflect the total uncertainty in the model, and that some preprocessing of individual predictors was performed to derive appropriate functional forms. The rule of thumb regarding the number of predictor variables that can be assessed in a multivariate model is a guideline, and it should always be remembered that its purpose is to prevent false positive findings and hence spurious associations between predictors and outcomes. It is directly analogous to the 5% significance level for hypothesis testing, and we worry that, with its increasing prevalence in the literature, it is becoming similarly dogmatic. We do not think that we have indulged in any data dredging to construct these models, and we are confident that the false association rate is small. To incorporate the overall uncertainty fully into the final model would perhaps involve methods discussed by Draper,1-1 with a corresponding increase in the complexity of the modelling process.

    The use of shrinkage estimators to prevent overfitting is of course a valuable tool, yet as Hukkelhoven et al point out, there are several options for their calculation and little guidance as to which should be used in a particular circumstance. They are available within the Design library used to build our model, but the model building process described in the original paper is achievable with any standard statistical software package. The purpose of the paper was to demonstrate what we think of as a sensible approach, and to go beyond what is possible in standard software would be to dilute that message.

    Finally, the model perhaps should not be described as providing “accurate” predictions of long term survival, as the out of sample calibration was not good. From a discrimination point of view, however, the out of sample performance was adequate, and this serves to illustrate that the uses to which a model will be put should play a part in the model building process. Whether calibration (individual predictions) or discrimination (case mix adjustment) is more important can lead to different models from the same training set.
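The distinction between the two properties can be made concrete: discrimination, measured for example by the area under the ROC curve, is unchanged by any monotone rescaling of the predictions, whereas calibration is not. A minimal Python sketch (our illustration with made-up numbers, not the authors' analysis):

```python
def auc(probs, outcomes):
    """Area under the ROC curve via the Mann-Whitney statistic: the
    proportion of (event, non-event) pairs that the predictions rank
    correctly, counting ties as half a win."""
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

probs = [0.10, 0.40, 0.35, 0.80]
outcomes = [0, 0, 1, 1]

# squaring the probabilities wrecks their calibration, but as a monotone
# transformation it leaves the ranking, and hence the AUC, untouched
same = auc(probs, outcomes) == auc([p * p for p in probs], outcomes)
```

A model can therefore discriminate well out of sample while calibrating poorly, which is why the intended use of the model should drive which property is optimised.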

    One of the most important points of the paper was to stress that there is a lot more to proper statistical model building than clicking the correct menu option in a statistical package. We would hope that this correspondence has emphasised the need for a certain level of statistical knowledge and experience in the analysis of any research data. We agree wholeheartedly with the views expressed in the correspondents' final paragraph.


