Abstract
Background Accurate prediction of stroke outcome is desirable for clinical management and provision of appropriate care, and potentially for stratification of patients into studies.
Objectives To investigate the predictive properties of validated scales and severity measures, and their constituent variables, and to compare their prediction in six European populations.
Methods We studied 2033 first-ever stroke patients in population-based stroke registers in France, Italy, Lithuania, the UK, Spain and Poland. Logistic models were used to predict independent survival at 3 and 12 months after stroke using a range of measures including the Six Simple Variables (SSV) model, the Barthel index (BI) and the National Institutes of Health Stroke Scale (NIHSS). Predictions were compared within and between populations using receiver operating characteristic curves. A five-variable scale was developed and validated.
Results Comparisons of BI with BI+age, and NIHSS with NIHSS+age, across populations showed that inclusion of age significantly improved prediction. Fairly equal predictions were obtained by three models: five variables, BI+age, and NIHSS+age. Better agreement between predicted and actual outcomes, and more precise estimates were obtained by the five variables model (age, verbal component of the Glasgow Coma Scale, arm power, ability to walk, and pre-stroke dependency).
Conclusions Living alone before the stroke was not significantly associated with independent survival after the stroke. Five variables (excluding living alone, from the SSV) provided good prediction for all populations and subgroups. Further external validation for our estimates is recommended before utilisation of the model in practice and research.
- Scales
- Stroke
- Health Policy & Practice
- Clinical Neurology
Introduction
Several scales of severity of stroke outcomes have been developed and used in research and clinical practice. Some of these are based on a single scale or the sub-components of a scale, using clinical variables.1–3 Others are more complex including clinical variables along with radiological features, or other composite measures.4–6 These measures have been used as a tool to stratify patients into trials, to guide treatments and prognosis, and to plan discharge and rehabilitation.7
Choice of a prediction model remains controversial as no single model is likely to suit all situations, subgroups and/or times. Simpler models that include reliably measurable variables are, however, preferable because they are more practical and often provide prediction equivalent to that of more complex yet evidence-based measures such as CT-derived variables. Counsell et al5 showed that a six-simple-variables (SSV) model including age, the verbal component of the Glasgow Coma Scale (GCS), arm power, ability to walk, pre-stroke living condition and pre-stroke dependency is better than complex models that include urinary continence, level of consciousness and stroke subtypes in predicting independent survival at 6 months and 1 year following stroke. Similarly, Reid6 illustrated that prediction by simple clinical variables was not improved by the use of CT-derived radiological variables. By contrast, a more recent study developed a score for the prediction of death in hospitalised ischaemic stroke patients based on a large number of predictors, including the Canadian Neurological Scale and co-morbidities such as cancer, atrial fibrillation (AF) and coronary heart disease, and suggested that such scores are better than simpler models at prediction.8
In terms of stratifying patients to clinical trials using case severity scales, Govan et al looked at the Barthel Index (BI), Rankin Scale (RI), the National Institutes of Health Stroke Scale (NIHSS) and the Scandinavian Stroke Scale (SSS) and showed that all scales had excellent agreement in predicting death and dependency at 1 and 3 months after stroke, and that little predictive information was lost when scales were used in their categorical form (for example, mild, moderate and severe) rather than as continuous scores.9 ,10
The aim of this multinational study is to investigate the value of several standardised well validated stroke severity measures in the prediction of independent survival at 3 and 12 months following stroke, and to evaluate the importance of their constituent variables in different European populations.
Methods
The European Registers of Stroke (EROS) Collaboration population-based stroke registers were established in six European populations: France (Dijon), Italy (Sesto Fiorentino), Lithuania (Kaunas), Poland (Warsaw), Spain (Menorca), and the UK (London). Methodology of the EROS study has been described in detail previously.11 ,12 Data were collected between 2004 and 2006. Centres within different countries were selected on the basis of previous experience in running population-based or hospital-based stroke registers, and were therefore not necessarily representative of their countries as a whole. Methods of case ascertainment for hospitalised as well as non-hospitalised cases were standardised across centres using published standards including overlapping sources of information that were reported previously.11 ,13 Patients with first-ever stroke of all age groups from the source populations of the respective centres were included in the study. The study was ethically approved by ethics committees in each city involved in the study.
Variable definition
Stroke was defined according to the WHO definition14 plus CT scan confirmation. Stroke was classified into cerebral infarction, primary intracerebral haemorrhage and subarachnoid haemorrhage (SAH), based on at least one of the following: brain imaging, cerebrospinal fluid analysis (in all living cases of SAH where brain imaging was not diagnostic) or necropsy examination. Cases without pathological confirmation of stroke subtype were classified as undefined.
Variables from the SSV model were used to predict stroke severity.15–17 In addition, a selection of other severity measures that were identified previously as good predictors7 ,18 ,19 and for which data have been collected for EROS collaboration, including the NIHSS and BI, were also used to predict outcomes.
The two outcomes investigated were independent survival at 3 and 12 months following stroke. Independence at each time point was defined as BI score ≥12 out of 20.20
Table 1 describes the six European populations. A logistic regression model was employed to predict independent survival using the scales and variables described above as predictors. The model's ability to predict outcomes was primarily assessed by calculating the area under the receiver operating characteristic curve (AUC) of sensitivity versus (1-specificity). The AUC was used to measure how well a model classifies patients into two groups: independent survivors or not. An AUC of 0.5 corresponds to a prediction that is no better than chance where 50% of patients would be assigned to each group, a higher AUC means better classification, and an AUC of 1 corresponds to perfect classification. The AUC was calculated by the trapezoidal rule.21 ,22 To compare the prediction of different models within a centre, differences in AUCs and their 95% CIs were calculated. Where comparisons were between centres, the same model was fitted for each centre; AUCs were similarly calculated and compared using London as a reference. The Akaike information criterion (AIC) and the Bayesian information criterion (BIC) were calculated in addition where two models provided an equal AUC and were fairly close in other measures of comparison such as Hosmer–Lemeshow. Data on many measures for Menorca were insufficient and the centre was excluded from all comparisons. All analyses were undertaken using STATA V.11.2.
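The AUC calculation described above can be sketched as follows. This is a minimal illustration in Python rather than the Stata routine used in the study; the function names and the toy outcome and score values are hypothetical.

```python
def roc_points(y_true, scores):
    """Return ROC points (1 - specificity, sensitivity), one per distinct threshold."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both outcome groups must be present")
    # Sort patients from highest to lowest predicted score.
    pairs = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Patients with tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Integrate the ROC curve with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data: 1 = independent survival, 0 = otherwise (hypothetical values).
outcome = [1, 1, 0, 1, 0, 0]
predicted = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_auc = auc_trapezoid(roc_points(outcome, predicted))
```

In the toy data, 8 of the 9 possible (survivor, non-survivor) pairs are ranked correctly, so `toy_auc` is 8/9. A score that is identical for every patient gives 0.5, and a score that separates the groups completely gives 1.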
The accuracy of models’ prediction was examined for a range of subgroups, including whether the patient was incontinent or not in the acute phase, age <75 or ≥75 years, gender, time the patient took to present to hospital (within 6 h of stroke symptoms or more), whether stroke type was ischaemic or not, and whether the patient was admitted to stroke unit or not (see online supplementary table S4).
To assess the prediction capacity of the SSV, BI and the NIHSS scale, these were used to predict 3 and 12 months’ independent survival for each centre. The BI and the NIHSS were first used as predictors of outcomes, each on its own, then in combination with age. The constituent variables of the SSV model, on the other hand, were used as predictors of the two outcomes for each centre. On the basis of the associations of the SSV with outcomes, a five-variable model was proposed, excluding living conditions pre-stroke from the SSV model. To validate the proposed five-variables model, data were randomly split into two sets, a training set and a validation set. The training set was used to develop coefficients (estimates of associations with outcomes), and the other set to validate them by calculating prediction probabilities and AUC using those estimates. The binary variables coding of 1=yes and 2=no, used in the original study that developed the SSV model,15 was maintained for all validations to allow for direct comparisons. The two other models, BI+age and NIHSS+age, were validated using the same technique. The AUC for each of the three models using the coefficients developed in this study (table 3) were compared with that of the SSV model based on the established coefficients (table 4).
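The split-sample validation described above can be sketched as follows. This is a hedged illustration only: the data are simulated, the two predictors (`age_z`, `can_walk`), their true coefficients and the gradient-ascent fitting routine are assumptions made for the example, and binary variables are coded 0/1 here rather than the 1=yes, 2=no coding kept in the study.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(beta, x):
    """Predicted probability from frozen coefficients (beta[0] is the intercept)."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit a logistic model by batch gradient ascent on the mean log-likelihood."""
    beta = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            err = yi - predict(beta, xi)
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def auc(y, p):
    """AUC as the share of (event, non-event) pairs ranked correctly; ties count 0.5."""
    pos = [pi for pi, yi in zip(p, y) if yi == 1]
    neg = [pi for pi, yi in zip(p, y) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def demo(seed=0, n=600):
    """Simulate patients, split them at random, fit on one half, validate on the other."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        age_z = rng.gauss(0.0, 1.0)                # standardised age (simulated)
        can_walk = 1 if rng.random() < 0.5 else 0  # simulated binary predictor
        p_true = sigmoid(-0.5 - 1.0 * age_z + 2.0 * can_walk)
        X.append([age_z, can_walk])
        y.append(1 if rng.random() < p_true else 0)
    idx = list(range(n))
    rng.shuffle(idx)
    train, valid = idx[: n // 2], idx[n // 2:]
    # Coefficients come from the training half only.
    beta = fit_logistic([X[i] for i in train], [y[i] for i in train])
    # Validation uses those frozen coefficients to compute probabilities and AUC.
    p_valid = [predict(beta, X[i]) for i in valid]
    return beta, auc([y[i] for i in valid], p_valid)
```

Calling `demo()` returns the training-set coefficients and the AUC obtained when they are applied unchanged to the validation half, which mirrors the two steps in the paragraph above.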
An additional external validation was performed for the three models—five simple variables (FSV), BI+age and NIHSS+age—by splitting populations randomly into two. The first set comprised London and Florence (n=620) used for derivation, and the second comprised Kaunas, Dijon and Warsaw (n=1336) used for validation (not presented).
Calibration of a model is an evaluation of the extent to which the model's estimated probabilities of positive outcome (independent survival) correspond to actual observed probabilities. Calibration was evaluated with the Hosmer–Lemeshow goodness-of-fit test.23 Observed probabilities were plotted against the predicted probabilities by deciles (10 groups of equal participants) of observed probabilities. In situations where the expected outcome within groups was too small (≤5), 9 groups were used instead of 10. Figure 2 presents calibration of the SSV, a five-variables model, BI+age and NIHSS+age, for the outcome of independent survival at 1 year.
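A minimal sketch of the Hosmer–Lemeshow statistic is given below. It assumes the usual form of the test, in which patients are ranked and grouped by predicted probability; the function names are hypothetical, and the closed-form p value shown applies only to an even number of degrees of freedom (for example 8, when 10 groups are used).

```python
import math


def hosmer_lemeshow(y, p, groups=10):
    """Return the Hosmer-Lemeshow statistic and its degrees of freedom (groups - 2).

    Assumes every group is non-empty and has a mean predicted probability
    strictly between 0 and 1.
    """
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    stat = 0.0
    for g in range(groups):
        members = order[g * n // groups:(g + 1) * n // groups]
        size = len(members)
        observed = sum(y[i] for i in members)   # events seen in the group
        expected = sum(p[i] for i in members)   # events predicted by the model
        # Variance of the group count under the model: n_g * pbar * (1 - pbar).
        variance = expected * (1.0 - expected / size)
        stat += (observed - expected) ** 2 / variance
    return stat, groups - 2


def chi2_sf_even(x, df):
    """Upper-tail chi-square probability; this closed form is valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total
```

A larger p value from `chi2_sf_even(stat, df)` indicates closer agreement between observed and predicted outcomes, which is the sense in which calibration is compared between models in this study.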
More complex models including co-morbidities prior to stroke such as hypertension, myocardial infarction, AF and diabetes were also examined in addition to the aforementioned predictors. Models were fitted for each centre separately as well as for the full data using centre as an explanatory variable to adjust for differences between centres.
Results
Table 1 describes the characteristics of the 6 populations.
Table 2 summarises the adjusted ORs and 95% CIs of the logistic model using the SSVs as predictors of independent survival at 3 months and 1 year after stroke for each population. The results show that age is a strong predictor of outcomes in all populations. Ability to walk and ability to lift arms were both associated with outcomes but occasionally not significant at the 1% level, such as in Warsaw (ability to lift arms) and in Dijon (ability to walk and ability to lift arms). Pre-stroke dependency (BI<12) showed some variations in its association with the outcomes. Living alone, on the other hand, was not a significant predictor of either outcome in any centre. Further analyses were undertaken to elucidate the unexpected lack of association between the outcome and living alone, and the inconsistency of the association between the outcome and pre-stroke dependency, using each variable as the only predictor, then adding further variables in stepwise fashion. Living alone remained non-significant in all models. For pre-stroke dependency, on the other hand, the univariate and simpler (fewer covariates) models supported the importance of the variable, but confounding by ability to walk and ability to lift arms was detected. The extent of confounding differed slightly between populations. As living alone was not a significant predictor, only the FSV were used to replace the SSV in the comparisons within and between populations. Comparisons of BI with BI+age, and NIHSS with NIHSS+age, across populations showed that both scales were significant predictors of outcomes, and that inclusion of age significantly improved predictions.
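For readers less familiar with how adjusted ORs and their CIs follow from a logistic model, the conversion can be sketched as below. It assumes the standard Wald interval; the function name is hypothetical and no values from Table 2 are used.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald 95% CI from a logistic coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)
```

A predictor is conventionally called non-significant at the 5% level when this interval includes 1, that is, when the coefficient's interval includes 0.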
Within-centre comparisons: The FSV model was compared with BI+age and NIHSS+age. For both comparisons, the FSV model provided either an equal or larger AUC. The difference was generally not significant (results available from authors).
Validation of the FSV, BI+age and NIHSS+age, models
The coefficients derived from the training data for the three models, as predictors of 1 year independent survival, are presented in Table 3, together with the published coefficients of the SSV model. Estimates of the FSV model were often similar to those of the SSV in magnitude and direction of association, but some differences were notable and the estimates of the FSV were generally more precise. Estimates for the 3 months outcome maintained similar features (not presented).
The derived coefficients for the three models—FSV, BI+age and NIHSS+age—using the derivation data set, and the established SSV coefficients, were used to calculate prediction probabilities from the second half of the data (validation set) and AUCs for these models were compared using the SSV model as a reference. These comparisons show no differences in the prediction capacity and AUCs were fairly equal. Differences in sample sizes were due to missing data for BI and the NIHSS measures (table 4).
The external validation for the three models based on a random split of populations showed good prediction for all three models, with AUCs approaching 90%. The two models—BI+age and NIHSS+age—have shown good agreement with the FSV model and no significant differences in AUCs were reported. The FSV has better AIC and BIC (results available from authors).
Comparing the prediction of the FSV across populations with a reference (London)
The AUCs were found to be fairly similar for each of the populations compared with London for the 3 and 12 months outcomes (figure 1). The only exception was Dijon for which the AUC was 0.81 (0.03) compared to 0.91 (0.02) for London (p=0.005) for the 3 months outcome.
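The Dijon versus London comparison can be approximately reproduced with a simple z test, as sketched below. This assumes that the figures in parentheses are standard errors and that the two centres are independent samples; the source does not state which test was used, and the function name is hypothetical.

```python
import math


def compare_auc(auc_a, se_a, auc_b, se_b, z_crit=1.96):
    """Difference between two independent AUCs with a 95% CI, z statistic and two-sided p."""
    diff = auc_a - auc_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    # Two-sided p value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    ci = (diff - z_crit * se_diff, diff + z_crit * se_diff)
    return diff, ci, z, p


# London 0.91 (0.02) versus Dijon 0.81 (0.03), 3 months outcome.
diff, ci, z, p = compare_auc(0.91, 0.02, 0.81, 0.03)
```

Under these assumptions the difference is 0.10 with z of about 2.77 and a two-sided p value of about 0.0055, close to the reported p=0.005.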
The performance of the FSV model in a range of subgroups was tested and compared with that based on BI+age and NIHSS+age. All three models performed well in the prediction of outcomes for the subgroups investigated. No differences were noted between incontinent or not, between age <75 or ≥75 years, by time to presentation (<6 h or ≥ 6 h from symptoms), by subtype or by admission to stroke unit. Significantly better prediction for men was obtained by the FSV, and for women by BI+age. All three models provided significantly better prediction for the ‘not admitted’ to stroke unit group (see online supplementary table S4).
Model calibration showed good fit for all four models described: the SSV, FSV, BI+age and NIHSS+age. The FSV model did better than BI+age and NIHSS+age. This was shown by the larger p value for Hosmer–Lemeshow statistics. The graphs highlighted better prediction at higher probabilities of independent survival compared to lower probabilities for all models (figure 2). The AIC and the BIC, on the other hand showed smaller values for the FSV compared to the SSV. The AIC values were 612.94 and 613.65 and the BIC values were 640.79 and 646.15, for the FSV and the SSV respectively, using the derivation dataset. The addition of co-morbidities did not improve the prediction achieved by any of the four models investigated.
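The two information criteria can be written out as a short sketch. The definitions are the standard ones; the helper `implied_n` and the parameter counts used in the example (one parameter per variable plus an intercept, so k=6 for the FSV and k=7 for the SSV) are assumptions and are not stated in the source.

```python
import math


def aic(loglik, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return k * math.log(n) - 2 * loglik


def implied_n(aic_value, bic_value, k):
    """Sample size implied by an AIC/BIC pair, since BIC - AIC = k (ln n - 2)."""
    return math.exp((bic_value - aic_value) / k + 2.0)


# Reported values for the derivation dataset; k is assumed, see above.
n_fsv = implied_n(612.94, 640.79, 6)
n_ssv = implied_n(613.65, 646.15, 7)
```

With these assumed parameter counts the two pairs of reported values imply about 766 and 767 observations respectively. This is only a consistency check under the stated assumption and not a figure reported in the study. Because both criteria penalise extra parameters, dropping a non-contributing variable lowers them, and the BIC penalises each parameter more heavily than the AIC whenever n exceeds about 7.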
Discussion
This study, based on six European populations, illustrated that the prediction of independent survival obtained by the SSV model and that obtained by a five-variables model (excluding living alone pre-stroke from the SSV) were equivalent. It has also shown that good prediction can be obtained by two other models (BI+age and NIHSS+age) for all centres and for a range of subgroups. The five-variables model, however, has some superiority over all other models examined; it is simpler, yet maintains the favourable features of the SSV model, including good discrimination, and comprises variables that are clinically feasible to collect at ward level by non-specialist staff.24 The model has better calibration and better goodness of fit, and provided highly precise estimates.
The robustness of a range of models in the prediction of independent survival at 3 and 12 months following stroke shown by this study was consistent with similar studies that compared models. Counsell et al,5 for example, demonstrated the equivalence of models derived from three sets of variables for the prediction of survival at 3 months and independent survival at 6 months following stroke. Govan et al10 examined the Scandinavian Stroke Scale (SSS), modified Rankin Scale (mRS), Barthel Index (BI) and the modified National Institutes of Health Stroke Scale (NIHSS) and concluded that categorical scales are as good as full scales, with slight superiority for the full scales in the prediction of death or dependency 1 month following stroke. In another, smaller study, Castellanos et al19 showed that mortality 1 month after stroke may be correctly predicted in 85% of patients by GCS, cortical location of bleeding and low fibrinogen concentrations. The SSV used for the prediction of 30 days’ survival and 6 months’ independent survival was also shown to provide good prediction of 6 months’ independent survival and being at home.17
In general no major differences in prediction by subgroups were reported, with the exception of the better prediction obtained for patients not admitted to stroke units, which was consistent throughout all models. The reasons for these differences are not clear, but possibly stroke unit is associated with other factors that were not taken into account by all the variables considered. The better prediction by BI+age for men was minimal, and given the subgroup nature of these analyses, should be cautiously interpreted at best as hypothesis generating.
The suitability of the five-variables model for a range of subgroups agrees with earlier findings for the SSV model providing adequate prediction of 6 months outcome for hyper-acute or post-acute stroke, age <75 or ≥75 years, haemorrhagic or ischaemic stroke, men or women, moderate and severe stroke dependent survival, at 6 months.21
Comparisons have shown least variations in AUC across centres for the NIHSS+age model. However, more missing values were reported, with almost one third of NIHSS data missing in one centre. The good prediction of this scale in combination with age was previously reported,18 but its limitations remain the relatively large number of items, and the need for training on its use,25 which might explain the considerably large number of missing observations reported. BI in the acute phase plus age has on the other hand shown good consistency of estimates. The scale comprising 10 items has been reported in a recent systematic review as one of the most frequently used scales for the assessment of Activities of Daily Living (ADL) after stroke, and many of its items were ranked among the best evidence-based items for the prediction of limitations at ≥3 months following stroke.26 Despite some missing data (minimum 3.6% in Dijon and maximum of 18.5% in Warsaw), the model seems to support the importance of the scale in prediction, suggesting the need for completeness of its routinely collected data.
The inconsistency in the association of pre-stroke dependency and independent survival noted in some centres has been further tested and may be explained by: first, the use of Oxford Handicap Scale (OHS) as one of the SSV, while we have used BI.15 The two scales are different; while BI is an ADL and mobility-specific scale, the OHS is a global functional index.27 These differences would result in different types of associations between the explanatory variables in each model. For example, pre-stroke dependency (assessed by BI) has shown association with mobility items such as ability to lift arms and ability to walk. These associations and the extent of confounding would be different if pre-stroke dependency was assessed by OHS. Second, confounding was found to be of varying intensity across centres, and when BI score in the acute phase was used instead, the latter was found to be independently associated with both outcomes for all centres, suggesting possible variations in the accuracy of recalled information across centres, ‘recall bias’.
Robustness in terms of discrimination performance (assigning participants to the correct outcome) may not necessarily be matched by closeness of predicted probabilities and actual observed probabilities of outcomes.28–30 In this study, the SSV and the FSV models provided the best agreement between these two probabilities. The similarity between the two models was expected since the FSV is a modified version of the SSV, excluding living alone, which was found to be non-significant; removing it would therefore have either improved the models’ goodness of fit or made no difference. For the other two models, BI+age and NIHSS+age, a slightly lower but fairly close agreement between actual and predicted probabilities was found. While additional variables might improve models’ discrimination, this is unlikely to be the case for our data as the AUC obtained was around 0.90, approaching the maximum possible for the type of data.28 ,29 By comparison, among models that included more variables, the 12-predictor score of Saposnik et al,8 ,31 for example, obtained a maximum AUC of 0.85 in predicting 1 month and 1 year mortality following hospitalisation for acute ischaemic stroke. Despite differences in the two settings, measures such as BI, NIHSS, ability to lift arms and ability to walk might have accounted for all the predictors used in the aforementioned score, due to the strong associations between these measures and stroke severity, co-morbidity and stroke subtypes, which were used in the score.
The study has explored the validity of commonly used severity scales across European centres using community-based stroke registers. A simpler predictive model comprising five variables provided very good prediction of independent survival at 3 and 12 months following stroke. The model provides estimates of high precision, best agreement between observed and predicted outcomes, and better AIC compared to the SSV, and hence was superior to other well validated measures. The populations involved in the study are not necessarily representative of their countries, and the sample sizes may not be large enough, but the methodological advantages of the design include less biased population sampling and detailed documentation of care process and severity measures, suggesting our investigations were a suitable approach to evaluate a range of prediction models and to propose modifications. Although the use of population-based stroke registers suggests potentially better generalisability of the estimates derived by this study, external validation using independent data would nonetheless be invaluable.
Acknowledgments
We wish to thank all the patients and their families and the healthcare professionals involved in the different centres. We wish to acknowledge the following: Sesto Fiorentino: M Lamassa, MD (collection of data), P Nencini, MD (collection of data), A Poggesi, MD (collection of data), F Pescini, MD (collection of data), A Cramaro, MD (collection of data), E Magnani, MD (collection of data), (Department of Neurological and Psychiatric Sciences, University of Florence); M Baldereschi, MD (Institute of Neurosciences, Italian National Research Council, Florence, collection of data); Kaunas: D Sopagiene, MD (collection of data), D Kranciukaite, MD (collection of data) (Institute of Cardiology c/o Kaunas University of Medicine, Kaunas); Menorca: J Rodriguez-Mera, MD (Area de Salud de Menorca, ib-salut, Menorca, collection of data); Warsaw: M Głuszkiewicz, MD (2nd Department of Neurology, Institute of Psychiatry and Neurology, Warsaw, collection of data); J Pniewski, MD (Neurology Department, Medical Research Centre, Polish Academy of Sciences/CSK MSWiA, Warsaw, collection of data). Data centre: V Moltchanov, PhD (National Public Health Institute, Helsinki, Finland, technical assistance).
References
Supplementary materials
Supplementary Data
This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.
Files in this Data Supplement:
- Data supplement 1 - Online table 4
Footnotes
- Co-investigators Dijon: M Giroud, MD (Stroke Registry of Dijon, University of Burgundy, University Hospital of Dijon, local PI); Sesto Fiorentino: D Inzitari, MD (Department of Neurological and Psychiatric Sciences, University of Florence; Institute of Neurosciences, Italian National Research Council, Florence, local PI); Menorca: M Torrent, MD (Area de Salud de Menorca, ib-salut, Menorca, local PI); Warsaw: H Sienkiewicz-Jarosz, MD, PhD (1st Department of Neurology, Institute of Psychiatry and Neurology, Warsaw, local PI); A Czlonkowska, PhD (2nd Department of Neurology, Institute of Psychiatry and Neurology, Warsaw, local PI). Data centre: C Sarti, PhD (National Public Health Institute, Helsinki, Finland, local PI).
- Contributors CDAW conceived and designed the study and drafted the first version of the manuscript. He has been revising it critically for important intellectual content throughout the production process from early drafts and onwards. BC managed the data for the study, and contributed to the data interpretation, revision and critical review of the paper. SAA provided statistical analysis and interpretation of results, and drafted the second version of the manuscript. AGR and MSD contributed to the interpretation of data, and critically revised the paper for important intellectual content. All authors saw and approved the final version of the manuscript.
- Funding The study was supported by the European Union Fifth Framework Programme and partially by the National Institute for Health Research (NIHR) Biomedical Research Centre based at Guy's and St Thomas' NHS Foundation Trust and King's College London, the Stanley Thomas Johnson Foundation, the Stroke Association and the NIHR Programme Grant funding (RP-PG-0407-10184) and DH HQIP funding. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.
- Competing interests None.
- Patient consent Obtained.
- Ethics approval The study was ethically approved by ethics committees of each of the centres involved.
- Provenance and peer review Not commissioned; externally peer reviewed.