Objective First, to determine the sensitivity and specificity of six stroke recognition scores in a single cohort to improve interscore comparability. Second, to test four stroke severity scores repurposed to recognise stroke in parallel.
Methods Of 9154 emergency runs, 689 consecutive cases of preclinically ‘suspected central nervous system disorder’ admitted to the emergency room (ER) of the Heidelberg University Hospital were included in the validation cohort. Using data abstracted from the neurological ER medical reports, we retrospectively calculated the stroke recognition scores Cincinnati Prehospital Stroke Scale (CPSS), Face Arm Speech Test (FAST), Los Angeles Prehospital Stroke Screen (LAPSS), Melbourne Ambulance Stroke Screen (MASS), Medic Prehospital Assessment for Code Stroke (Med PACS) and Recognition of Stroke in the Emergency Room score (ROSIER), and the stroke severity scores Kurashiki Prehospital Stroke Scale (KPSS), Los Angeles Motor Scale (LAMS) and shortened National Institutes of Health Stroke Scale (sNIHSS)-8/sNIHSS-5. Test characteristics were calculated using the hospital discharge diagnosis as the reference standard.
Results The CPSS and FAST had a sensitivity of 83% (95% CI 76 to 88) and 85% (78% to 90%) and a specificity of 69% (64% to 73%) and 68% (63% to 72%), respectively. The more complex LAPSS, MASS and Med PACS had a high specificity (92% to 98%) but low sensitivity (44% to 71%). In the ROSIER, sensitivity (80%, 73 to 85) and specificity (79%, 75 to 83) were similar. Test characteristics for KPSS, sNIHSS-8 and sNIHSS-5 were similar to the simple recognition scores (sensitivity 83% to 86%, specificity 60% to 69%). The LAMS offered only low sensitivity.
Conclusions The simple CPSS and FAST scores provide good sensitivity for stroke recognition. More complex scores do not result in better diagnostic performance. Stroke severity scores can be repurposed to recognise stroke at the same time because test characteristics are comparable with pure stroke recognition scores. Particular shortcomings of the individual scores are discussed.
- CEREBROVASCULAR DISEASE
- CLINICAL NEUROLOGY
Stroke should be recognised rapidly and accurately because treatment is time sensitive and any delays worsen patient outcomes.1–4 Therefore, several stroke recognition scores have been developed to improve the efficacy of early stroke identification.5–12 In addition, prehospital stroke severity assessment allows triage to comprehensive stroke centres for those patients who need intensive care or who qualify for advanced imaging and recanalisation therapies.
The available stroke recognition scores vary widely in length and complexity, which complicates choosing the optimal score for the emergency setting. Furthermore, the reported diagnostic accuracies of recognition scores are difficult to compare because previous validation cohorts were highly heterogeneous, for example, in terms of stroke prevalence or whether comatose patients were included or excluded.13
Emergency severity scores were previously tested using a sequential approach, in which a dedicated recognition score is applied before severity is assessed.9 ,14–16 To save time, reduce training requirements and improve communication between the emergency medical services (EMS) and hospitals, it would be desirable to use a single score to recognise stroke and grade its severity in parallel.
The aims of our study therefore were (1) to provide a creator-independent validation of diagnostic test characteristics of six stroke recognition scores, and (2) to analyse the test characteristics of four emergency stroke severity scores repurposed for recognising stroke at the same time (table 1). Using the same validation cohort enhances comparability between scores.
Validation cohort and data collection
We identified all patients attended by EMS paramedics and emergency physicians between 1 November 2007 and 31 August 2010 (N=9154) using a prospective database (DATAPEC GmbH, Germany). In order to avoid ‘blind’ application of stroke recognition scores to all patients attended by the EMS, we selected consecutive cases allocated to the database category ‘suspected central nervous system (CNS) disorder’, that is, patients with potential stroke and stroke-mimics (N=778). Screening tools, that is, stroke recognition scores, were not yet in routine use during the study period. Patients with ‘suspected CNS disorder’ are admitted to the multidisciplinary emergency room (ER) of the Kopfklinik, Heidelberg University Hospital. Excluding repeated (N=50) and primary neurotrauma admissions (N=6) and cases with missing discharge diagnosis (N=33), the 689 remaining cases were included in the validation cohort (table 2).
Calculation of stroke scores
Calculating stroke scores from the uniform case report forms completed by the EMS (NADOKlive, DATAPEC GmbH, Germany) was not feasible, because neurological symptoms were not graded and reported in a standardised way. We therefore created a pseudonymised database containing the medical history and all symptoms and their grading, as abstracted from the detailed neurological ER case report forms, including the full-scale National Institutes of Health Stroke Scale (NIHSS) and the premorbid modified Rankin Scale (pmRS). The NIHSS and pmRS were routinely obtained on admission by the attending neurologists and validated retrospectively by certified NIHSS and mRS assessors (SP, FR and JCP).18 Using these data, we (SP and JCP) calculated the individual stroke recognition and severity scores. Unless otherwise specified, a score was considered positive for stroke if one or more items were scored as ‘abnormal’ and, where applicable, the symptoms were unilateral.
Our diagnostic reference standard was the discharge diagnosis, abstracted by AE and CH from the archived hospital discharge letters, which took into account the clinical symptoms and course, conclusive imaging and laboratory results, and further neurological diagnostic tests.19 AE and CH were masked to the results of the individual stroke scores. Because a transient ischaemic attack (TIA) with ongoing symptoms cannot be distinguished from stroke in the prehospital setting, we combined both diagnoses. Patients who had neither stroke nor TIA according to the thorough in-hospital diagnostic workup were classified as ‘non-stroke’.
Stroke recognition scores
The Cincinnati Prehospital Stroke Scale (CPSS) assesses the (unilateral) presence of any (or all) of the following three items: facial droop, arm drift and speech.5 ,20 To assess speech, the patient is requested to repeat the sentence “The sky is blue in Cincinnati.” If the patient slurs or says the wrong words or is unable to speak, speech is rated as abnormal. As no validated version of the CPSS exists for Germany, we rated the speech item as ‘abnormal’ whenever dysphasia or dysarthria was present.
The Face Arm Speech Test (FAST) is based on the CPSS and contains items that measure unilateral facial droop (F) or arm drift (A) and speech problems (S); (T) means ‘time to call 911.’ In contrast to the CPSS, speech assessment is based on the entire conversation with the patient, not on repeating a given sentence. Assessment of ‘numbness’ (sensory loss in the upper limbs) was included in the FAST calculation, as proposed in the original version of the FAST score in 1999.21
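As a minimal sketch of how the CPSS and FAST were scored in this study (a positive screen whenever at least one item is abnormal), the two rules can be written as Python predicates. The `Findings` record and its field names are hypothetical; each flag is assumed to already encode unilaterality as judged from the ER report, and speech covers dysphasia or dysarthria, as in our CPSS adaptation.

```python
from dataclasses import dataclass


@dataclass
class Findings:
    """Hypothetical per-patient findings abstracted from an ER report."""
    facial_droop: bool = False     # unilateral facial weakness
    arm_drift: bool = False        # unilateral arm weakness or drift
    speech_abnormal: bool = False  # dysphasia or dysarthria
    arm_numbness: bool = False     # unilateral sensory loss in the upper limb


def cpss_positive(f: Findings) -> bool:
    # CPSS: positive if any of face, arm or speech is abnormal.
    return f.facial_droop or f.arm_drift or f.speech_abnormal


def fast_positive(f: Findings) -> bool:
    # FAST (1999 version, as used here): the CPSS items plus arm numbness.
    return cpss_positive(f) or f.arm_numbness
```

Because the FAST only adds an item to the CPSS in this formulation, every CPSS-positive patient is also FAST-positive, so the FAST can gain sensitivity relative to the CPSS but cannot gain specificity.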
The Los Angeles Prehospital Stroke Screen (LAPSS) comprises four historical questions, three motor function assessments and the evaluation of blood glucose.7 ,17 The questions address age (>45 years), absence of a history of epileptic seizures, symptom duration (less than 12 h7 or less than 24 h,17 depending on the version) and whether the patient was ambulatory prior to the event; blood glucose must lie within the target range of 60–400 mg/dL. The motor function assessment examines unilateral weakness in facial grimace, handgrip and arm strength. Two different versions of the LAPSS have been published. Both versions are frequently cited and used in parallel, but their differences have never been discussed in the literature, and their implications for diagnostic accuracy are therefore unknown. We therefore included both versions in our analysis: ‘LAPSS 1998’ refers to the originally published version of 1998 and ‘LAPSS 2000’ to the version published in 2000.7 ,17 In the LAPSS 1998, stroke is only considered when all questions are answered ‘yes.’7 If any question is answered ‘no’ or ‘unknown,’ stroke is scored as ‘unlikely,’ regardless of the motor function assessment.7 In contrast, in the LAPSS 2000, ‘unknown’ in response to any historical item is treated as ‘yes.’17 Furthermore, whereas the LAPSS 1998 required a symptom duration of less than 12 h, the LAPSS 2000 extended this limit to 24 h.7 ,17
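The practical difference between the two LAPSS versions lies in how an ‘unknown’ historical answer and the time window are handled. The following Python sketch encodes the rules as described above; the function name and parameters are hypothetical, `None` stands for ‘unknown’, and the three motor items are collapsed into a single flag for unilateral weakness in facial grimace, handgrip or arm strength. How an unmeasured blood glucose should be handled is not covered here, so glucose is assumed to be known.

```python
from typing import Optional


def lapss_positive(age: Optional[float],
                   seizure_history: Optional[bool],
                   symptom_duration_h: Optional[float],
                   ambulatory_before: Optional[bool],
                   glucose_mg_dl: float,
                   unilateral_weakness: bool,
                   version: int = 1998) -> bool:
    """LAPSS screen; None marks an 'unknown' historical answer."""
    if version not in (1998, 2000):
        raise ValueError("version must be 1998 or 2000")
    time_limit_h = 12 if version == 1998 else 24
    # Each historical item becomes True ('yes'), False ('no') or None ('unknown').
    history = [
        None if age is None else age > 45,
        None if seizure_history is None else not seizure_history,
        None if symptom_duration_h is None else symptom_duration_h < time_limit_h,
        ambulatory_before,
    ]
    if version == 1998:
        # Only a definite 'yes' passes; 'no' or 'unknown' makes stroke unlikely.
        history_ok = all(answer is True for answer in history)
    else:
        # 'Unknown' counts as 'yes'; only a definite 'no' fails.
        history_ok = all(answer is not False for answer in history)
    glucose_ok = 60 <= glucose_mg_dl <= 400
    return history_ok and glucose_ok and unilateral_weakness
```

For example, a 70-year-old ambulatory patient with unilateral arm weakness, a normal glucose level and an unknown seizure history screens negative on the LAPSS 1998 but positive on the LAPSS 2000.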
The Melbourne Ambulance Stroke Screen (MASS) combines the LAPSS items with the speech impairment item of the CPSS.10 A positive screen requires the presence of any of the elements of the physical examination and affirmative answers to all elements of the clinical history. Unlike the LAPSS, the MASS includes no symptom duration limit, and its lower limit for blood glucose is 50 mg/dL (2.8 mmol/L) instead of 60 mg/dL (3.3 mmol/L).10
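A comparable sketch for the MASS follows, again with hypothetical names. Two details are assumptions rather than statements from the MASS publication: the upper glucose limit is taken to be the LAPSS value of 400 mg/dL (the text above only mentions the changed lower limit), and every historical answer must be a definite ‘yes’, so an unknown answer cannot be passed in.

```python
def mass_positive(age: float, seizure_history: bool, ambulatory_before: bool,
                  glucose_mg_dl: float, facial_droop: bool, arm_drift: bool,
                  grip_weakness: bool, speech_abnormal: bool) -> bool:
    """MASS screen: all history criteria met and any examination finding."""
    # No symptom-duration criterion; glucose window assumed to be 50-400 mg/dL.
    history_ok = (age > 45 and not seizure_history and ambulatory_before
                  and 50 <= glucose_mg_dl <= 400)
    # Any single abnormal examination element is sufficient.
    exam_ok = facial_droop or arm_drift or grip_weakness or speech_abnormal
    return history_ok and exam_ok
```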
The Medic Prehospital Assessment for Code Stroke (Med PACS) assumes that no stroke is present, regardless of the motor deficits, whenever the patient has a history of seizures, symptom duration >25 h or blood glucose outside the range of 60–400 mg/dL. Physical examination includes face, gaze, arm, leg and speech items.12
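The Med PACS logic as summarised above (exclusion criteria first, then any abnormal examination item) could be sketched as follows. The item labels and function name are hypothetical, and the sketch applies the study's generic rule that one abnormal item suffices; it does not reproduce any point weighting that the original Med PACS form may use.

```python
from typing import AbstractSet

# Hypothetical labels for the five examination items.
EXAM_ITEMS = frozenset({"face", "gaze", "arm", "leg", "speech"})


def med_pacs_positive(seizure_history: bool, symptom_duration_h: float,
                      glucose_mg_dl: float, abnormal_items: AbstractSet[str]) -> bool:
    """Med PACS screen with exclusion criteria applied before the examination."""
    unknown = set(abnormal_items) - EXAM_ITEMS
    if unknown:
        raise ValueError(f"unknown examination items: {sorted(unknown)}")
    # Exclusion criteria override any motor, gaze or speech deficit.
    if seizure_history or symptom_duration_h > 25 or not 60 <= glucose_mg_dl <= 400:
        return False
    # Study rule: one abnormal examination item is enough.
    return len(abnormal_items) > 0
```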
The Recognition of Stroke in the Emergency Room (ROSIER) score was developed for recognising stroke in the ER and differentiates between stroke and stroke mimics.11 It does not include any items regarding medical history, but contains items concerning acute loss of consciousness and seizure activity. Blood glucose must be measured, and in case of hypoglycaemia, assessment is postponed until blood glucose has risen above 62 mg/dL (3.4 mmol/L). The physical assessment consists of asymmetric weakness of the face, arm and leg, speech disturbances (dysphasia or dysarthria) and visual field defects. As loss of consciousness and seizure activity are each scored as −1 (if present) or 0 (absent), and each of the five physical findings as +1 (if present), the score can range between −2 and +5. A score ≤0 is defined as ‘stroke unlikely.’
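Because the ROSIER is a points-based score rather than a yes/no checklist, a short Python sketch makes its arithmetic explicit: each of the two history items subtracts one point, each of the five examination items adds one, and a total ≤0 means ‘stroke unlikely’. The function names are hypothetical, and the blood glucose precondition (assessment postponed until glucose exceeds 62 mg/dL) is assumed to have been checked beforehand.

```python
def rosier_score(loss_of_consciousness: bool, seizure: bool,
                 face: bool, arm: bool, leg: bool,
                 speech: bool, visual_field: bool) -> int:
    """ROSIER total, ranging from -2 to +5.

    loss_of_consciousness covers loss of consciousness or syncope; it and
    seizure activity score -1 each if present. The five examination findings
    (asymmetric face, arm or leg weakness, speech disturbance, visual field
    defect) score +1 each if present.
    """
    negative = int(loss_of_consciousness) + int(seizure)
    positive = sum(int(item) for item in (face, arm, leg, speech, visual_field))
    return positive - negative


def rosier_stroke_likely(score: int) -> bool:
    # A total <= 0 is classed as 'stroke unlikely'.
    return score > 0
```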
Stroke severity scores
The Kurashiki Prehospital Stroke Scale (KPSS) was designed in 2008 to measure stroke severity after a recognition score had been applied.15 ,16 It consists of four assessments derived and modified from the NIHSS: assessments of the consciousness level and disturbance, motor weakness of the arms and legs and speech (dysphasia or dysarthria). Individual items add up to a maximum of 13 points.
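The following sketch shows one way to total the KPSS and repurpose it for recognition. The per-item maxima are our assumption, chosen to be consistent with the four assessments described above and the 13-point maximum (level of consciousness 0–2, consciousness disturbance 0–1, each limb 0–2, speech 0–2). The item keys, the function names and the recognition threshold of 1 point, which mirrors the ‘one or more abnormal items’ rule used for the other scores, are likewise illustrative, and the unilaterality check is omitted.

```python
from typing import Mapping

# Assumed per-item maxima (consistent with the 13-point maximum stated above).
KPSS_MAX = {
    "consciousness_level": 2,
    "consciousness_disturbance": 1,
    "right_arm": 2,
    "left_arm": 2,
    "right_leg": 2,
    "left_leg": 2,
    "speech": 2,  # dysphasia or dysarthria
}


def kpss_total(items: Mapping[str, int]) -> int:
    """Sum the KPSS items; items that are not supplied count as normal (0)."""
    for name, value in items.items():
        if name not in KPSS_MAX:
            raise KeyError(f"unknown KPSS item: {name}")
        if not 0 <= value <= KPSS_MAX[name]:
            raise ValueError(f"{name} must be between 0 and {KPSS_MAX[name]}")
    return sum(items.values())


def kpss_suggests_stroke(total: int, threshold: int = 1) -> bool:
    """Repurposed recognition rule: any abnormal item (total >= 1) is positive."""
    return total >= threshold
```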
The Los Angeles Motor Scale (LAMS) was specifically developed to assess stroke severity in the prehospital setting.14 Point values are assigned to each of the LAPSS motor items. While the LAPSS targets unilateral weakness only, the LAMS also addresses bilateral weakness, resulting in a total score of 0–10 in bilateral weakness and 0–5 in unilateral weakness.
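The LAMS arithmetic can be sketched in the same way. The item ranges used here (facial droop 0–1, arm drift 0–2, grip strength 0–2 per side) are assumed, following the commonly cited LAMS scoring; they reproduce the 0–5 range for unilateral weakness and the 0–10 range for bilateral weakness given above. Function names are hypothetical.

```python
from typing import Mapping

# Item ranges per side (assumed, following the commonly cited LAMS scoring).
LAMS_MAX = {"facial_droop": 1, "arm_drift": 2, "grip": 2}


def lams_side(points: Mapping[str, int]) -> int:
    """Score one side of the body (0-5); items that are not supplied count as 0."""
    for item, value in points.items():
        if item not in LAMS_MAX:
            raise KeyError(f"unknown LAMS item: {item}")
        if not 0 <= value <= LAMS_MAX[item]:
            raise ValueError(f"{item} must be between 0 and {LAMS_MAX[item]}")
    return sum(points.values())


def lams_total(left: Mapping[str, int], right: Mapping[str, int]) -> int:
    """Add both sides: 0-5 if only one side is weak, up to 10 if bilateral."""
    return lams_side(left) + lams_side(right)
```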
The NIHSS is the most commonly used in-hospital acute stroke severity score. For prehospital assessment of stroke severity, shortened versions of the NIHSS have been developed, using 8 (sNIHSS-8) or 5 (sNIHSS-5) items of the NIHSS that were found to be most predictive of outcome at 3 months.9 The items on the sNIHSS-5 are gaze, visual field, motor function of the right and left leg, and language, with the sNIHSS-8 additionally assessing the level of consciousness, facial paresis and dysarthria.
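Because both shortened versions are plain subsets of the full NIHSS, they can be derived directly from an NIHSS record, which is what makes their item values comparable across the care chain. The sketch below assumes the conventional NIHSS item numbering (1a level of consciousness, 2 gaze, 3 visual fields, 4 facial palsy, 6a/6b left/right leg, 9 language, 10 dysarthria); the dictionary layout, the function name and the example record are hypothetical.

```python
from typing import Mapping, Sequence

# Conventional NIHSS item codes (assumed numbering).
SNIHSS5_ITEMS = ("2", "3", "6a", "6b", "9")        # gaze, visual field, left/right leg, language
SNIHSS8_ITEMS = SNIHSS5_ITEMS + ("1a", "4", "10")  # + level of consciousness, face, dysarthria


def short_nihss(nihss: Mapping[str, int], items: Sequence[str]) -> int:
    """Sum the selected items of a full NIHSS record; raises KeyError if one is missing."""
    return sum(nihss[item] for item in items)


# Hypothetical full NIHSS record (item code -> points), not study data.
record = {"1a": 1, "1b": 2, "1c": 0, "2": 1, "3": 0, "4": 2, "5a": 3, "5b": 0,
          "6a": 2, "6b": 0, "7": 0, "8": 1, "9": 2, "10": 1, "11": 0}
snihss5 = short_nihss(record, SNIHSS5_ITEMS)
snihss8 = short_nihss(record, SNIHSS8_ITEMS)
full_nihss = sum(record.values())
```

In this hypothetical record, the arm weakness scored in item 5a contributes to the full NIHSS but to neither shortened version, which illustrates that the shortened scores are not simply rescaled NIHSS totals.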
The empirical distribution of continuous data is described as mean and SD; for categorical data, we calculated absolute and relative frequencies (count and percentage). For each score, we calculated sensitivity (the proportion of patients with stroke who had a positive test, ie, one indicative of the presence of stroke), specificity (the proportion of patients without stroke who had a negative test), positive predictive value (PPV), negative predictive value (NPV) and likelihood ratios (LRs), with 95% CIs.

To ensure that the number of obtained cases (N=689) was sufficient, the sample size was calculated according to the method described by Buderer.22 Prior reports on stroke recognition scores published until 2011 yielded a median sensitivity of 85% (range 44–96%) and a median specificity of 83% (25–100%).5 ,7 ,10 ,11 ,17 ,23–29 With a prevalence of definite TIA/stroke of 29% in our population and a maximum marginal error of 5% at a 95% confidence level for the true sensitivity (or specificity), the total required sample size would have been 676 cases based on sensitivity (or 305 based on specificity).

Subgroup analyses were conducted for patients with pmRS 0–2 versus 3–5 and for patients aged ≤65 versus >65 years. We performed sensitivity analyses assuming a 5% misallocation rate. For the ‘worst-case scenario,’ 5% of the true-negative (TN) patients (discharge diagnosis ‘non-stroke’ and score ‘non-stroke’) were switched to the false-negative (FN) group (discharge diagnosis ‘stroke’ and score ‘non-stroke’), and 5% of the true-positive (TP) patients were switched to the false-positive (FP) group. For the ‘best-case scenario,’ 5% of the FP patients were switched to the TP group, and 5% of the FN patients were switched to the TN group.

Bivariate correlation between the absolute (ordinal) scores of the NIHSS, KPSS, LAMS, sNIHSS-8 and sNIHSS-5 was calculated using Spearman's rank correlation coefficient.
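As a worked check of the sample-size figures above, Buderer's formula for estimating sensitivity to within a margin d at confidence level z is n = z² · Se · (1 − Se) / (d² · prevalence); for specificity, Sp and (1 − prevalence) take their places. The Python sketch below plugs in the median sensitivity (85%) and specificity (83%), the 29% prevalence and d = 0.05 at z = 1.96. The function name is hypothetical; with ordinary rounding it reproduces the reported 676 and 305 cases, whereas rounding up, the more conservative convention, would give 306 for specificity.

```python
def buderer_sample_size(accuracy: float, prevalence: float,
                        margin: float = 0.05, z: float = 1.96,
                        target: str = "sensitivity") -> int:
    """Total cohort size needed to estimate sensitivity or specificity.

    accuracy: anticipated sensitivity (or specificity) as a proportion.
    prevalence: expected proportion of reference-standard positives.
    """
    # Number of diseased (or non-diseased) subjects needed for the margin.
    n_subgroup = z ** 2 * accuracy * (1 - accuracy) / margin ** 2
    if target == "sensitivity":
        fraction = prevalence        # only stroke/TIA cases inform sensitivity
    elif target == "specificity":
        fraction = 1 - prevalence    # only non-stroke cases inform specificity
    else:
        raise ValueError("target must be 'sensitivity' or 'specificity'")
    # Ordinary rounding reproduces the figures in the text.
    return round(n_subgroup / fraction)
```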
Data handling and analysis, calculation of stroke scores, and graphic presentation were performed using SPSS (V.22, IBM, New York, USA) and GraphPad Prism (V.6b, San Diego, California, USA). This study was performed with reference to the STARD guidelines for studies on diagnostic tests.
In order to enhance comparability with previous literature data, a total of 49 (7.1%) comatose patients were excluded from the validation cohort used for the main analysis, but included in an online supplementary analysis of the whole cohort.
Stroke recognition scores
Sensitivity was highest for the FAST (85%, 95% CI 78 to 90) and CPSS (83%, 76 to 88). In contrast, the specificity of the CPSS (69%, 64 to 73) and FAST (68%, 63 to 72) was low. The more complex LAPSS, MASS and Med PACS reached high specificities (92–98%), but these scores missed 29–56% of strokes. The ROSIER score combined a sensitivity of 80% (73 to 85) with a specificity of 79% (75 to 83; table 3 and figure 1).
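For readers who want to recompute such estimates from a 2×2 table, the sketch below derives sensitivity, specificity, PPV, NPV and both likelihood ratios, with a Wilson score interval for a proportion. The text does not state which CI method was used, so the Wilson interval is an assumption (SPSS or another method may give slightly different limits), and the counts in the example are invented for illustration, not taken from table 3.

```python
import math
from dataclasses import dataclass
from typing import Tuple


def wilson_ci(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


@dataclass
class TwoByTwo:
    tp: int  # score positive, reference stroke/TIA
    fp: int  # score positive, reference non-stroke
    fn: int  # score negative, reference stroke/TIA
    tn: int  # score negative, reference non-stroke

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    def lr_positive(self) -> float:
        # Undefined (division by zero) when specificity is exactly 1.
        return self.sensitivity() / (1 - self.specificity())

    def lr_negative(self) -> float:
        return (1 - self.sensitivity()) / self.specificity()


# Hypothetical counts, not study data.
example = TwoByTwo(tp=83, fp=31, fn=17, tn=69)
sens_ci = wilson_ci(example.tp, example.tp + example.fn)
```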
Stroke severity scores
The KPSS, sNIHSS-8 and sNIHSS-5, repurposed for stroke recognition, showed results (sensitivity 83–86%, specificity 60–69%) similar to those of the simple recognition scores. The sensitivity of the LAMS for stroke detection was low (67%, 95% CI 60 to 74). The absolute scores of all emergency severity scores showed excellent correlation with the NIHSS, the in-hospital ‘gold standard’ for acute stroke severity grading (Spearman's ρ=0.814–0.946, all p<0.0001; figure 2).
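Spearman's ρ, used above to compare the emergency severity scores with the NIHSS, is Pearson's correlation computed on ranks, with tied values receiving their average rank. A dependency-free Python sketch follows; the study itself used SPSS, and `scipy.stats.spearmanr` would be the usual library choice. The function names are hypothetical, and no p value is computed.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation as Pearson's r on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)
```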
Subpopulations and sensitivity analysis
Diagnostic performance hardly changed when comatose patients were included in the analyses (see online supplementary table ST1). Comatose patients accounted for 25 strokes (6 ischaemic strokes and 19 intracranial haemorrhages).
In view of demographic change, we analysed subgroups based on patients’ age and degree of premorbid disability. In patients with moderate to severe premorbid disability (pmRS>2), sensitivity increased by 2–15%, except for the two versions of the LAPSS and the MASS, for which it decreased by 24–35% (table 4). Conversely, specificity dropped notably, by 29–41%, except for the two versions of the LAPSS, the MASS and the Med PACS, for which it increased slightly, by 1–7%. To compare diagnostic accuracy in ‘young’ versus ‘old’ patients, we chose an age cut-off of 65 years based on the median age of the whole validation cohort (64.9 years, IQR 48.3–81.4). In patients aged >65 years, changes in sensitivity and specificity tended to follow the same patterns as in the subgroup with premorbid disability, but were less pronounced (table 5). Both versions of the LAPSS and the MASS are the only scores with an inherent lower age limit of 45 years; this is important because five patients younger than 45 years in our population had a stroke and were missed by these scores.
To confirm the ‘robustness’ of our results, we conducted a sensitivity analysis (see online supplementary table ST2). Assuming a 5% misallocation rate, sensitivities decreased by 6–9% in the ‘worst-case scenario,’ while specificities decreased by 1–3%. In the ‘best-case scenario,’ sensitivities and specificities changed only slightly (0–2%).
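The misallocation scenarios can be reproduced from any 2×2 table. In this sketch (hypothetical function name and counts), 5% of the relevant cells are moved as fractional counts: TN to FN and TP to FP in the worst case, FP to TP and FN to TN in the best case. Whether the original analysis rounded the moved patients to whole numbers is not stated, so that detail is an assumption.

```python
from typing import Tuple


def misallocation_scenario(tp: float, fp: float, fn: float, tn: float,
                           rate: float = 0.05,
                           scenario: str = "worst") -> Tuple[float, float]:
    """Return (sensitivity, specificity) after re-labelling a share of cases."""
    if scenario == "worst":
        moved_tn, moved_tp = rate * tn, rate * tp
        tn, fn = tn - moved_tn, fn + moved_tn   # TN -> FN
        tp, fp = tp - moved_tp, fp + moved_tp   # TP -> FP
    elif scenario == "best":
        moved_fp, moved_fn = rate * fp, rate * fn
        fp, tp = fp - moved_fp, tp + moved_fp   # FP -> TP
        fn, tn = fn - moved_fn, tn + moved_fn   # FN -> TN
    else:
        raise ValueError("scenario must be 'worst' or 'best'")
    return tp / (tp + fn), tn / (tn + fp)
```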
The main limitation of this study is its retrospective design, which, however, was necessary to avoid delaying emergency care by requiring several different stroke scales to be completed. Our analysis was based on stroke scores calculated from data obtained in the ER. The possible influences of different EMS systems, that is, the Anglo-American system with paramedics only versus the Franco-German system with paramedics and emergency physicians (usually anaesthetists without specific neurological training), are therefore largely excluded. Furthermore, the clear and easy-to-learn structure of the tested stroke recognition and severity scores makes a relevant influence of the setting or the examiners unlikely. We selected the most commonly used stroke recognition scores, but cannot rule out that some lesser-known scores were disregarded. The Ontario Prehospital Stroke Scale was excluded because data on its additional criteria for thrombolysis eligibility (eg, ‘can be transported within 2 h of onset’) were not available.27
Our results show that simple, structured and therefore easy-to-use stroke recognition scores such as the CPSS and FAST perform better for recognising stroke than more complex ones such as the LAPSS, MASS and Med PACS. Repurposing emergency stroke severity scores for recognising stroke in parallel resulted in good diagnostic performance, with the sNIHSS-8, sNIHSS-5 and the KPSS being the most suitable ones.
A large variability in reported sensitivities and specificities exists for all stroke recognition scores. Validation cohorts in the literature differ substantially, which is why it is difficult to compare scoring characteristics. Previous validation cohorts were often highly preselected, with a prevalence of confirmed stroke/TIA as high as 96% (mean 60.1%±18.7; see online supplementary table ST3).7 In our cohort, only 29% of the patients were actually stroke/TIA patients. A high prevalence of stroke results in an overestimation of PPVs. Recently, a systematic review including eight studies on seven stroke recognition scores underlined the heterogeneity of diagnostic characteristics and stressed the need for further investigation.13 Our approach of examining the scores’ capability to correctly identify patients with stroke in a defined, single cohort greatly enhances interscore comparability.
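The dependence of the PPV on prevalence follows from Bayes' theorem: PPV = Se · p / (Se · p + (1 − Sp) · (1 − p)), where p is the prevalence. The Python sketch below applies it to the CPSS point estimates from the abstract (sensitivity 83%, specificity 69%) at our prevalence of 29%, at the literature mean of about 60% and at the maximum of 96%. These are illustrative calculations, not PPVs reported in this study, and the function names are hypothetical.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from test characteristics and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value from test characteristics and prevalence."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


for p in (0.29, 0.60, 0.96):
    print(f"prevalence {p:.0%}: PPV {ppv(0.83, 0.69, p):.2f}, "
          f"NPV {npv(0.83, 0.69, p):.2f}")
```

With unchanged sensitivity and specificity, the PPV rises from roughly 0.52 at 29% prevalence to about 0.80 at 60%, which is why PPVs from highly preselected cohorts cannot be transferred to an unselected EMS population.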
As stroke is a disabling and sometimes life-threatening condition, the sensitivity of stroke recognition scores should be high. On the other hand, overtriage may ‘flood’ neurological emergency departments with ‘potential’ patients with stroke, which might delay treatments in other patients, including actual patients with stroke. In our opinion, however, when diagnosis is uncertain, especially in preclinical emergency situations with specific limitations in diagnostic capacities, the policy ‘in doubt, pro stroke’ should be followed.
The CPSS and FAST showed good sensitivity and modest specificity. In view of their simple and easy-to-use structure, their currently broad use in emergency medicine is justified. Adding complexity by introducing history items and exclusion criteria increased specificity, but decreased sensitivity and inevitably prolongs assessment by the EMS.13 Therefore, the use of both versions of the LAPSS, MASS and Med PACS as acute stroke recognition scores is clearly limited.
In the LAPSS and MASS, younger age (≤45 years) leads to the assumption that stroke is unlikely. As the incidence of stroke in the young is rising,30 it is, in our opinion, neither ethically nor scientifically justified to include an age limit in a stroke recognition score. In our population, two ischaemic and three haemorrhagic strokes would have been missed because of the age limit. Recently, in a large prospective Chinese survey, a false-negative diagnosis was made in 21% of LAPSS-screened patients with stroke because of the age limit alone.31 Another peculiarity of the LAPSS, MASS and Med PACS is that patients with a medical history of epilepsy are excluded. Since the incidence of post-stroke epilepsy is high, and patients with stroke have a higher risk of recurrent stroke than the general population,32 epilepsy should not be an a priori exclusion criterion for stroke, particularly not in the preclinical setting, where diagnostic capacities are limited.
In contrast to the LAPSS, MASS and Med PACS, the ROSIER score, primarily developed for use in the ER, takes a different approach: it attributes points to its individual items and subtracts one point each if acute seizure activity is observed or if consciousness is lost or syncope occurs. Stroke is thus scored as less likely, but is not excluded. Although the ROSIER score is well designed, with moderate to good sensitivity and adequate specificity, its preclinical use is limited by its more complex structure compared with the CPSS and FAST, which perform similarly but are simpler.33 ,34
We observed a marked drop in the specificity of nearly all scores, except the LAPSS, MASS and Med PACS, when the scores were applied to patients with pre-existing disabilities and, to a lesser extent, to older patients (>65 years). This finding is important because it underlines the need for access to documentation of the patient's prior clinical status in emergency situations, for example, through electronic health records, whenever patients or relatives cannot provide this information in a timely manner.
The greatest differences between our data and previously reported values for sensitivity and specificity were found for the LAPSS and MASS. In the original validation cohort, as well as in the second published version of the LAPSS, notably higher sensitivities and specificities were reported,7 ,17 although these values have not been confirmed by subsequent studies.10 ,11 ,31 ,35 By using a modified LAPSS and removing the age restriction, Wojner-Alexandrov reported an astonishingly high sensitivity of 95% and specificity of 98% for stroke recognition with the modified LAPSS in conjunction with further training.36 Even before the (modified) LAPSS was introduced, that study found a sensitivity of 86% and a specificity of 99% for paramedics identifying stroke.36 Because a modified LAPSS was used, we excluded this trial from our literature overview. Differences between the original validation cohorts and those in subsequent studies might be at least partly explained by differences in the training of paramedics and physicians, in the preselection of patients and in the proportion of patients with pre-existing disabilities.
Emergency stroke severity scores might be repurposed to recognise stroke and grade severity in parallel with a single score, in order to save ‘time and brain’. We therefore validated them in the same cohort as the pure stroke recognition scores. The capacity of the sNIHSS-8/sNIHSS-5 and the KPSS to recognise stroke was very similar to that of the CPSS and FAST, whereas the latter do not assess severity. Hence, parallel stroke recognition and severity grading is feasible and spares the EMS the time-consuming use of two separate scores. As the LAMS lacked sensitivity, its use is limited to severity assessment alone. The sNIHSS-8 and sNIHSS-5 share the advantage that their items are evaluated in accordance with the full NIHSS; therefore, the values of all shared items are directly comparable at all steps of emergency stroke care, from the EMS through the ER to inpatient treatment. In contrast, although also derived from the NIHSS, the KPSS uses a modified item assessment.
Short, easy-to-use stroke recognition scores, such as the CPSS and FAST, provide good sensitivity for recognising stroke. The low sensitivity of the more complex LAPSS and MASS, and their failure to recognise young patients with stroke because of the lower age limit, raise serious concerns. Given that preclinical assessment of stroke severity is becoming more important for triaging severely affected patients directly to comprehensive stroke centres, the use of combined stroke severity and recognition scores is desirable. The diagnostic capacities of existing emergency severity scores such as the sNIHSS-8, sNIHSS-5 and KPSS are as good as those of the simple stroke recognition scores.
The authors would like to thank Francesca Russo for her assistance in data validation.
This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.
Files in this Data Supplement:
- Data supplement 1 - Online supplement
Contributors SP and JCP conceived and designed the study. EP provided the EMS data. SP, JCP, AE and CH collected and analysed the data. TB provided statistical advice. JCP and SP drafted the article and all authors contributed to its revision. SP and JCP take the responsibility for the paper as a whole.
Competing interests None.
Ethics approval Ethics Committee, Medical Faculty of Heidelberg University, Heidelberg, Germany.
Provenance and peer review Not commissioned; externally peer reviewed.