Abstract
Objectives: To evaluate the reliability and validity of the Short Parkinson’s Evaluation Scale (SPES)/SCales for Outcomes in Parkinson’s disease (SCOPA)—a short scale developed to assess motor function in patients with Parkinson’s disease (PD).
Methods: Eighty five patients with PD were assessed with the SPES/SCOPA, Unified Parkinson’s Disease Rating Scale (UPDRS), Hoehn and Yahr (H&Y) scale, and Schwab and England (S&E) scale. Thirty four patients were examined twice by two different assessors who were blinded to each other’s scores and test executions. Additionally, six items of the motor section of the SPES/SCOPA were assessed in nine patients and recorded on videotape to evaluate inter-rater and intra-rater reliability.
Results: The reproducibility of the sum scores in the clinical assessments was high for all subscales of the SPES/SCOPA. Inter-rater reliability coefficients for individual items ranged from 0.27 to 0.83 in the motor impairment section, from 0.58 to 0.82 in the activities of daily living section, and from 0.65 to 0.92 in the motor complications section. Inter-rater reliability of the motor items in the video assessments ranged from 0.70 to 0.87 and intra-rater reliability ranged from 0.81 to 0.95. The correlations between related subscales of the SPES/SCOPA and UPDRS were all higher than 0.85, and both scales revealed similar correlations with other measures of disease severity. The mean time to complete the scales differed significantly (p<0.001): 8.1 (SD 1.9) minutes for the SPES/SCOPA and 15.6 (SD 3.6) minutes for the UPDRS.
Conclusion: The SPES/SCOPA is a short, reliable, and valid scale that can adequately be used in both research and clinical practice.
- Parkinson’s disease
- assessment
- motor function
- SPES/SCOPA
- ADL, activities of daily living
- ANOVA, analysis of variance
- H&Y, Hoehn and Yahr
- ICC, intra-class correlation coefficient
- MI, motor impairment
- PD, Parkinson’s disease
- S&E, Schwab and England
- SCOPA, SCales for Outcomes in Parkinson’s disease
- SPES, Short Parkinson’s Evaluation Scale
- UPDRS, Unified Parkinson’s Disease Rating Scale
Over the past 15 years the Unified Parkinson’s Disease Rating Scale (UPDRS) has become a standard tool in the clinical evaluation of patients with Parkinson’s disease (PD). The UPDRS is the most frequently used scale in PD trials1 and has acceptable inter-rater and intra-rater reliability for most items.2,3 The construct validity with other scales is adequate but the content validity has been questioned, especially with respect to its conceptual clearness and balance between items that represent symptoms responsive to dopaminergic treatment and those more resistant to this intervention.2 Other critiques include the length of the scale4 and the redundancy of items.5 The mean time to complete the scale is about 17 minutes for experienced users,4 which makes it less suitable for clinical application. Van Hilten et al5 demonstrated that the UPDRS can be shortened by removing redundant items from the motor section and conceptually unclear items from the activities of daily living (ADL) section without negative consequences for reliability or validity. A shorter scale with similar clinimetric properties may have advantages for patients, clinicians, and researchers.
As a result of the aforementioned considerations, the Short Parkinson’s Evaluation Scale (SPES) was developed.6 This scale is short, conceptually clear, and displays good reliability and validity.2,6 The instrument is considered easy to use by its evaluators6 but has only been used in a few studies.7–10
Careful inspection of the SPES, however, indicates that the consistency in the framing of response options may be improved. The item “swallowing” represents an impairment and should be moved from the disability (ADL) section to the motor impairment (MI) section in order to be consistent with current methodological concepts of scale construction.11,12 Additionally, some clinimetric aspects of the SPES have not been addressed to date. These aspects involve intra-rater and inter-rater reliability between two assessors who perform the clinical assessments separately. Hence, we first modified the SPES according to the aforementioned considerations and subsequently evaluated the modified scale, the SPES/SCOPA. The development of this scale is part of a larger research project on SCales for Outcomes in Parkinson’s disease (SCOPA)13 in which short, practical, and clinimetrically sound scales for all relevant domains in PD are selected or developed.
The objective of this study was to evaluate the reliability (intra-rater and inter-rater reliability, internal consistency) and construct validity (correlation with related scales, “known groups” comparisons) of the SPES/SCOPA.
METHODS
Development of the SPES/SCOPA
The SPES/SCOPA consists of three sections: MI, ADL, and motor complications. There are four response options ranging from 0 (normal) to 3 (severe). In comparison with the original SPES, some modifications to the three sections were made based on findings in the literature and empirical testing of some of the items. The mental section was removed altogether because we felt that these important functions could not be assessed in a reliable and valid way by a few single questions; as a part of the SCOPA project, we tested and developed separate instruments for these functions.13
Motor impairments
Tremor. Pooled data of several studies8,14–17 (total 1361 patients) revealed that in less than 4% of the patients the tremor score of the legs was higher than that of the arms, and that in only 2% of the patients a tremor in the legs was present whereas it was absent in the arms. For reasons of efficiency we therefore decided to evaluate tremor in the upper extremities only. Additionally, in the response options we linked the amplitude of the tremor to a displacement in centimetres to improve the quantification of “small”, “moderate”, and “severe”.
Bradykinesia. “Finger tapping” was replaced by “rapid alternating movements” on the basis of the results of a separate study in which we compared five different tests for bradykinesia.18 This study revealed that this test had the highest intra-rater and inter-rater reliability and the highest correlation with measures of disease severity. Both tremor and bradykinesia were assessed during 20 seconds. A time window was not indicated in the original SPES.
Rigidity. The phrase “detectable only on activation of the contralateral arm” was removed from the response options because a pilot study showed that the muscle tone in the ipsilateral arm also increased in many healthy individuals if the contralateral arm was raised. Rigidity was now evaluated by “perceived difficulty in trying to reach the end positions in elbow or wrist”, which proved to be a useful criterion in a pilot study we held in 17 patients with PD.
Postural stability. This item was modified on the basis of a study in which we compared six tests for postural stability.19 The word “retropulsion” was removed from the response options, and the scoring was now determined by the number of steps patients took to restore balance and by whether a patient would fall or not.
Arising from chair, gait, and speech. These items only underwent minimal changes compared with the original SPES and do not need detailed discussion.
Two other impairments, swallowing and freezing during “on”, were added and are evaluated by history rather than by examination. We considered it more important to assign items to the appropriate section than to assign items to sections on the basis of the way they are elicited (examination v history). “Swallowing” was therefore moved from the ADL section to the MI section. “Freezing during on” was added because it was considered a useful progression marker of PD that is less responsive to dopaminergic interventions. The maximum score in the MI section is 42.
Activities of daily living
The response options were framed as uniformly as possible. Responses reflected no difficulty (normal), some difficulty (no assistance needed), considerable difficulty (possibly needing assistance), and unable (or needing complete or almost complete assistance). As stated before, “swallowing” (previously addressed under “eating”) was removed. “Turning and getting out of bed” was extended to “changing positions” to include all important transfers of daily life. The maximum score in this section is 21.
Motor complications
The original section on complications of treatment was modified and now evaluated both the presence and severity of dyskinesias and motor fluctuations. Freezing and dystonia were removed because these phenomena cannot solely be attributed to dopaminergic treatment and, hence, may not reflect complications of treatment. Additionally, in patients treated with levodopa, dystonia may emerge while patients are “on” or “off” or during the switch of one state to the other and coincide with other dyskinesias. In view of this complexity and the fact that “off” and dyskinesias are already assessed, we decided not to evaluate dystonia separately. Items were framed to address the impairment level. The maximum score is 6 for the dyskinesia section and 6 for the motor fluctuations section.
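The section maxima quoted above follow from the item counts and the 0 to 3 response range. A minimal sketch of that arithmetic is given below; the dictionary keys and function names are illustrative and not part of the published scale, and the item counts are read off the appendix, with each bilateral item counted once per side.

```python
MAX_ITEM_SCORE = 3  # response options run from 0 (normal) to 3 (severe)

# Number of scored items per section (illustrative names).
# Motor impairment: rest tremor, postural tremor, rapid alternating
# movements, and rigidity are scored for each arm (4 x 2), plus six
# single items (rise from chair, postural stability, gait, speech,
# freezing during "on", swallowing).
SECTION_ITEM_COUNTS = {
    "motor_impairment": 4 * 2 + 6,
    "adl": 7,
    "dyskinesias": 2,
    "motor_fluctuations": 2,
}


def section_maximum(section):
    """Highest possible sum score for a section."""
    return SECTION_ITEM_COUNTS[section] * MAX_ITEM_SCORE


def section_sum(section, item_scores):
    """Sum the item scores of one section after basic validation."""
    if len(item_scores) != SECTION_ITEM_COUNTS[section]:
        raise ValueError("unexpected number of item scores for " + section)
    for score in item_scores:
        if score not in (0, 1, 2, 3):
            raise ValueError("item scores must be 0, 1, 2, or 3")
    return sum(item_scores)
```

For example, `section_maximum("motor_impairment")` gives 42 and `section_maximum("adl")` gives 21, matching the maxima stated in the text.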
Patients
Eighty five consecutive, non-demented patients who visited the outpatient clinic of the Department of Neurology of the Leiden University Medical Center, and who fulfilled the United Kingdom Parkinson’s Disease Society Brain Bank criteria for idiopathic PD,20 were included in the clinical assessments. Patients were excluded if they also had other diseases of the central nervous system or were not able to understand Dutch. Nine patients who met the same eligibility criteria were included in the video assessments. The study was approved by the medical ethics committee of the Leiden University Medical Center.
Assessment procedures
The patients were evaluated with the SPES/SCOPA (appendix), UPDRS parts II–IV,21 the Hoehn and Yahr (H&Y) scale,22 and the Schwab and England (S&E) scale.23 The latter three scales were included to evaluate the construct validity of the SPES/SCOPA. One global question evaluated overall ADL functioning and patients were asked to indicate on a seven-point scale (ranging from very good to very bad) how well they were able to carry out various daily activities in the past month. Patients also indicated whether they were “on” or “off” at the time of assessment. Additional information gathered included medication, disease duration, and comorbidity.
Reproducibility of the SPES/SCOPA was assessed in three different ways: inter-rater reliability of clinical assessments, inter-rater reliability of video assessments, and intra-rater reliability of video assessments. In the clinical assessments, 34 patients were assessed twice by the investigators (JM, MV), who assessed patients separately in immediate succession and were blind to each other’s scores and test executions. Reproducibility was calculated in patients with a stable response to medication during this period (that is, who had no “on–off” transitions). In the video assessments, inter-rater reliability was assessed by an international panel of movement disorders specialists (from Italy, Israel, Spain, Germany, and the Netherlands) who rated nine videotaped patients with a stable response to medication during the time the recordings were made. This panel rated the videotaped patients twice with an interval of 7–14 days. In the video recordings, six items from the MI section were presented. The other four items of this section (rigidity, speech, and the two historical items) were not appropriate for video scoring. Rigidity could not be evaluated from video for obvious reasons, and the other three items were not included because patients and raters spoke different languages.
Statistical analysis
Reproducibility was assessed with the intra-class correlation coefficient (ICC; one-way random effects model) if two ratings were compared (inter-rater reliability in clinical assessments and intra-rater reliability in video assessments), and with Kendall’s coefficient of concordance W if it concerned agreement between more than two raters (inter-rater reliability in video assessments). The ICC is equivalent to the weighted κ if quadratic weights are used.24 We therefore used the “strength-of-agreement” classification as proposed by Landis and Koch,25 who classified strength of agreement as slight (0.00–0.20), fair (0.21–0.40), moderate (0.41–0.60), substantial (0.61–0.80), or almost perfect (0.81–1.00). Internal consistency of subscales was evaluated with Cronbach’s α. Construct validity was evaluated by determining the correlation between scales, using Pearson’s r for the correlation of SPES/SCOPA with UPDRS and S&E, and the Spearman’s test rs for the correlation with H&Y and “global functioning”. “Known groups” validity was assessed by comparing scores of patients with different disease severity—that is, by H&Y—using analysis of variance (ANOVA) and ordinal regression.
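As an illustration of the reproducibility analysis described above, the following sketch computes a one-way random effects ICC from paired ratings and maps a coefficient onto the Landis and Koch bands quoted in the text. It is a sketch under assumptions, not the authors' analysis code: the single-rating form of the coefficient, the function names, and the handling of values outside the quoted bands are all choices made here for illustration.

```python
def icc_oneway(ratings):
    """One-way random effects ICC, single-rating form (an assumption;
    the text only states "one-way random effects model").

    ratings: one list per patient, each holding k ratings of one item.
    """
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2 or any(len(r) != k for r in ratings):
        raise ValueError("need >= 2 patients with the same number (>= 2) of ratings")

    grand_mean = sum(sum(r) for r in ratings) / (n * k)
    patient_means = [sum(r) / k for r in ratings]

    # Between-patient and within-patient mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in patient_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for r, m in zip(ratings, patient_means) for x in r
    ) / (n * (k - 1))

    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        # No dispersion at all: the coefficient is undefined.
        raise ValueError("insufficient dispersion to compute the ICC")
    return (ms_between - ms_within) / denominator


def landis_koch(coefficient):
    """Label a coefficient with the strength-of-agreement bands in the text."""
    if coefficient < 0:
        return "below the quoted bands"  # negative values are not covered above
    if coefficient <= 0.20:
        return "slight"
    if coefficient <= 0.40:
        return "fair"
    if coefficient <= 0.60:
        return "moderate"
    if coefficient <= 0.80:
        return "substantial"
    return "almost perfect"
```

With two raters per patient, `icc_oneway([[0, 0], [1, 1], [2, 2], [3, 3]])` returns 1.0 (perfect agreement), and `landis_koch(0.58)` returns "moderate". The explicit check for a zero denominator reflects the practical problem that a coefficient cannot be computed when ratings show no dispersion.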
RESULTS
The characteristics of patients that participated in the clinical assessment are presented in table 1. The five men and four women recorded on video had a mean age of 62.4 years and a mean disease duration of 12.4 years. Three of these patients were in H&Y stage 2, three in H&Y 3, and three in H&Y 4 (data not shown).
Practicality
The mean time necessary to complete the scales was 8.1 (SD 1.9) minutes for the SPES/SCOPA and 15.6 (SD 3.6) minutes for the UPDRS. This difference was significant (p<0.001).
Reliability of clinical assessments
One patient changed from “on” to “off” between both clinical assessments and was removed from the analysis; hence, inter-rater reliability was assessed in 33 patients. The inter-rater reliability coefficients for the motor sections of the clinical assessments were all at least “moderate” according to the Landis & Koch criteria, except for two items in the SPES/SCOPA (“postural tremor right hand”, “rigidity right arm”) and eight items in the UPDRS (table 2). The latter eight items also included the two items that had “fair” reliability in the SPES/SCOPA. The mean reliability coefficient, calculated over the items that were shared by both scales, was 0.56 for both. The mean ICC calculated over all the items of the motor sections was 0.58 for the SPES/SCOPA and 0.50 for the UPDRS. Two items in the UPDRS (rest tremor left and right leg) could not reliably be calculated as a result of insufficient dispersion; however, percentage agreement for these items was high (table 2). Results for the ADL section are presented in table 3. All agreements were at least “substantial” except “changing positions” in the SPES/SCOPA. The mean reliability coefficient calculated over the shared items of the ADL sections was 0.69 for the SPES/SCOPA and 0.71 for the UPDRS. The mean ICC over all items of the ADL section of the UPDRS was 0.76. The motor complication sections of both scales (table 4) shared only one item in both the dyskinesia and the motor fluctuation section. The mean ICC for the items in the dyskinesia section was 0.83 for the SPES/SCOPA and 0.75 for the UPDRS. The mean ICC for items in the motor fluctuation sections was 0.67 for the SPES/SCOPA and 0.60 for the UPDRS. The ICCs of the sumscores of the UPDRS were generally higher than those of the SPES/SCOPA (table 5).
Reliability of video assessments
The inter-rater reliability coefficients for the items in the video assessments were 0.70 or higher and therefore all at least “substantial” (table 6). Intra-rater reliability coefficients were higher, with all items above 0.80 (“almost perfect”).
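For agreement among more than two raters, Kendall’s coefficient of concordance W can be sketched as follows. This is an illustrative implementation under assumptions: it uses average ranks and the standard tie correction, which seems appropriate for 0 to 3 item scores where ties are unavoidable, but the text does not state whether a tie correction was applied in the study. All function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def tie_term(values):
    """Sum of (t^3 - t) over groups of tied values for one rater."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return sum(t ** 3 - t for t in counts.values())


def kendalls_w(scores_by_rater):
    """Kendall's W with tie correction.

    scores_by_rater: one list per rater, each holding that rater's score
    for the same n patients in the same order.
    """
    m = len(scores_by_rater)
    n = len(scores_by_rater[0])
    if m < 2 or n < 2 or any(len(r) != n for r in scores_by_rater):
        raise ValueError("need >= 2 raters scoring the same >= 2 patients")

    rank_rows = [average_ranks(r) for r in scores_by_rater]
    rank_sums = [sum(row[i] for row in rank_rows) for i in range(n)]
    mean_rank_sum = m * (n + 1) / 2
    s = sum((r - mean_rank_sum) ** 2 for r in rank_sums)

    ties = sum(tie_term(r) for r in scores_by_rater)
    denominator = m ** 2 * (n ** 3 - n) - m * ties
    if denominator == 0:
        raise ValueError("every rater gave all patients the same score")
    return 12 * s / denominator
```

W equals 1 when all raters order the patients identically and 0 when the rank sums do not differ between patients, which makes it comparable in range to the coefficients reported above.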
Internal consistency
The internal consistencies of the SPES/SCOPA scales were higher than those of the UPDRS except for the MI scale (table 5). “Sensory symptoms” in the UPDRS-ADL had a negative corrected item-total correlation (−0.02). Other items with corrected correlations below 0.20 involved rest tremor of the right hand (0.15) and swallowing (0.18) in the SPES/SCOPA; rest tremor of head, right hand, left hand, and right leg (0.03, 0.11, 0.16, and 0.18, respectively) in the UPDRS motor section; and tremor (0.05) in the UPDRS-ADL section. The corrected item-total correlation of the tremor of the left hand was considerably higher in both scales (0.40 in the SPES/SCOPA and 0.31 in the UPDRS).
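The two internal-consistency statistics reported here, Cronbach’s α and the corrected item-total correlation, can be sketched as below. The function names and data layout are assumptions made for illustration; the corrected item-total correlation is taken here as the Pearson correlation between one item and the sum of the remaining items.

```python
from math import sqrt
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha.

    items: one list per item, each holding the scores of the same n patients.
    """
    k = len(items)
    n = len(items[0])
    if k < 2 or n < 2:
        raise ValueError("need at least two items and two patients")
    totals = [sum(col[i] for col in items) for i in range(n)]
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def corrected_item_total(items, index):
    """Correlation of one item with the sum of all other items."""
    n = len(items[0])
    rest = [
        sum(col[i] for j, col in enumerate(items) if j != index)
        for i in range(n)
    ]
    return pearson_r(items[index], rest)
```

A low or negative corrected item-total correlation, as reported above for several tremor items, indicates that the item varies largely independently of the rest of its subscale.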
Validity
Correlations between related sections of the SPES/SCOPA and the UPDRS were 0.88 for MI, 0.86 for ADL, 0.86 for dyskinesias, and 0.95 for motor fluctuations. The correlations of these sections with the H&Y and S&E scales were similar for both instruments: the differences between the two scales’ correlations ranged from 0.02 to 0.10. The correlations between these sections and disease duration were also comparable, with coefficients of 0.38 (SPES/SCOPA) and 0.23 (UPDRS) for the motor sections, and 0.29 (SPES/SCOPA) and 0.36 (UPDRS) for the ADL sections. The correlation with global ADL functioning was 0.49 for the SPES/SCOPA ADL and 0.48 for the UPDRS ADL.
Mean scores of patients grouped by their H&Y stages (table 7) indicated significant differences between groups for both the motor and ADL sections of both scales (ANOVA; all p values <0.001). Post-hoc t tests showed no significant differences between patients in H&Y 2 and H&Y 3 in both scales, but differences between stages 2 and 4, and between stages 3 and 4, were significant. A significant trend was present in both sections of the scales, with higher scores for patients with more advanced PD.
DISCUSSION
We evaluated several aspects of the clinimetric performance of the SPES/SCOPA. Inter-rater reliability and internal consistency obtained by clinical assessment among 85 patients were similar for the SPES/SCOPA and the UPDRS. The reproducibility of the sumscores of the SPES/SCOPA was high. Two items in the motor section displayed less than moderate agreement: “postural tremor” and “rigidity”, both of the right arm. The same items on the same side also performed only fairly in the UPDRS, and previous studies on the UPDRS have also found lower scores for these items.4,26,27 Inter-rater reliability of the SPES/SCOPA MI section assessed from video was higher than in the clinical assessments, with all values at or above 0.70. Intra-rater reliability was higher still, with all reliability coefficients over 0.80. Items in the ADL section were only evaluated in the clinical assessment situation and all displayed at least “substantial agreement”, with the exception of “changing positions”. Items in the motor complication section all showed at least “substantial” reproducibility. The results from our study are in line with previous findings in which the SPES and UPDRS were compared6 and comparable items displayed similar reliability. The internal consistency of all SPES/SCOPA scales was above 0.70, which is considered the minimum for group comparisons.28
The correlation between related SPES/SCOPA and UPDRS sections is high, which is not too surprising given the shared components of both scales. Additionally, the similarity of the correlations between the SPES/SCOPA and UPDRS compared with measures of disease severity, such as H&Y, S&E, global ADL functioning, and disease duration, is striking. This endorses the impression that the scales capture the same phenomena. Differences between patients grouped by their H&Y stages also display similar results for both scales. Although the SPES/SCOPA contains only half the number of items and has four response options instead of five, reliability and validity are apparently preserved.
Inter-rater reliability in our study was generally lower than that seen in other studies. This is not surprising given that most previous studies used either video recordings or a design where several raters assessed patients simultaneously, therewith excluding potential biases caused by changes in the patient’s state and differences in test executions. Video assessments are useful because they provide information on reproducibility in standardised situations and thus present the opportunity to locate weaker items that may benefit from clearer instructions and descriptions. However, knowledge on the degree of reliability if patients are assessed at separate occasions, either by the same or different assessors, provides additional information because it reflects the routine of studies and clinical practice. To the best of our knowledge, only one study assessed the reproducibility of the UPDRS over separate clinical examinations.3 Because this study evaluated intra-rater reliability over a 2-week interval, whereas we have assessed inter-rater reliability in immediate succession, the different designs of the studies do not allow a direct comparison.
Future studies that aim to improve the performance of the items with lower reliability (postural tremor right, rapid alternating movements left, rigidity right) are recommended. Postural tremor and rapid alternating movements may benefit from a shorter time window: the longer the assessment time, the greater the chance that the performance becomes inconsistent (“destabilises”). Reliability may also improve if the instructions state more clearly which performance should be scored, because some raters may be in doubt whether to rate the best, worst, or mean performance. Whether a shorter assessment period and clearer instructions are indeed helpful must first be assessed. Special attention must be paid to whether an improvement in reliability is obtained at the expense of validity, as some problems typically become apparent only after longer performance.
Improving the item on rigidity will probably be difficult. One possibility is to assess whether higher reliability can be obtained if rigidity is assessed separately for either the wrist or the elbow instead of a combination of both.
It is unfortunate that we did not include more patients in the more extreme stages of disease severity, that is, H&Y stages 1 and 5. It is difficult to indicate whether our results can be extrapolated to patients in the end ranges, but it is usually more difficult to rate the intermediate states (mildly, moderately affected) than the more extreme ones (normal, severely affected). More variation in the severity of the patients would probably have resulted in larger differences between individuals and, hence, in a larger between-subject variance. As the ICC is calculated as the ratio of the variance of true differences between individuals to the total variance, this would probably have resulted in a higher ICC. Our findings are therefore more likely to underestimate than to overestimate the reliability of the scale.
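The variance-ratio argument can be written out explicitly. In the sketch below, the symbols σ²_b (between-subject variance) and σ²_w (within-subject or error variance) are notation introduced here, and the numbers are purely illustrative, not study data.

```latex
\[
  \mathrm{ICC} \;=\; \frac{\sigma_b^{2}}{\sigma_b^{2} + \sigma_w^{2}}
\]
% Illustrative values only: with the same rating error, a more
% heterogeneous sample yields a higher coefficient.
\[
  \sigma_w^{2} = 1:\qquad
  \sigma_b^{2} = 1 \;\Rightarrow\; \mathrm{ICC} = \tfrac{1}{2} = 0.50,
  \qquad
  \sigma_b^{2} = 3 \;\Rightarrow\; \mathrm{ICC} = \tfrac{3}{4} = 0.75
\]
```

For a fixed amount of rating error, the coefficient therefore rises as the sample spans a wider range of disease severity.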
To summarise, the SPES/SCOPA is a reliable, valid, and conceptually clear scale that is completed in half the time it takes to administer the UPDRS. These advantages may favour the use of the SPES/SCOPA in evaluating motor function in patients with PD.
APPENDIX – SPES/SCOPA SCALE
A. MOTOR EVALUATION
Clinical examination
1. Rest tremor
Assess each arm separately during 20 seconds; hands rest on thighs; if tremor is not evident at rest, try to keep the patient attentive—for example, by having them count backwards with eyes closed
0 = absent
1 = small amplitude (<1 cm) occurring spontaneously, or obtained only while keeping patient attentive (any amplitude)
2 = moderate amplitude (1–4 cm), occurring spontaneously
3 = large amplitude (⩾4 cm), occurring spontaneously.
2. Postural tremor
Check with arms outstretched, pronated and semipronated, and with index fingers of both hands almost touching each other (elbows flexed); assess each position during 20 seconds
0 = absent
1 = small amplitude (<1 cm)
2 = moderate amplitude (1–4 cm)
3 = large amplitude (⩾4 cm).
3. Rapid alternating movements of hands
Rapid alternating pronation/supination movements of upper hand, each time slapping the palm of the horizontally held lower hand during 20 seconds; each hand separately
0 = normal
1 = slow execution, or mild slowing and/or reduction in amplitude; may have occasional arrests
2 = moderate slowing and/or reduction in amplitude or hesitations in initiating movements or frequent arrests in ongoing movements
3 = can barely perform task.
4. Rigidity
Assess passive movements of elbow and wrist over full range, with the patient relaxed in sitting position; ignore cogwheeling; check each arm separately
0 = absent
1 = mild rigidity over full range, no difficulty reaching end positions
2 = moderate rigidity, some difficulties reaching end positions
3 = severe rigidity, considerable difficulties reaching end positions.
5. Rise from chair
Patient is instructed to fold arms across chest; use straight back chair
0 = normal
1 = slowly; does not need arms to get up
2 = needs arms to get up (can get up without help)
3 = unable to rise (without help).
6. Postural stability
Stand behind the patient and pull patient backwards, while patient is standing erect with eyes open and feet spaced slightly apart; patient is not prepared
0 = normal, may take up to two steps to recover
1 = takes three or more steps; recovers unaided
2 = would fall if not caught
3 = spontaneous tendency to fall or unable to stand unaided.
7. Gait
Assess gait pattern; use walking aid or offer assistance, if necessary
0 = normal
1 = mild slowing and/or reduction of step height or length; does not shuffle
2 = severe slowing, or shuffles, or has festination
3 = unable to walk.
8. Speech
0 = normal
1 = slight loss of expression, diction, and/or volume
2 = slurred; not always intelligible
3 = unintelligible always or most of the time.
Historical information
9. Freezing during “on”
Freezing is characterised by hesitation when trying to start walking or being “glued” to the ground while walking
0 = absent
1 = start hesitation only, occasionally present
2 = frequently present, may have freezing when walking
3 = severe freezing when walking.
10. Swallowing
0 = normal
1 = some difficulty or slow; does not choke; normal diet
2 = sometimes chokes; may require soft food
3 = chokes frequently; may require soft food or alternative method of food intake.
B. ACTIVITIES OF DAILY LIVING
11. Speech
0 = normal
1 = some difficulty; may sometimes be asked to repeat sentences
2 = considerable difficulty; frequently asked to repeat sentences
3 = unintelligible most of the time.
12. Feeding (cutting, filling cup, etc.)
0 = normal
1 = some difficulty or slow; does not need assistance
2 = considerable difficulty; may need some assistance
3 = needs almost complete or complete assistance.
13. Dressing
0 = normal
1 = some difficulty or slow; does not need assistance
2 = considerable difficulty; may need some assistance—for instance, buttoning, getting arms into sleeves
3 = needs almost complete or complete assistance.
14. Hygiene (washing, combing hair, shaving, brushing teeth, using toilet)
0 = normal
1 = some difficulty or slow; does not need assistance
2 = considerable difficulty; may need some assistance
3 = needs almost complete or complete assistance.
15. Changing position (turning over in bed, getting up out of bed, getting up out of a chair, turning around when standing)
0 = normal
1 = some difficulty or slow; does not need assistance with any change of position
2 = considerable difficulty; may need assistance with one or more changes of position
3 = needs almost complete or complete assistance with one or more changes of position.
16. Walking
0 = normal
1 = some difficulty or slow; does not need assistance or walking aid
2 = considerable difficulty; may need assistance or walking aid
3 = unable to walk, or walks only with assistance and great effort.
17. Handwriting
0 = normal
1 = some difficulty—for instance, slow, small letters; all words legible
2 = considerable difficulty; not all words legible; may need to use block letters
3 = majority of words are illegible.
C. MOTOR COMPLICATIONS
18. Dyskinesias (presence)
0 = absent
1 = present some of the time
2 = present a considerable part of the time
3 = present most or all of the time.
19. Dyskinesias (severity)
0 = absent
1 = small amplitude
2 = moderate amplitude
3 = large amplitude.
20. Motor fluctuations (presence of “off” periods)
What proportion of the waking day is patient “off” on average?
0 = none
1 = some of the time
2 = a considerable part of the time
3 = most or all of the time.
21. Motor fluctuations (severity of “off” periods)
0 = absent
1 = mild end-of-dose fluctuations
2 = moderate end-of-dose fluctuations; unpredictable fluctuations may occur occasionally
3 = severe end-of-dose fluctuations; unpredictable on–off oscillations occur frequently.
Acknowledgments
Professor RAC Roos is gratefully acknowledged for reviewing the manuscript. The authors thank the following persons for their participation in evaluating the videotapes: Dr S Agostini, Dr S Bernardini, Dr C Berti, Dr D Caneparo, Dr E Cubo, Dr G Gambaccini, Dr C Klein, Dr T van Laar, Dr C Lucetti, and Dr T Prokhorov.
REFERENCES
Footnotes
- This study was financially supported by The Netherlands Organization for Scientific Research (NWO; project no. 0940-33-021).
- Competing interests: none declared