Disability outcome measures in therapeutic trials of relapsing-remitting multiple sclerosis: effects of heterogeneity of disease course in placebo cohorts
Clarence Liu, Lance D Blumhardt
Division of Clinical Neurology, Department of Medicine, University Hospital, Queen's Medical Centre, Nottingham, UK
Correspondence to: Professor LD Blumhardt, Division of Clinical Neurology, Department of Medicine, University Hospital, Queen's Medical Centre, Nottingham NG7 2UH, UK


OBJECTIVES Recent phase III clinical trials of immunomodulatory therapies in relapsing-remitting multiple sclerosis have shown significant benefits of active treatment on relapse related end points, but effects on disability outcomes have been inconsistent. These apparent discrepancies could be due to differences in the clinical end points employed, the behaviour of placebo cohorts, or both.

METHODS Disability data from the placebo cohorts of two large phase III studies, the United States glatiramer acetate trial (Copolymer 1 Multiple Sclerosis Study Group) and the multinational interferon β-1a trial (PRISMS Study Group) were combined and masked (n=313). Two groups of disability outcome measures were assessed. Firstly, measures of disability change (2 year EDSS difference and area under the EDSS/time curve, AUC) were calculated. Secondly, conventional disease progression end points (“confirmed progression” and “worsening to EDSS 6.0”) were evaluated by using Kaplan-Meier analysis and compared with a categorical classification based on EDSS trends.

RESULTS The average increase in disability for the entire cohort as assessed by mean 2 year EDSS change (<0.5 EDSS point) or mean AUC (+0.57 EDSS-years) was small. For the “confirmed progression” end points, increasing the stringency of the definition lowered their incidence (from 32% with 1.0 point at 3 months, to 9% with 2.0 points at 6 months), but did not improve the positive predictive accuracy for “sustained progression” maintained to the end of the study. The error rate for this outcome was about 50%. Worsening to EDSS 6.0 was a more reliable end point, but had even lower sensitivity (incidence <10%). EDSS trend analysis showed markedly heterogeneous disease courses, which were then categorised into “stable” (26%), “relapsing-remitting” (59%), and “progressive” (15%) courses. Patients with the last course had deteriorated considerably by the end of 2 years (mean worsening of 2.0 EDSS points).

CONCLUSION In relapsing-remitting multiple sclerosis treatment trials, the conventional measure of mean EDSS change has low sensitivity, whereas the widely applied confirmed progression end points have high error rates regardless of their definition stringency. Alternative methods with better data utilisation include AUC summary measures and categorical disease trend analysis. The heterogeneity of disability outcomes in short trials, combined with unreliable clinical end points, diminishes the credibility of therapeutic claims aimed at reducing irreversible neurological deficits. The behaviour of patients treated with placebo should be carefully analysed before conclusions are drawn on the efficacy of putative treatments.

  • multiple sclerosis
  • disability outcome measures
  • treatment trials


Recent double blind, placebo controlled phase III therapeutic trials in relapsing-remitting multiple sclerosis have consistently demonstrated the efficacy of immunomodulatory drugs for reducing disease activity.1-5 Less well studied are the severity of transient symptoms and the effects on progression of disability.6-8 Data from relatively short trials of relapsing-remitting multiple sclerosis are undoubtedly difficult to analyse and interpret, as many clinical exacerbations, although disabling, may be reversible and the accumulation of fixed neurological dysfunction across a representative cohort may take many years.9 10 Evidence that new therapies prevent irreversible worsening of disability is less consistent.11 12 One example is the contrast between the North American interferon β-1b and intramuscular interferon β-1a studies; the first showed no effects on disability,1 13 whereas the second demonstrated apparent treatment benefits in reducing disease progression.3 14 Although this may be due to genuine differences in the effects of these agents, interpretation is confounded by factors such as variance in the outcome measures employed and fluctuations in the behaviour of the placebo cohorts.

We attempted to address these issues by examining disability data, as measured by the expanded disability status scale (EDSS),15 from the placebo arms of two randomised, double blind, 2 year therapeutic trials: the United States study of copolymer 1 (or glatiramer acetate, Copaxone®), investigated by the Copolymer 1 Multiple Sclerosis Study Group2 and the multinational study of interferon β-1a (Rebif®), investigated by the PRISMS Study Group.5 Both trials had similar baseline enrolment characteristics and were well conducted, especially with respect to their low dropout rates and adequate treatment blinding. We were thus able to evaluate in-study disability of the patients treated with placebo with two methods which quantify disability change (the “2 year EDSS difference” from trial entry to completion and the summary measure “area under the disability/time curve” (AUC)),16 and three measures which assess disease progression (“confirmed EDSS progression”, “worsening to EDSS 6.0”, and “categorical EDSS trend analysis”).


The enrolment criteria and trial designs of the original phase III studies have been previously published.2 5 The baseline demographics of the patients in the placebo arms of the two trials were indistinguishable (table 1), enabling all EDSS data to be masked and incorporated into a single database. Due to the non-linearity and varying staying times at different levels on the EDSS scale,17 a worsening of 0.5 point between EDSS of 5.5 and 7.0 was adjusted to be equivalent to 1.0 point.18 19
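The adjustment for EDSS non-linearity can be illustrated with a short sketch. The text states only that a 0.5 point worsening between EDSS 5.5 and 7.0 was treated as equivalent to 1.0 point; the piecewise mapping below, the function names `adjusted_edss` and `adjusted_change`, and the handling of scores above 7.0 are assumptions made for illustration and are not taken from the original analysis.

```python
def adjusted_edss(score: float) -> float:
    """Map a raw EDSS score onto an adjusted scale.

    Assumption: every 0.5 step inside the 5.5 to 7.0 band counts double,
    and scores above 7.0 continue with ordinary 1:1 spacing.
    """
    low, high = 5.5, 7.0
    if score <= low:
        return score
    if score <= high:
        return low + 2.0 * (score - low)
    return low + 2.0 * (high - low) + (score - high)


def adjusted_change(baseline: float, final: float) -> float:
    """Adjusted EDSS change between two assessments."""
    return adjusted_edss(final) - adjusted_edss(baseline)


# A 0.5 point step inside the band counts as 1.0 point.
print(adjusted_change(5.5, 6.0))
# A 1.0 point step below the band is unchanged.
print(adjusted_change(2.0, 3.0))
```

Under this reading, a patient moving from 5.0 to 6.0 would be credited with 1.5 adjusted points: 0.5 below the band plus a doubled 0.5 inside it.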

Table 1

Baseline characteristics of placebo patients in two phase III trials


Two year EDSS difference

Conventional EDSS change over the 2 year trial was obtained from the difference between the initial baseline and final (2 year) EDSS scores.

AUC analysis

A summary measure statistic was derived by integrating the area under the plotted EDSS/time curve (AUC) normalised to entry disability level,16 to provide an index of in-trial morbidity change. Analysis was performed on data acquired solely from scheduled visits to avoid possible bias due to differential sampling rates between treatment arms and, to some extent, the effects of transient relapse related disability.
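As a minimal sketch of the two disability change measures, the code below computes the 2 year EDSS difference and a baseline-normalised AUC in EDSS-years from scheduled visit data. The trapezoidal rule, the 365 day year, and the function names are assumptions for illustration; the exact integration method of the cited summary measure is not described in this text.

```python
from typing import Sequence


def two_year_difference(scores: Sequence[float]) -> float:
    """Final EDSS minus baseline EDSS."""
    return scores[-1] - scores[0]


def edss_auc(days: Sequence[float], scores: Sequence[float]) -> float:
    """Area under the EDSS change/time curve, in EDSS-years.

    Scores are first normalised to the entry (baseline) level, then
    integrated with the trapezoidal rule (an assumed choice).
    """
    baseline = scores[0]
    area = 0.0
    for i in range(1, len(days)):
        dt_years = (days[i] - days[i - 1]) / 365.0
        mean_change = ((scores[i - 1] - baseline) + (scores[i] - baseline)) / 2.0
        area += dt_years * mean_change
    return area


# Hypothetical patient: worsens by 1.0 point during year 1, then stays there.
days = [0, 365, 730]
scores = [2.0, 3.0, 3.0]
print(two_year_difference(scores), edss_auc(days, scores))
```

A positive AUC indicates net time spent above the entry disability level and a negative AUC indicates net time spent below it.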


Confirmed progression

This end point, which is widely used in clinical trials to assess therapeutic effects, is often interpreted as an indicator of irreversible worsening of disability. We have applied this outcome measure with increasing stringency (1.0 EDSS point increase at 3 months, 1.0 point increase at 6 months, 2.0 point increase at 3 months, and 2.0 point increase at 6 months) to data from scheduled visits. The irreversibility (positive predictive accuracy) of these outcomes was determined by comparisons with sustained progression end points, defined from individual EDSS/time plots. To qualify as sustained, a particular confirmed step worsening had to be maintained to the end of the study. Deteriorations during the final 3 or 6 months, for 3 month or 6 month end points respectively, were also excluded.
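The confirmed and sustained definitions above can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes that the increase must be present at every scheduled visit from onset up to the confirmation visit, that 3 months is represented as 90 days, and that visits are supplied as hypothetical `(day, EDSS)` pairs with the baseline first.

```python
from typing import List, Optional, Tuple

Visit = Tuple[int, float]  # (day in study, EDSS score); first visit is baseline


def confirmed_onset(visits: List[Visit], step: float = 1.0,
                    confirm_days: int = 90) -> Optional[int]:
    """Index of the visit that starts the first confirmed progression, or None."""
    baseline = visits[0][1]
    for i in range(1, len(visits)):
        day_i, score_i = visits[i]
        if score_i - baseline < step:
            continue
        for day_j, score_j in visits[i + 1:]:
            if score_j - baseline < step:
                break  # worsening was lost before it could be confirmed
            if day_j - day_i >= confirm_days:
                return i
    return None


def is_sustained(visits: List[Visit], onset: int, step: float = 1.0) -> bool:
    """True if the worsening is maintained from onset to the last visit."""
    baseline = visits[0][1]
    return all(score - baseline >= step for _, score in visits[onset:])


# Hypothetical patient: confirmed at 3 months, but recovers before trial end.
patient = [(0, 2.0), (90, 3.0), (180, 3.0), (270, 2.0), (360, 2.0)]
onset = confirmed_onset(patient, step=1.0, confirm_days=90)
print(onset, is_sustained(patient, onset))
```

Raising `step` to 2.0 or `confirm_days` to 180 reproduces the more stringent definitions; the same hypothetical patient no longer meets the end point at 6 months.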

Worsening to EDSS 6.0

This end point, indicating the need for unilateral ambulatory assistance, has been found to be a useful natural history landmark.9 It was examined with confirmation at 3, 6, 9, and 12 months and tested for positive predictive accuracy by comparisons against sustained worsening to EDSS 6.0.

Categorical EDSS trend analysis

All complete 2 year EDSS datasets (n=289) were plotted for individual patients. Disease trends could be categorised into six subsets according to the following definitions:

Minimal change—Fluctuations of no more than ±0.5 EDSS point (the generally accepted range of EDSS interrater variability) from baseline level (or no change if EDSS⩾5.5).19

Erroneous progression—An increase of at least 1.0 EDSS point confirmed at 3 months, but not sustained to the end of the trial (as defined in the previous section).

Erroneous improvement—A decrease of at least 1.0 EDSS point confirmed at 3 months but not sustained to the end of the trial.

Sustained progression—An increase of at least 1.0 EDSS point confirmed at 3 months and sustained until the end of the study.

Sustained improvement—A decrease of at least 1.0 EDSS point confirmed at 3 months and sustained until the end of the study.

Fluctuating—All other EDSS plots with unsustained increases or decreases during the study not meeting the above definitions.
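The six subset definitions can be expressed as a small classifier. The sketch below is one possible encoding under stated assumptions: 3 months is taken as 90 days, confirmation requires the change to persist at every intervening scheduled visit, progression is tested before improvement when both could apply, and the names `classify` and `_confirmed_onset` are hypothetical.

```python
from typing import List, Optional, Tuple

Visit = Tuple[int, float]  # (day in study, EDSS score); first visit is baseline


def _confirmed_onset(changes: List[Tuple[int, float]], sign: int,
                     step: float, confirm_days: int) -> Optional[int]:
    """First visit index starting a confirmed change in the given direction."""
    for i in range(1, len(changes)):
        day_i, change_i = changes[i]
        if sign * change_i < step:
            continue
        for day_j, change_j in changes[i + 1:]:
            if sign * change_j < step:
                break
            if day_j - day_i >= confirm_days:
                return i
    return None


def classify(visits: List[Visit], step: float = 1.0,
             confirm_days: int = 90) -> str:
    """Assign one of the six EDSS trend subsets to a complete dataset."""
    baseline = visits[0][1]
    changes = [(day, score - baseline) for day, score in visits]
    # Minimal change: within +/-0.5 point, or no change at all if baseline >= 5.5.
    tolerance = 0.0 if baseline >= 5.5 else 0.5
    if all(abs(change) <= tolerance for _, change in changes):
        return "minimal change"
    # Assumed order: look for confirmed progression first, then improvement.
    for sign, label in ((+1, "progression"), (-1, "improvement")):
        onset = _confirmed_onset(changes, sign, step, confirm_days)
        if onset is not None:
            sustained = all(sign * c >= step for _, c in changes[onset:])
            return ("sustained " if sustained else "erroneous ") + label
    return "fluctuating"


print(classify([(0, 2.0), (90, 3.0), (180, 3.0), (270, 2.0), (360, 2.0)]))
```

The hypothetical patient in the example worsens by 1.0 point, is confirmed 90 days later, and then returns to baseline, so the sketch labels the course as erroneous progression.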

After statistical analyses (see below), these six subsets were collapsed into three clinically meaningful categories of disease courses—namely:

Stable—Including the subsets sustained improvement and minimal change.

Relapsing-remitting—Including the subsets fluctuating course, erroneous progression, and erroneous improvement.

Progressive—Including the subset with sustained progression.


End points for disability change were compared with the Mann-Whitney U test. Relations between 2 year EDSS difference and AUC were explored with (1) Spearman's rank correlation, and (2) stratification of the AUC data by 1.0 point steps of EDSS change. The stratified data sets were normally distributed (Shapiro-Wilk W test) and multiple comparisons were performed with one way analysis of variance (ANOVA) and Bonferroni multiple t tests.

Kaplan-Meier survival analyses were performed for both confirmed and sustained progression outcomes, with censoring of patients who had not reached the particular end point in question, either at study completion, or at the time of dropout. The log rank test was used to measure the amount of irreversibility of confirmed against sustained end points at study completion. The positive predictive value of each confirmed end point was obtained. With categorical EDSS trend analysis, intercategorical disability data (non-parametric) were compared using the Kruskal-Wallis test. The association of baseline demographical factors in predicting subsequent disease courses was investigated by logistic regression analysis.
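For readers unfamiliar with the survival analysis, the sketch below shows a plain product-limit (Kaplan-Meier) estimate with censoring, together with a simple positive predictive value calculated as the fraction of confirmed end points that were also sustained. Both functions are generic textbook forms written for illustration, with hypothetical names and data; they are not the software or the exact calculation used in the study.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Product-limit survival estimate.

    times:  day of end point, or day of censoring (completion or dropout)
    events: True if the end point was reached, False if censored
    Returns (time, survival probability) at each time with at least one event.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [event for time, event in data if time == t]
        n_events = sum(tied)
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= len(tied)  # events and censored patients both leave the risk set
        i += len(tied)
    return curve


def positive_predictive_value(n_confirmed: int, n_sustained: int) -> float:
    """Fraction of confirmed end points that were also sustained."""
    return n_sustained / n_confirmed


# Hypothetical data: six patients, three reach the end point, three are censored.
times = [90, 180, 180, 365, 730, 730]
events = [True, True, False, True, False, False]
for day, prob in kaplan_meier(times, events):
    print(day, round(prob, 3))
```

Patients censored at dropout or at study completion contribute to the number at risk up to their censoring time but never lower the survival estimate themselves.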



For the total cohort, the mean baseline and final EDSS were 2.42 (SD 1.24) and 2.81 (SD 1.76) respectively (p<0.0001). The mean unadjusted 2 year EDSS difference was +0.38 (SD 1.16) (after adjustment for EDSS non-linearity, mean difference was +0.45 (SD 1.32)). The mean change estimated by AUC calculations was +0.57 EDSS-years (SD 1.65).

The two disability change end points, 2 year EDSS difference and AUC, were significantly correlated (r=0.76; p<0.0001). When AUC data were stratified by 1.0 point steps of 2 year EDSS difference, there was increased sensitivity of the AUC for measuring disability change as shown by its greater data spread per EDSS point shift. Multiple comparisons demonstrated that the stratified sets were significantly different from each other (all comparisons, p<0.005), thus maintaining separation by conventional EDSS changes.


Confirmed progression

With increasingly more demanding outcome definitions employed in the Kaplan-Meier analyses, there was a gradual increase in the cumulative survival probability and a reduction in the number of patients reaching the end points, utilising either confirmed (32% for 1.0 point at 3 months; 9% for 2.0 points at 6 months) or sustained progression (15% for 1.0 point at 3 months; 4% for 2.0 points at 6 months). However, regardless of the stringency of such definitions, about 50% of the decisions satisfying confirmed progression were erroneous when compared with sustained progression (positive predictive accuracy of 48% for 1.0 point at 3 months and 55% for 2.0 points at 6 months) (fig 1 A-B, table 2).

Figure 1

Examples of Kaplan-Meier survival curves with cumulative survival probability plotted v days in study, for different EDSS progression end points. Comparison of confirmed progression (dotted) v sustained progression (solid) with (A) 1.0 EDSS point progression at 3 months (log rank p<0.0001), and (B) 2.0 EDSS points progression at 3 months (log rank p=0.018). Comparison of confirmed worsening to EDSS 6.0 (dotted) v sustained worsening (solid) at (C) 3 months (log rank p=0.265), and (D) 6 months (log rank p=0.562).

Table 2

Disease progression data for the total cohort (n=313). Comparisons of Kaplan-Meier analyses of “confirmed” v “sustained” progression end points, with (A) EDSS point progression and (B) worsening to EDSS 6.0

Worsening to EDSS 6.0

Analyses of confirmed worsening to EDSS 6.0 showed that the error rates for the whole cohort were relatively low (29% and 10% at 3 and 12 months, respectively) and the paired confirmed and sustained progression curves were not significantly different (fig 1 C-D, table 2). On the other hand, the incidence rates of these outcome measures for sustained progression were low (5% at 3 months and 3% at 12 months).

Categorical EDSS trend analysis

The individual EDSS plots from 289 patients with complete 2 year data were encoded into the six subsets described above (fig 2). One hundred and seven patients (37%) followed a fluctuating course, whereas 57 (20%) experienced nil, or only minimal change. Forty one patients (14%) initially showed improving in-trial disability trends, 17 of whom maintained improvement to trial end (sustained improvement). Eighty four patients (29%) originally exhibited progressive trends; 40 of these subsequently improved (erroneous progression). The worsening of disability scores in the remaining 44 patients (15% of the cohort) was maintained to trial end (sustained progression).

Figure 2

Disability plots from scheduled visits (EDSS change from baseline v days in study) of patients with complete datasets (n=289) encoded into six subsets: (A) minimal change, (B) fluctuating course, (C) erroneous progression, (D) sustained progression, (E) erroneous improvement, and (F) sustained improvement. In each subset, the upper panel shows two representative examples, the bottom panel shows the total series of EDSS plots from all patients.

The disability characteristics as measured by 2 year EDSS difference and AUC were calculated for each subset (table 3). Multiple comparisons (Kruskal-Wallis test) between the six subsets demonstrated that distinct segregation was not consistently accomplished, particularly when comparing differences between the subsets classified as erroneous improvement, fluctuating course, and erroneous progression. These subsets were therefore collapsed into three categories of in-trial disease course (stable, relapsing-remitting, and progressive) which were ordered and separable using the above disability end points (table 3). Thus, 59% of patients followed a relapsing-remitting course, 26% a stable course, and 15% a progressive course (figure 3).
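The percentages for the three disease course categories follow directly from the subset counts reported in the text. The short calculation below reproduces them; the counts for erroneous improvement (41 minus 17) and erroneous progression (84 minus 44) are derived from the reported figures, and the dictionary and function names are illustrative only.

```python
# Subset counts for the 289 patients with complete 2 year data.
SUBSET_COUNTS = {
    "minimal change": 57,
    "fluctuating": 107,
    "erroneous improvement": 41 - 17,  # initially improving minus sustained
    "sustained improvement": 17,
    "erroneous progression": 84 - 44,  # initially progressing minus sustained
    "sustained progression": 44,
}

# Collapse of the six subsets into three disease course categories.
COLLAPSE = {
    "minimal change": "stable",
    "sustained improvement": "stable",
    "fluctuating": "relapsing-remitting",
    "erroneous progression": "relapsing-remitting",
    "erroneous improvement": "relapsing-remitting",
    "sustained progression": "progressive",
}


def collapse(counts):
    """Sum subset counts into the three categories."""
    totals = {}
    for subset, n in counts.items():
        category = COLLAPSE[subset]
        totals[category] = totals.get(category, 0) + n
    return totals


def percentages(totals):
    """Whole-number percentages of the cohort in each category."""
    n = sum(totals.values())
    return {category: round(100 * count / n) for category, count in totals.items()}


totals = collapse(SUBSET_COUNTS)
print(totals)
print(percentages(totals))
```

The totals are 74 stable, 171 relapsing-remitting, and 44 progressive patients, which round to the 26%, 59%, and 15% quoted above.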

Table 3

Disease progression data for the total cohort with complete datasets (n=289). In trial disability changes of categorical EDSS trend analyses by (A) six subsets and (B) three disease course categories

Figure 3

Mean disability courses (EDSS change from baseline (95% CI)) of patients with complete datasets (n=289) during the study divided into three categories: bottom curve stable course; middle curve relapsing-remitting course; top curve progressive course.

Analysis of the baseline demographic data (age, sex ratio, pretrial relapse rate, and disease duration) did not show any intercategorical differences or associations between clinical parameters at study entry and subsequent disease course categories, although higher EDSS at baseline was significantly associated with a progressive course (p=0.01).


The behaviour of the placebo arm can substantially influence the outcome of a therapeutic trial. This is seen in recent years with the results from the North American phase III interferon β-1b1 and intramuscular interferon β-1a3 studies in relapsing-remitting multiple sclerosis. With disease progression defined as confirmed deterioration by 1.0 EDSS point, the interferon β-1b treated arms were not significantly different from placebo,13 whereas patients receiving interferon β-1a benefited significantly more than their placebo counterparts on this and additional more stringent end points.3 14 However, it has been pointed out that differences inherent in the disability of the patients in the interferon β-1a placebo arm may have confounded the results: the 1 year confirmed progression rates were similar for patients receiving interferon β-1b, interferon β-1b placebo, and interferon β-1a (11 to 13%), but much worse (22%) for those randomised to interferon β-1a placebo.11 Hence the behaviour of placebo groups should be examined thoroughly before emphatic conclusions are made about the benefits of putative therapies on disability.

In our present retrospective analysis, data from the placebo arms of two large relapsing-remitting multiple sclerosis treatment trials2 5 were combined and evaluated. This was possible as the two cohorts were very similar in their baseline clinical demography and both studies were adequately blinded with low dropout rates. This approach also ensured data masking, as our aim was not to compare the efficacies of different drugs. In-trial relapse rates were not examined, due to the different definitions used in various trial protocols8 and the established significant benefits of immunomodulatory therapies on relapse reduction. Our analyses focused on the disability changes measured by the EDSS, which have caused much debate among neurologists. Controversies include the difficulties associated with the scale (which are beyond the scope of this article), problems interpreting the conventional end points for progression, and the clinical meaningfulness of such end points in relatively short studies of relapsing-remitting multiple sclerosis. We discuss in turn the outcomes with respect to in-trial EDSS changes and the determinants of disease progression.


The mean adjusted 2 year EDSS difference for the entire combined placebo cohort was <+0.5. This small change is not unexpected when contrasted with natural history series10 18 20 and is comparable with most published phase III trials of relapsing-remitting multiple sclerosis (with the exception of the intramuscular interferon β-1a study) (table 4). In addition, the use of the first and final snapshot assessment scores to derive disability changes in a relapsing and remitting disease ignores the fluctuating in-trial morbidity changes commonly experienced by these patients.

Table 4

Comparisons with disability data from placebo cohorts of other published phase III trials. End points measured at 2 years unless otherwise stated

An alternative technique is to employ the summary measure obtained by integrating the area under the EDSS/time curve (AUC)16 with normalisation to baseline EDSS, to capture each patient's within study disability experience. Our analysis demonstrates that AUC is more sensitive to in-trial disability changes, and allows further calculations using continuous data without destabilising the divisions in regard to EDSS steps. The pros and cons of the AUC technique have been previously discussed.16 One problem is that it does not reflect disease trends.21 A patient with a period of improvement followed by deterioration may have the same AUC as one with the opposite temporal sequence, or another whose EDSS does not change throughout the study. Therefore, disease trends need to be separately evaluated.
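The insensitivity of the AUC to disease trends can be shown with three hypothetical patients. The sketch assumes a trapezoidal AUC of EDSS change from baseline with visits every 6 months; the function name and the patient data are invented for illustration.

```python
def auc_from_baseline(years, scores):
    """Trapezoidal area under the EDSS change-from-baseline curve (EDSS-years)."""
    changes = [s - scores[0] for s in scores]
    return sum(
        (t1 - t0) * (c0 + c1) / 2.0
        for t0, t1, c0, c1 in zip(years, years[1:], changes, changes[1:])
    )


years = [0.0, 0.5, 1.0, 1.5, 2.0]

improve_then_worsen = [3.0, 2.0, 3.0, 4.0, 3.0]
worsen_then_improve = [3.0, 4.0, 3.0, 2.0, 3.0]
unchanged = [3.0, 3.0, 3.0, 3.0, 3.0]

for scores in (improve_then_worsen, worsen_then_improve, unchanged):
    print(auc_from_baseline(years, scores))
```

All three courses give the same AUC of zero although their temporal patterns differ, which is why a separate trend analysis is needed alongside the summary measure.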


The behaviour of patients on placebo with regard to confirmed progression end points has been examined systematically in progressive multiple sclerosis.22 By varying the definitions of whether confirmation or treatment failure was allowable, the proportion of patients reaching the end point could vary by 20%. Another study on design strategies of clinical trials did include 91 relapsing-remitting patients, but the disease courses were defined solely on DSS change per year and the authors' objective was to estimate the power necessary for trial planning.18 In view of the demonstrated effectiveness of new treatments on disease progression in recently completed and published phase III placebo controlled trials of relapsing-remitting multiple sclerosis, our emphasis has shifted to the interpretation of such end points. Although success for a drug on a confirmed progression end point must undoubtedly indicate certain therapeutic effects, it is often equated with a beneficial reduction of permanent or sustained disease progression.23 However, it has also been pointed out that 47% of the patients reaching such a treatment failure end point in the first year of the intramuscular interferon β-1a study subsequently improved.12

We tested successively more rigorous definitions on our cohort. The accuracy with which the confirmed progression end point predicted sustained progression was never better than 67%. In addition, as our criteria for sustained progression only considered data up to 2 years, the proportion of erroneous progressors was probably underestimated. Although there is certainly noise in our data, the reversibility of confirmed progression must be partly due to prolonged relapses. Information on the duration of exacerbations from natural history series is scarce, although it has been documented that in patients with early multiple sclerosis, 22% of initial episodes last between 3 and 12 months and 10% between 6 and 12 months.24 On the other hand, the use of worsening to EDSS 6.0 (the requirement for unilateral ambulatory assistance), which has been considered an important milestone in disease progression in multiple sclerosis,9 disclosed no significant differences between paired confirmed and sustained plots. However, this advantage is offset by the very low incidence of the end point. This is not surprising, given that the median time to reach this level is between 15 and 20 years from diagnosis.10 20 25 Hence, making outcome definitions more stringent to achieve increasing stability and lack of fluctuation is confounded by decreasing sensitivity. This led us to explore other techniques for predicting disability outcome in relatively short studies.

Our initial premises on disease trend analyses were based on the utilisation of as much trial data as possible (all complete serial 2 year EDSS datasets) and the separation of patients whose progression was sustained to trial end from those with recovery from confirmed progression (erroneous progression). The minimal change subset was defined according to the generally accepted range for EDSS interrater reproducibility error19 while the rest of the cohort typically experienced a fluctuating course. Although all patients fell within the definitions of our six subsets, comparisons of the in-trial disability change end points showed incomplete subset separation. This prompted a secondary assignment of patients into just three categories which were statistically separable, had diverse EDSS plots (with distinct 95% confidence intervals), and clinical relevance. As expected with this classification, most patients (59%) indeed followed a relapsing-remitting course. Interestingly, only 15% maintained a progression of EDSS worsening throughout the study. These patients deteriorated markedly, with a mean 2 year EDSS difference of over 2.0 points and a mean AUC of +3.14 EDSS-years. With the exception of the entry EDSS, which is a well known risk factor for progression,17 we found no other demographical variables to be predictive of this disabling disease trend and it remains speculative whether baseline MRI activity or burden of disease will have prognostic power. The predictive value of our categorical classification remains to be validated with further follow up of our patients. Due to the nature of relapsing-remitting multiple sclerosis, it is likely that at least some misclassifications will have occurred, as shown in previous attempts on disease course assignments.26 However, compared with confirmed progression outcomes, disease trend analyses maximise serial data utilisation and eliminate a substantial proportion of cases with erroneous progression.


The two relatively new outcome methods included in our study are the summary measure statistic (AUC) and the categorical classification of disease trends on the EDSS. There are several prerequisites for these techniques. Firstly, acquisition of study data needs to be adequate, as the methods rely on most of the serial data points being available from each patient and their accuracy depends at least partly on sampling frequency. Secondly, these are not new scales, but merely analytical tools that inherit any drawbacks particular to the scale employed. However, the techniques described may prove even more powerful with the development of new, more reliable, and responsive rating measures (for example, multiple sclerosis functional composite; UK neurological disability scale).27-29 The optimum sampling frequency for these rating scales needs to be determined, as the pay off between accuracy and noise is not yet known.30

In conclusion, we have found in our cohort of patients treated with placebo that the EDSS change over 2 years was small, although improved data utilisation could be achieved by performing summary measure AUC analysis, which is clinically more meaningful in relapsing-remitting multiple sclerosis. In addition, the employment of confirmed progression end points can be erroneous and misleading in this population due to a substantial proportion (about 50%) of patients who recover after prolonged periods of deteriorating disability. Thus, the interpretations of putative treatment benefits on disease progression based on these outcome measures must be reconsidered. An alternative approach, using all the serial EDSS data to plot and categorise the in-trial disability course for each individual patient, was carried out. This disclosed a marked heterogeneity in which only a small proportion of patients showed sustained in-trial worsening or improvement. We found no useful clinical correlates which would enable prediction of these disability courses and the long term accuracy of this method remains to be determined.

Despite the advent of therapeutic trials in progressive multiple sclerosis,31 the treatment of relapsing-remitting patients remains important, particularly in view of recent pathological32 33 and MRI34 35 evidence of axonal damage and atrophy early in the disease. Early clinical observations36 37 and more recent MRI38 and epidemiological data20 testify to the importance of disease activity levels in the first few years of the disease for predicting the risks of significant disability in later years. Although determining outcome by exacerbation count is commonly used, it is relatively crude6 and confounded by difficulties in relapse definitions8 as well as quantification.7 Although there is no substitute for prolonged trials to ascertain long term therapeutic benefits, in practice, studies led by the pharmaceutical industry are unlikely to be lengthy. Moreover, despite its lack of responsiveness39 and the variable contributions of the different functional system scores towards disease progression,12 the use of EDSS will probably remain prevalent. Therefore, improvement in data analysis methodology is also essential. Finally, the behaviour of patients treated with placebo in treatment trials can have a profound effect on the analysis of trial results and should be carefully examined before concluding that apparent treatment benefits are solely due to improvements in the actively treated trial arms.11


Thanks are due to the investigators of the Copolymer 1 Multiple Sclerosis Study Group and the PRISMS Study Group who performed the neurological assessments and who, with the permission of Teva Pharmaceutical Industries and Ares Serono International SA, agreed to supply us with the clinical data. The full lists of the trial investigators can be found in references 2 and 5. The authors were coinvestigators of both the PRISMS study, and the multinational, multicentre, randomised, double blind placebo controlled study, extended by open label treatment, to study the effect of glatiramer acetate (Copaxone) on disease activity as measured by cerebral MRI in patients with relapsing-remitting multiple sclerosis.