OBJECTIVES The commonly employed outcome measures on disability and relapse rates in treatment trials of relapsing-remitting multiple sclerosis have well demonstrated sensitivity to treatment effects, but their clinical interpretation is problematic. An alternative method of analysis, which is more clinically meaningful and statistically appropriate to a condition with a fluctuating disease course, uses the summary measure statistic “area under the disability/time curve (AUC)”, to estimate each patient’s total in trial morbidity experience.
METHODS The AUC technique was applied in an intention to treat analysis of serial disability data derived from the expanded disability status scale (EDSS), the Scripps neurologic rating scale (SNRS), and the ambulation index (AI), collected during a double blind, randomised, placebo controlled, phase III trial of subcutaneous interferon β-1a (IFNβ-1a) in relapsing-remitting multiple sclerosis (PRISMS Study). The results were compared with the often quoted “conventional” end point of mean change in rating scores from baseline to trial completion. Analyses were also carried out on subgroups with entry EDSS stratified above and below 3.5.
RESULTS EDSS data analysed by AUC normalised to baseline scores disclosed that both doses of IFNβ-1a (22 or 44 μg) were superior to placebo (p= 0.008 and 0.013, respectively). In addition, the high dose (44 μg) was more beneficial than placebo using SNRS (p= 0.038) and AI data (p= 0.039). AUC analysis of SNRS scores also showed that for patients with baseline EDSS>3.5, the 44 μg (but not the 22 μg) dose was more advantageous than placebo (p=0.028).
CONCLUSIONS Summary measure analysis using the AUC of serial disability/time plots confirms and extends the results of conventional end point analysis of disability from the PRISMS Study data. AUC evaluations show that high dose IFNβ-1a (44 μg three times weekly) was beneficial on all of the clinical rating scale scores used in this study. This method provides a statistically powerful and clinically meaningful assessment of treatment effects on in trial disability in patients with multiple sclerosis with fluctuating and highly heterogeneous disease courses.
- multiple sclerosis
- outcome measures
- interferon β-1a
In the past 5 years, immunomodulatory therapies for relapsing-remitting multiple sclerosis have achieved a useful reduction in relapse rate, but less clear cut benefits on permanent disability, based on the outcome measures employed.1-5 These measures have been associated either with a change in rating scores from study baseline to completion, or with “confirmed progression” as defined by a certain increase in disability scores (for example, the expanded disability status scale, EDSS)6 at two visits 3 or 6 months apart. Interpretation of these measures is difficult, as the first end point ignores the instability and variance associated with two snapshot assessments in time, as well as the fluctuating disabilities that commonly occur in relapsing-remitting multiple sclerosis, whereas the so called confirmed progression (and its graphical depiction using Kaplan-Meier survival curves) includes an unknown number of erroneous treatment failures (that is, cases with recovery to baseline after satisfying so called progression criteria). End points which are both clinically and statistically meaningful and which incorporate the amount of disability associated with each attack into the overall disability calculations7 would be preferred.
One method of analysis that indexes all in trial morbidity changes is the area under the disability/time curve (AUC).8 This technique is appropriate for the mainly transient disability experienced early in the disease course in relapsing-remitting multiple sclerosis and should be more responsive to change as it utilises all the collected serial data. In the present study, we have reanalysed with AUCs the clinical rating scores obtained in a randomised, double blind, placebo controlled trial of interferon β-1a (IFNβ-1a: Rebif®, Ares Serono) given subcutaneously in relapsing-remitting multiple sclerosis, recently published by the PRISMS (prevention of relapses and disability by interferon-beta 1a subcutaneously in multiple sclerosis) Study Group.9 The results are compared with a conventional method of disability analysis (2 year change in clinical rating scores from baseline to end of study). We did not include the confirmed progression end point as it is not strictly comparable with the AUC, because it only utilises a minority of the available disability data in the proportion of patients who reach the end point concerned. In addition, AUC provides no information about the direction of disability change. Thus confirmed progression will be evaluated by comparison with trends established from individual disability/time plots in a separate study.
A total of 560 patients with relapsing-remitting multiple sclerosis were enrolled and randomised to 44 μg (12 MIU) or 22 μg (6 MIU) IFNβ-1a or placebo, administered three times weekly by subcutaneous injection (totalling 132, 66, and 0 μg IFNβ-1a per week, respectively). Details of the patient cohort, exclusion criteria, and clinical results have been reported in an earlier publication by the PRISMS Study Group.9
Patients were assessed neurologically and scored on the EDSS, Scripps neurologic rating scale (SNRS)10 and the ambulation index (AI)11 at the start of the trial and at each visit, by an examining neurologist blinded to the treatment category. In 14 centres, 316 patients had scheduled assessments carried out at 3 monthly appointments. In another six centres, additional monthly scheduled visits were performed for the first 9 months in 205 patients. In one further centre, 39 patients had scheduled monthly assessments throughout the study. Extra evaluations (n=592) were carried out at unscheduled patient initiated visits associated with relapses. Total disability data sets were achieved in 533 patients (95%).
Conventional analysis (2 year disability difference) was carried out by comparing treatment effects on the changes in EDSS, SNRS, and AI, between trial entry and completion. AUCs were calculated by two methods to obtain the AUCSUM and the AUCCHANGE. For AUCSUM, the total area under the disability/time curve throughout the trial was determined.8 12 For AUCCHANGE, the AUCSUM was normalised to the baseline score by subtracting the area defined by the product of the initial rating score and the study period (see the definitions of AUCSUM and AUCCHANGE at the end of this article). Between EDSS of 5.5 to 7.0, each 0.5 point increment was rescaled to 1.0 point to adjust for non-linearity of the EDSS.13 14
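The two summary measures described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the PRISMS analysis: the function names `auc_sum` and `auc_change` and the example patient are hypothetical, times are taken to be in years, and the EDSS rescaling between 5.5 and 7.0 is not applied.

```python
from typing import Sequence


def auc_sum(times: Sequence[float], scores: Sequence[float]) -> float:
    """Total area under the disability/time curve by the trapezium rule."""
    if len(times) != len(scores) or len(times) < 2:
        raise ValueError("need at least two (time, score) pairs of equal length")
    area = 0.0
    for i in range(len(times) - 1):
        width = times[i + 1] - times[i]
        if width < 0:
            raise ValueError("assessment times must be in ascending order")
        # Area of the trapezium between consecutive assessments i and i+1.
        area += width * (scores[i] + scores[i + 1]) / 2.0
    return area


def auc_change(times: Sequence[float], scores: Sequence[float]) -> float:
    """AUCSUM minus the baseline rectangle y0 * (tn - t0)."""
    baseline_area = scores[0] * (times[-1] - times[0])
    return auc_sum(times, scores) - baseline_area


# Hypothetical patient (illustrative values only): EDSS 2.0 at entry,
# 3.5 during a relapse at 0.5 years, then 2.5 at 1 year and at 2 years.
visit_years = [0.0, 0.5, 1.0, 2.0]
edss = [2.0, 3.5, 2.5, 2.5]

print(auc_sum(visit_years, edss))     # 5.375 EDSS-year
print(auc_change(visit_years, edss))  # 1.375 EDSS-year
```

A positive AUCCHANGE on the EDSS or AI indicates more in trial disability than would have accrued had the patient stayed at the baseline score; on the SNRS, where higher scores are better, the sign is reversed.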
Three types of data analyses were performed. In the initial analyses (combined data), all 7060 datapoints from the scheduled and unscheduled visits of 560 patients were included in the AUC calculations (intention to treat analysis). The trapezium rule for determining the AUC was applied throughout,12 as most objectively confirmed attacks had only one additional neurological assessment and the speed and timing of relapse onset15 and offset16 would be difficult to define with certainty if other techniques were employed. Secondly, separate analyses (scheduled visit data) were carried out using solely the 6468 datapoints obtained at routine appointments. Thirdly, due to the different estimated disease course characteristics and DSS score staying times of patients with DSS⩽3 and DSS⩾4 in natural history series,13 analyses of subjects classified by baseline EDSS⩽3.5 (n=466) or EDSS>3.5 (n=94) were performed.
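The distinction between the combined data and scheduled visit data analyses can be illustrated with a short sketch. The `Visit` record, the `trapezium_auc` helper, and the visit values are all hypothetical and chosen only to show how a single unscheduled relapse assessment changes the area. They are not taken from the trial data.

```python
from typing import List, NamedTuple


class Visit(NamedTuple):
    year: float      # time since trial entry, in years
    score: float     # rating scale score at this visit
    scheduled: bool  # False for patient initiated (relapse) visits


def trapezium_auc(visits: List[Visit]) -> float:
    """Area under the score/time curve through the given visits."""
    ordered = sorted(visits, key=lambda v: v.year)
    return sum(
        (b.year - a.year) * (a.score + b.score) / 2.0
        for a, b in zip(ordered, ordered[1:])
    )


# Hypothetical patient with one unscheduled visit during a relapse.
visits = [
    Visit(0.00, 2.0, True),
    Visit(0.25, 2.0, True),
    Visit(0.40, 4.0, False),  # unscheduled relapse assessment
    Visit(0.50, 2.5, True),
    Visit(0.75, 2.0, True),
]

combined_auc = trapezium_auc(visits)                                 # all visits
scheduled_auc = trapezium_auc([v for v in visits if v.scheduled])    # routine only

print(round(combined_auc, 4), round(scheduled_auc, 4))
```

In this example the combined data give a larger area than the scheduled visit data, because the relapse peak is only captured by the unscheduled assessment.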
For both conventional and AUC data, an analysis of variance (ANOVA) model was employed with factors for treatment and centre. One degree of freedom contrasts from the ANOVA model were used to compare the treatment groups in a pairwise fashion. As there was a strong effect of baseline entry scores, all further analyses of AUCSUM were carried out with baseline disability (EDSS, SNRS, or AI) as a covariate.13 ANOVAs on the ranks (Kruskal-Wallis tests) were performed to determine consistency of results and validity of the parametric (ANOVA) conclusions. As our data were not normally distributed (Shapiro-Wilk W test), p values on treatment comparisons were obtained from ANOVAs on the ranks. Mean estimates with 95% confidence intervals (95% CIs) were also calculated to compare treatment effects with placebo.
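The rank based comparison can be illustrated with a simplified sketch of the Kruskal-Wallis H statistic. This is an assumption-laden illustration rather than the model used in the study: it ignores the centre factor, the baseline covariate, and the pairwise contrasts, it stops short of computing a p value, and the function names and AUCCHANGE values are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to N, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def kruskal_wallis_h(groups: Dict[str, Sequence[float]]) -> float:
    """Tie-corrected Kruskal-Wallis H statistic for two or more groups."""
    pooled = [x for g in groups.values() for x in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    offset = 0
    for g in groups.values():
        rank_sum = sum(ranks[offset:offset + len(g)])
        weighted += rank_sum ** 2 / len(g)
        offset += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * weighted - 3.0 * (n_total + 1)
    tie_counts = Counter(pooled).values()
    correction = 1.0 - sum(t ** 3 - t for t in tie_counts) / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


# Hypothetical AUCCHANGE values (EDSS-year) for three arms; not trial data.
example = {
    "44 ug": [-0.4, -0.1, 0.0],
    "22 ug": [0.1, 0.2, 0.3],
    "placebo": [0.5, 0.6, 0.9],
}
h_statistic = kruskal_wallis_h(example)
print(round(h_statistic, 3))
```

Under the usual large sample approximation, H would be referred to a chi-squared distribution with (number of groups − 1) degrees of freedom. With groups as small as those in this toy example that approximation is unreliable.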
TOTAL COHORT ANALYSES WITH COMBINED DATA
“2 year EDSS difference” demonstrated significant benefits favouring the 22 μg dose (p=0.026) and a tendency in favour of 44 μg (p=0.052) over placebo. AUCSUM calculations similarly showed that 22 μg IFNβ-1a was beneficial compared with placebo (p=0.046). Although the size of the treatment effect was similar, significance was not reached with the 44 μg dose (p=0.064).
The median AUCCHANGE was +0.06, +0.05, and +0.48 EDSS-year for 44 μg, 22 μg IFNβ-1a, and placebo, respectively (table 1 A). Both doses conferred significant advantages over placebo (p=0.013 and 0.008 for 44 μg and 22 μg IFNβ-1a, respectively), with an estimated treatment effect of −0.5 EDSS-year for each dose (95% CI −0.9 to −0.1 and −0.8 to −0.1, respectively) (fig 1 A-B).
There were no significant group differences in efficacy based on either “2 year SNRS difference”, or AUCSUM analysis. With AUCCHANGE, the 44 μg treatment arm improved by a median of +0.17 SNRS-year compared to deteriorations of −0.25 and −1.68 SNRS-year for the 22 μg dose and placebo cohorts, respectively (table 1 B). There were significant effects in favour of 44 μg IFNβ-1a over placebo (p=0.038), with an estimated mean benefit of +2.5 SNRS-year (95% CI +0.1 to +4.9) (fig 1 C-D).
There were no significant group differences for either the “2 year AI difference”, or AI AUCSUM. On AUCCHANGE, the 44 μg dose (median of 0 AI-year) was significantly superior to placebo (median worsening of +0.11 AI-year) (p=0.039) with a mean benefit of −0.4 AI-year (95% CI −0.7 to −0.1) (table 1 C, fig 1 F).
TOTAL COHORT ANALYSES WITH SCHEDULED VISIT DATA
Utilising data from scheduled visits only, AUCSUM analysis on EDSS scores disclosed a trend favouring 22 μg IFNβ-1a over placebo (p=0.058). AUCCHANGE calculations showed that on EDSS scores, there were significant benefits of both doses over placebo (p=0.024 and 0.014 for 44 μg and 22 μg, respectively). For SNRS and AI data, treatment with IFNβ-1a 44 μg was also superior to placebo.
For patients with entry EDSS⩽3.5, treatment comparisons showed 22 μg to be better than placebo, for both “2 year EDSS difference” (p=0.016) and AUCSUM analyses (p=0.043). Using AUCCHANGE, there were significant effects in favour of both treatment doses compared with placebo (p=0.036 and 0.016 for 44 μg and 22 μg IFNβ-1a), with estimated benefits of −0.4 and −0.5 EDSS-year, respectively (95% CI −0.8 to 0; −0.8 to −0.1) (table 2 A, fig 2 A-B).
For patients with baseline EDSS>3.5, neither conventional “2 year EDSS difference”, nor AUC analyses, showed significant differences between treatment arms, although AUCCHANGE showed an estimated mean benefit for the 44 μg dose over placebo of −1.0 EDSS-year (95% CI −1.9 to 0) (fig 2 A-B).
For patients with a baseline EDSS >3.5, there was a significant effect in favour of the 44 μg dose over placebo using conventional “2 year SNRS difference” (p=0.016). AUCSUM did not show differences between treatment arms. AUCCHANGE confirmed significant effects of 44 μg IFNβ-1a versus placebo (p=0.028) with a mean benefit of +9.0 SNRS-years (95% CI +2.9 to +15.1) (fig 2 C-D).
Neither “2 year AI difference”, AUCSUM, nor AUCCHANGE showed any significant treatment effects over placebo for either subgroup (table 2 C, fig 2 E-F). However, for AUCCHANGE, the 44 μg dose conferred an estimated mean saving of −0.9 AI-year over placebo (95% CI −1.6 to −0.1).
Subgroup scheduled visit data analyses
Treatment comparisons between subgroups using data restricted to scheduled visits showed significant benefits of both treatment doses for patients with entry EDSS⩽3.5 on EDSS assessments (p=0.021 to 0.043), and favourable effects on SNRS scores with high dose 44 μg IFNβ-1a over placebo in patients with EDSS>3.5 (p=0.034).
Whereas it is generally agreed that the primary outcome in phase III immunomodulatory trials in relapsing-remitting multiple sclerosis must be clinical,17 commonly employed end points associated with both disability and relapse assessment can be difficult to define and interpret.8 16 Meaningful clinical milestones, such as the conversion to secondary progression,18 19 are not realistic end points for relapsing-remitting multiple sclerosis cohorts within relatively short study periods of 2 to 3 years. Currently available clinical rating scales are variously flawed,20-22 with the EDSS being the most criticised.13 14 20 23 24 Other scales (including SNRS and AI) have been less thoroughly evaluated.25 We have previously proposed that the AUC summary measure statistic may solve some of these problems in a clinically meaningful way by indexing and accounting for the total morbidity change experienced by patients during the course of a trial.8
In this paper, we have analysed data from the PRISMS trial using two AUC methods, AUCSUM and AUCCHANGE. AUCSUM summates the total disability data serially over the study period and data independence is assured (the independence is lost in analyses involving any change in disability scores as each data point is influenced by its preceding values). However, it would be expected to be insensitive to small changes in short trials if there were a large disability range in the cohort at baseline. On the other hand, AUCCHANGE obtained by normalising the AUCSUM value to the baseline rating is sensitive to in trial changes and has been previously utilised in a neurorehabilitation pilot study.26 However, neurological improvements and deteriorations may cancel and the technique is susceptible to unstable scores at trial entry, a problem partly resolved by ensuring a stable run in period.
In the present study, comparison of the two AUC techniques showed that AUCSUM essentially confirmed the “2 year disability difference” analysis (treatment benefit over placebo for EDSS, but not for SNRS or AI data) without improved responsiveness. This can be attributed to the small in study changes relative to the wide range of disability (EDSS 0–5) at baseline. By contrast, the increased sensitivity of AUCCHANGE for detecting positive therapeutic effects not only showed savings in terms of EDSS-years for both IFNβ-1a doses compared with placebo, but also significant beneficial effects in favour of the high dose (44 μg three times weekly), using SNRS and AI data.
AUC analysis has several advantages over conventional techniques for this type of study. The method can be applied to any clinical rating scale employed. Because it incorporates all available serial data, it should be statistically more powerful than end points of conventional mean disability change. By accounting for both the magnitude and duration of relapses as well as improvements and progressive deteriorations, it is more clinically meaningful, as it provides a measure of the patient’s total disability experience. Furthermore, cost effectiveness can be assessed in terms of disability-years. These factors allow a more complete intention to treat evaluation.
We examined the neurological scores from all visits (combined data), as well as those assessed only from scheduled appointments (scheduled visit data). The second was performed to reduce any sampling bias due to the reduced mean relapse rate in the treatment arms compared with placebo, and to separate out, at least in part, the effects of transient disability changes. The results in favour of therapy were similar with or without the inclusion of data from unscheduled visits.
We analysed our higher disability cohort separately in this study because there is accumulating evidence that patients with active relapsing-remitting multiple sclerosis and an EDSS >3.5 (when gait dysfunction ensues) are at a higher risk for disability progression.27 These patients have a tendency to deteriorate more rapidly13 and to be unresponsive to steroids and standard doses of IFNβ.28 The AUCCHANGE analysis of this subgroup disclosed that patients on placebo experienced strikingly more disability on EDSS, SNRS, and AI than their treated counterparts. AUCCHANGE confirmed the conventional end point of 2 year SNRS difference in showing significant treatment effects only for the 44 μg dose. Furthermore, from the estimated means and confidence intervals, trends were apparent for similar dose effects with the EDSS and AI data, although statistical significance was not reached due to the relatively few subjects. Overall, these results support the notion that subjects with significant disability require higher doses than patients at the lower end of the EDSS (with mainly impairment), to obtain similar benefits.27
The differential outcomes of the two treatment doses on disability as assessed by the different rating scales are intriguing and difficult to explain. They may be confounded by the modest correlations within individual patients across changes in rating scores,29 and the relative lack of comparative data on SNRS and AI.25 Moreover, in the PRISMS Study cohort, the SNRS scoring system generates a more continuous dataset than the EDSS or AI and thus, for the relatively small subgroup of subjects with baseline EDSS>3.5 (n=94), may be more sensitive to the treatment effects of the 44 μg dose.
AUC analyses might be regarded as having certain disadvantages. For example, the method summarises all the disability experienced by a patient during a trial, but does not distinguish between permanent neurological dysfunction and the transient disability associated with relapses. However, we regard this as being useful for assessing the effects of potential therapies in patients with relapsing-remitting multiple sclerosis in whom significant transient morbidity may occur over short periods despite the slow accumulation of irreversible disability.19 30 AUC analyses also provide no information on disability trends over time: a subject with a period of improvement followed by deterioration may have the same AUC score as one with the opposite temporal sequence, or another in which the morbidity remains unaltered throughout. This issue, as well as the problem of erroneous treatment failures wrongly assigned by confirmed progression and Kaplan-Meier analyses, can only be solved by evaluating individual disability/time plots. The question of an optimal sampling rate, including concerns of the signal to noise of rating scales employed at high assessment frequency,17 also needs to be considered. Although considered appropriate for analysing graded data,12 AUC analyses do not eliminate problems peculiar to the clinical rating tools utilised, as discussed previously. However, these difficulties likewise apply to the disability end points presently utilised in clinical trials. It has been suggested that improved clinical rating tools, preferably with continuous scales, are required (for example, maximum ambulatory distance, timed 8 metre walk, nine hole peg test, paced auditory serial addition task, questionnaire based assessments).31-35 The numerical data derived from such scales is likely to be even more informative when subjected to AUC analyses.
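The point that AUC scores carry no information on the temporal sequence of disability can be shown with a small sketch. The three trajectories below are hypothetical and constructed for illustration. The helper `auc_change_from_baseline` is an assumed name that applies the trapezium rule and subtracts the baseline rectangle.

```python
from typing import Sequence


def auc_change_from_baseline(times: Sequence[float], scores: Sequence[float]) -> float:
    """Trapezium rule area minus the baseline rectangle y0 * (tn - t0)."""
    area = sum(
        (times[i + 1] - times[i]) * (scores[i] + scores[i + 1]) / 2.0
        for i in range(len(times) - 1)
    )
    return area - scores[0] * (times[-1] - times[0])


years = [0.0, 0.5, 1.0, 1.5, 2.0]

# Three hypothetical EDSS courses, all starting and ending at 3.0.
improve_then_worsen = [3.0, 2.0, 3.0, 4.0, 3.0]
worsen_then_improve = [3.0, 4.0, 3.0, 2.0, 3.0]
unchanged = [3.0, 3.0, 3.0, 3.0, 3.0]

results = [
    auc_change_from_baseline(years, course)
    for course in (improve_then_worsen, worsen_then_improve, unchanged)
]
print(results)  # [0.0, 0.0, 0.0]
```

All three courses give the same value because the period of improvement and the period of deterioration cancel exactly, which is why individual disability/time plots are needed to recover the direction of change.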
CL and LDB were investigators in the PRISMS study. Thanks are due to Florilene Dupont and Melvin Olsen of Ares Serono for help in data analysis.
AUCSUM: This is a summation of the total areas (including baseline) under the curve between each pair of consecutive scores given by the trapezium rule. If disability scores (y0, y1, y2,..., yn) are plotted versus their times of assessment (t0, t1, t2,..., tn) totalling n+1 measurements yi at times ti (i= 0,..., n), then
$$\mathrm{AUCSUM} = \sum_{i=0}^{n-1} \frac{(t_{i+1} - t_i)\,(y_i + y_{i+1})}{2}$$
AUCCHANGE: This is the difference between the areas summated by AUCSUM and the product of the baseline disability score y0 and the total time of study (tn − t0), hence
$$\mathrm{AUCCHANGE} = \mathrm{AUCSUM} - y_0\,(t_n - t_0)$$
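As a worked illustration of these two definitions, consider a hypothetical patient (values chosen for arithmetic convenience, not taken from the trial) assessed at t = (0, 1, 2) years with EDSS scores y = (2, 3, 2). Applying the trapezium rule to each pair of consecutive scores and then subtracting the baseline rectangle gives:

```latex
\begin{align*}
\mathrm{AUCSUM}    &= \frac{(1 - 0)(2 + 3)}{2} + \frac{(2 - 1)(3 + 2)}{2}
                    = 2.5 + 2.5 = 5 \ \text{EDSS-year} \\
\mathrm{AUCCHANGE} &= \mathrm{AUCSUM} - y_0\,(t_n - t_0)
                    = 5 - 2 \times (2 - 0) = 1 \ \text{EDSS-year}
\end{align*}
```

The positive AUCCHANGE of 1 EDSS-year reflects the transient worsening at year 1, even though the score at trial completion equals the baseline score and the conventional 2 year difference would be zero.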