OBJECTIVES Serial brain MRI is widely used in pilot studies of new agents to monitor treatment efficacy in relapsing-remitting (RR) and secondary progressive (SP) multiple sclerosis (MS). For pilot trials, sample size calculations for the RR subgroup have been based on data from small numbers of patients, and separate calculations for the SP subgroup have not been performed. The present study considers these issues.
METHODS The sample size calculations were based on data from six months of monthly T2 weighted and gadolinium enhanced MRI in 31 RR and 28 SP untreated patients taking part in natural history studies or in the placebo arm of a therapeutic trial. The calculations were for a placebo controlled, parallel groups design lasting six months. Sample sizes were estimated by bootstrap analysis as those giving an 80% probability (power) of demonstrating a given treatment effect.
RESULTS With a single pretreatment scan, demonstration of a 70% reduction in newly active lesions required 2×30 RR and 2×50 SP patients. With an extra run-in scan one month before treatment, the sample sizes were 2×20 for RR and 2×30 for SP patients.
CONCLUSIONS The sample sizes required for RR patients were comparable with those of previous smaller studies. Larger sample sizes were needed for the SP group, but the extra run-in scan reduced the sample sizes in both groups. The larger sample sizes in the SPMS group were probably due to a combination of a higher proportion of patients with low MRI activity (⩽2 active MRI lesions in 50% of SP and 32% of RR patients) and a few patients who displayed extremely high activity, which increased interpatient variability. These data should be considered in planning pilot MRI outcome trials.
- multiple sclerosis
- power calculations
- treatment trials
Because of the highly variable course of multiple sclerosis (MS) and the unsatisfactory nature of the commonly used measures of outcome, there are inherent problems in designing treatment trials with clinical end points such as relapse rate or change in disability.1-3 Clinical end points are definitive but are logistically difficult to reach. Therefore, surrogate markers that give earlier answers are much sought after.
Magnetic resonance imaging (MRI) is currently the most plausible surrogate marker for monitoring disease activity in treatment trials.2 4 In relapsing-remitting (RR) and secondary progressive (SP) MS, monthly T2 weighted and/or gadolinium-enhanced brain MRI detects on average about five to 10 new lesions for every clinical relapse.5-11 Sample sizes for pilot MRI studies have previously been calculated from natural history data in small numbers of patients.12-14 These calculations were based on data from patients with RRMS alone or from RR and SP patients combined. A separate set of power calculations for patients with SPMS alone has not been performed before. The present study is based on a larger cohort and reports calculations for the RRMS and SPMS groups separately.
The data from patients involved in five previous studies were reviewed.7 8 10 11 15 Serial scanning had been performed either as part of a natural history study,7 8 10 11 or in patients in the placebo arm of a therapeutic trial.15 During the study period, none of the patients had received disease modifying therapy, apart from short courses of corticosteroids for acute relapses. Each patient had initial T2 weighted and gadolinium enhanced brain MRI and then monthly follow up scans for six months.
Two groups of patients were defined: (1) RRMS (n=31). This group was defined as clinically definite MS16 with clear episodes of acute neurological dysfunction lasting more than 24–48 hours with full or partial recovery and a stable clinical state between attacks.17 The clinical and MRI data of these patients were drawn from three previously published sources.8 10 15 (2) SPMS (n=28). This was defined as clinically definite MS16 with the development of a gradual increase in disability, with or without superimposed relapses, for at least six months, after an initial relapsing-remitting course.17 The data were also drawn from three sources.7 11 15
Table 1 shows the clinical data for the patients in each group. The patients had a wide range of disease duration and disability but overall had clinical characteristics typical of both disease subgroups. As initially selected, the patients were representative of MS patients seen at a tertiary referral centre and of patients who would be considered for inclusion in treatment trials. Although the data are drawn from several studies, scans were performed using comparable protocols: all patients had a T2 weighted scan (TR 2000–2755 ms, TE 60–120 ms), and a T1 weighted image (TR 450–575 ms, TE 13–40 ms) was obtained after 0.1 mmol/kg gadolinium-DTPA was given intravenously.
On the pretreatment scan, the number of gadolinium enhancing lesions was counted. On the follow up scans newly active lesions were counted. These were defined as new enhancing lesions (>90% of all new active lesions), new non-enhancing lesions, or new enlarging but non-enhancing lesions on the T2 weighted images. Table 2 and table 3 show the serial scan data tabulated for both patient groups.
Three MRI outcomes were evaluated: AI—the number of patients showing new active lesions at any time during the study period; AII—the proportion of scans showing newly active lesions during the study; AIII—the number of newly active lesions seen over the whole study period. AI data are not considered further as they show much poorer statistical power.13
Power estimates were then calculated using a “bootstrap” method of computerised sampling and trial simulation, as previously described.18 In this procedure, for every trial design under consideration, 1000 simulated trials are generated by drawing patients randomly with replacement from the original data sets (so that each patient may be drawn repeatedly). A theoretical distribution is used to simulate a treatment effect. When a homogeneous population is assumed, a Bernoulli trial (“coin flipping”, as it is referred to in that article) is used. When a heterogeneous (more variable) patient response to a proposed treatment is more plausible, a β distribution is used, which accounts for differences between patients but still yields a mean probability of response. To compute the power of a given sample size at a given treatment efficacy, Wilcoxon’s test is used to compare the simulated patient groups, and the proportion of simulated trials in which the difference is significant is taken as the power estimate.
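The simulation loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the original program: it assumes the treatment effect is applied by independently "preventing" each lesion of a treated patient with probability equal to the efficacy (one Bernoulli trial per lesion), that the heterogeneous case draws each patient's prevention probability from a β distribution with mean equal to the efficacy (the `concentration` parameter is hypothetical), and that a two-sided rank-sum test at α = 0.05 with a normal approximation and tie correction stands in for Wilcoxon's test. The lesion counts are made up for demonstration and are not the study data.

```python
import math
import random


def rank_sum_p_value(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney) test, normal approximation
    with a correction for ties (lesion counts tie very often)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for a block of ties
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return 1.0  # every observation tied: no evidence of a difference
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_treated_count(count, efficacy, heterogeneous, rng, concentration=2.0):
    """Thin one patient's lesion count to mimic treatment (assumed model).

    Homogeneous: each lesion is prevented with probability `efficacy`.
    Heterogeneous: the patient's own prevention probability is drawn from a
    beta distribution with mean `efficacy`; `concentration` is hypothetical.
    """
    if efficacy <= 0:
        return count
    if efficacy >= 1:
        return 0
    p = efficacy
    if heterogeneous:
        p = rng.betavariate(efficacy * concentration, (1 - efficacy) * concentration)
    return sum(1 for _ in range(count) if rng.random() >= p)


def bootstrap_power(lesion_counts, n_per_arm, efficacy, heterogeneous=False,
                    n_sims=1000, alpha=0.05, seed=0):
    """Fraction of simulated placebo-controlled trials with p < alpha."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_sims):
        # Resample patients with replacement for each arm.
        placebo = rng.choices(lesion_counts, k=n_per_arm)
        treated = [simulate_treated_count(c, efficacy, heterogeneous, rng)
                   for c in rng.choices(lesion_counts, k=n_per_arm)]
        if rank_sum_p_value(treated, placebo) < alpha:
            significant += 1
    return significant / n_sims


# Hypothetical six-month lesion counts, for demonstration only.
toy_counts = [0, 0, 1, 2, 3, 4, 5, 6, 8, 10, 12, 15, 20, 25, 40]
print(bootstrap_power(toy_counts, n_per_arm=30, efficacy=0.7))
```

Thinning lesion counts rather than scaling them keeps simulated outcomes as whole numbers, which suits a rank-based test; other ways of imposing the effect are equally defensible and may give somewhat different power estimates.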
Confidence intervals for the power estimates given in the tables in this report could be obtained by repeating the entire procedure a sufficient number of times (for instance, 100) and analysing the resulting set of power estimates. Because all of these bootstrap estimates are based on the same underlying patient data, appropriate corrections to the final variance should be made.
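Repeating the whole procedure and summarising the spread of the resulting power estimates might look like the sketch below. The callable `estimate_power` is hypothetical (any function that runs one full bootstrap power calculation for a given random seed), and `toy_estimator` is a stand-in with a known true power of 0.8. The percentile range returned here is naive: it does not apply the variance correction mentioned above, so it should be read as illustrating the repetition step only.

```python
import random
import statistics


def repeat_power_estimates(estimate_power, n_repeats=100, seed=0):
    """Re-run a whole power-estimation procedure with fresh random seeds.

    `estimate_power` is a hypothetical callable: seed -> power in [0, 1].
    Returns the mean estimate and a naive 2.5th-97.5th percentile range.
    NOTE: no correction is made for the shared underlying patient data.
    """
    rng = random.Random(seed)
    estimates = [estimate_power(rng.randrange(2**31)) for _ in range(n_repeats)]
    cuts = statistics.quantiles(estimates, n=40)  # 39 cut points at 2.5% steps
    return statistics.mean(estimates), cuts[0], cuts[-1]


def toy_estimator(seed, true_power=0.8, n_sims=1000):
    """Stand-in for one bootstrap run: 1000 simulated trials with a known success rate."""
    rng = random.Random(seed)
    return sum(rng.random() < true_power for _ in range(n_sims)) / n_sims


mean_power, low, high = repeat_power_estimates(toy_estimator)
print(f"mean {mean_power:.3f}, naive range {low:.3f}-{high:.3f}")
```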
A total of eight sets of power calculations were computed: four for each of the two patient groups. Power calculations were made both with and without an additional run-in scan obtained one month before the start of treatment. When this extra scan was used, the number of new active lesions appearing between month −1 and month 0 (the pretreatment scan) was subtracted from the number of new active lesions during the study period. Calculations were made both for a homogeneous treatment response (little between patient variability) and for a heterogeneous one (patient response highly variable).
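The run-in adjustment and the eight combinations of calculations can be written down as a short sketch. The function name `aiii_outcome` and its arguments are illustrative and not taken from the original program. The sketch subtracts exactly as described and does not truncate negative values, because the text does not say how (or whether) they were handled.

```python
from itertools import product


def aiii_outcome(study_new, runin_new=None):
    """Newly active lesions (AIII) for one patient.

    Without a run-in scan the outcome is the study-period count. With it,
    lesions that became active between month -1 and month 0 are subtracted
    (no truncation at zero is applied in this sketch).
    """
    if runin_new is None:
        return study_new
    return study_new - runin_new


# The eight sets of calculations: 2 patient groups x run-in (no/yes) x 2 response models.
designs = list(product(("RRMS", "SPMS"), (False, True), ("homogeneous", "heterogeneous")))
for group, run_in, response in designs:
    print(group, "with run-in" if run_in else "no run-in", response)
```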
The efficacy of an experimental treatment was expressed as a percentage and represents the reduction in newly active lesions (AIII) seen in patients receiving treatment compared with those given placebo. Sample sizes were calculated for efficacies of 60%, 70%, and 80%. Only a parallel groups study design was simulated. It was assumed that the putative treatment takes immediate effect and that its efficacy does not change during follow up.
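Expressed as arithmetic, efficacy is the percentage reduction in the mean AIII count of the treated group relative to the placebo group. A minimal sketch with hypothetical helper names and a made-up placebo mean of 10 newly active lesions per patient:

```python
def efficacy_percent(mean_treated, mean_placebo):
    """Percentage reduction in mean newly active lesions (AIII), treatment vs placebo."""
    if mean_placebo <= 0:
        raise ValueError("placebo mean must be positive")
    return 100.0 * (1.0 - mean_treated / mean_placebo)


def treated_mean_for(efficacy, mean_placebo):
    """Mean lesion count a treated group would show at a given efficacy (0-100)."""
    return mean_placebo * (1.0 - efficacy / 100.0)


placebo_mean = 10.0  # hypothetical
for eff in (60, 70, 80):
    print(eff, treated_mean_for(eff, placebo_mean))  # roughly 4, 3 and 2 lesions
```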
In the RRMS group (n=31), on the initial scan 15 (48.4%) patients had enhancing lesions. During follow up, six patients showed no newly active lesions during the study period. The total number of scans with newly active lesions was 88 (47.3%). The mean number of newly active lesions/patient was 8.9 (median 4 (SE 2), range=0–39).
In the RRMS group, without the extra run-in scan, demonstration of a 70% AIII efficacy with a power of 80% required 2×30 patients to be studied for six months or 2×40 patients for four months (table 4). When the extra run-in scan was added, 70% efficacy could be demonstrated with 2×20 patients studied for six months or 2×30 patients studied for four months (table 5).
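Reading a sample size off a power table amounts to finding the smallest tabulated number of patients per arm whose estimated power reaches the 80% target. A sketch with a hypothetical table (the values are invented for illustration and are not taken from tables 4-7):

```python
def smallest_adequate_n(power_by_n, target=0.80):
    """Smallest per-arm sample size whose estimated power reaches `target`.

    `power_by_n` maps per-arm n to an estimated power (0-1), e.g. one column
    of a power table. Returns None if no tabulated n is large enough.
    """
    for n in sorted(power_by_n):
        if power_by_n[n] >= target:
            return n
    return None


# Hypothetical power estimates for one design (not values from this study).
example_table = {10: 0.41, 20: 0.66, 30: 0.84, 40: 0.93}
print(smallest_adequate_n(example_table))  # 30
```

A strict cut-off like this would reject an estimate of 78%; the SPMS results accept such a value as "about 80", so any automated rule should state the threshold it applies.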
In the SPMS group (n=28), on the initial scan 10 (35.7%) had enhancing lesions. During follow up, six patients showed no newly active lesions in the six months of the study. The total number of scans with newly active lesions was 75 (44.6%). The mean number of newly active lesions/patient was 12.2 (median 2.0 (SE 3.9), range=0–85).
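The reported percentages of scans with newly active lesions in both groups are consistent with a denominator of six monthly follow-up scans per patient (31 × 6 = 186 RR scans and 28 × 6 = 168 SP scans). That denominator is an inference from the reported figures rather than something stated explicitly; a minimal check:

```python
def percent_active_scans(active_scans, n_patients, follow_up_scans=6):
    """Percentage of follow-up scans showing newly active lesions.

    Assumes (inferred) six monthly follow-up scans per patient.
    """
    return 100.0 * active_scans / (n_patients * follow_up_scans)


rr = percent_active_scans(88, 31)  # RRMS: 88 of 186 scans
sp = percent_active_scans(75, 28)  # SPMS: 75 of 168 scans
print(f"RRMS {rr:.1f}%, SPMS {sp:.1f}%")  # RRMS 47.3%, SPMS 44.6%
```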
In the SPMS group without the run-in scan, demonstration of a 70% AIII efficacy with a power of about 80% required 2×50 patients to be studied for six months (the calculated power was 78% for a homogeneous and 83% for a heterogeneous response) or 2×75 patients for four months (table 6). When the extra run-in scan was added, 70% efficacy was demonstrated with 2×30 patients studied for four or six months, but not with 2×20 patients studied for six months (table 7). Thus, with the run-in scan, the SPMS sample sizes come closer to those seen in the RR group (figure).
Three groups of patients with different levels of new lesion activity were defined:
(1) Low activity: 0–2 new active lesions during the six months of follow up.
(2) Moderate activity: 3–30 new active lesions.
(3) High activity: more than 30 new active lesions.
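The three activity bands translate directly into a classification rule. A minimal sketch (the helper names are illustrative; the example counts are hypothetical):

```python
from collections import Counter


def activity_level(new_lesions):
    """Classify six-month new-lesion activity using the 0-2 / 3-30 / >30 bands."""
    if new_lesions <= 2:
        return "low"
    if new_lesions <= 30:
        return "moderate"
    return "high"


def activity_profile(counts):
    """Proportion of patients falling into each activity band."""
    tally = Counter(activity_level(c) for c in counts)
    return {level: tally[level] / len(counts) for level in ("low", "moderate", "high")}


print(activity_profile([0, 1, 5, 12, 40]))  # {'low': 0.4, 'moderate': 0.4, 'high': 0.2}
```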
A higher proportion of SPMS patients exhibited low activity (14/28 (50%) v 10/31 (32%) of RR patients), whereas a greater number of RR patients had moderate activity (18/31 (58%) v 9/28 (32%) of SP patients). High activity was seen in 18% of SP and 10% of RR patients. The two patients with the highest MRI activity (66 and 85 new active lesions) had SPMS (table 8).
The sample size calculations for the two groups of patients were tabulated for the AIII response (tables 4-7). The calculations for a homogeneous or heterogeneous response were generally very similar: only the homogeneous response data are presented. Slightly smaller sample sizes were generally required for AIII than for AII data, in both RR and SP subgroups (AII data not shown).
At present, the generally modest correlations between MRI findings and clinical status in MS mean that new therapies are ultimately judged in large phase III trials in which a clinical outcome is the primary measure.4 Nevertheless, MRI is now widely used as the primary outcome measure of disease activity in exploratory phase I/II trials in RRMS and SPMS, because the high frequency of asymptomatic disease activity detected by MRI in these subgroups makes it a powerful tool for studies with small cohorts and short duration.5-11 Furthermore, some correlations do exist between the activity seen on serial T2 weighted and gadolinium enhanced scans and clinical measures of disease activity or progression in both RR5 6 8-10 19 and SP8 20 subgroups, the most consistent being the higher frequency of enhancing lesions during clinical relapse. These correlations suggest that it is valid and clinically relevant to use MRI to determine efficacy in exploratory trials.
Some previous MRI studies of small cohorts have reported broadly similar mean rates of new lesion activity in RR and SP disease.5-11 Pronounced between patient variability in the level of MRI activity has, however, been readily apparent in both groups.5-12 21 In particular, higher rates of activity have been reported in RR and SP patients who continued to relapse, whereas lower rates were seen in RR patients in remission10 and in SP patients who continue to progress without superimposed relapses.11 Such heterogeneous patterns in reports involving small cohorts led us to the present study of larger, separate RR and SP cohorts on which to perform sample size calculations for exploratory MRI outcome treatment trials.
The number of patients that need to be enrolled into a trial, the number of hospital visits required, and the number of scans that need to be performed must all be established to design a trial that minimises the burden on patients, is cost efficient, and has a high likelihood of showing the anticipated treatment effect. In the last respect, trials predicted to yield a statistical power of less than 80% are generally deemed unacceptable. We employed well documented statistical techniques.12-14 For the RR subgroup we compared our results with those of other published studies, but used a larger patient cohort. Our study also considered SPMS patients as a separate entity for the first time.
Our results showed clear differences between the two groups. When just the single pretreatment scan was obtained, substantially larger sample sizes were required for the SP group to show an equivalent therapeutic effect in a given period of time.
The addition of one extra run-in scan (but not more than one) before entry into an exploratory treatment trial was previously shown to reduce the sample sizes required.13 We saw this effect again in the present analysis, in both patient subgroups. For SPMS patients the effect was especially pronounced: with the run-in scan, the difference between the SP and RR sample sizes needed to show the same treatment effect over the same period was smaller (table 7).
The beneficial effect of the extra run-in scan is that it reduces some of the between patient variation in MRI activity. The appreciably larger sample sizes required in the SP patients when such a scan was omitted suggest greater between patient variability among these patients. There were indeed notable differences between the RRMS and SPMS groups when new lesion activity over the six months was divided into low, moderate, and high levels. Half of the SPMS group had low levels of activity compared with only one third of RRMS patients. By contrast, most RR patients (58%) displayed moderate new lesion activity. High levels of activity occurred in a small proportion of each group, if anything slightly more often in the SPMS group, a few of whom were extremely active. The higher proportion of low activity patients in the SPMS group reduces the statistical power in this group, as there are fewer lesions to “treat”. The more frequent extreme highs in the same group also reduce power, because they increase between patient variability. These factors probably account for the larger sample sizes calculated for SPMS than for RRMS patients in the present study.
The present study provides tables (tables 4-7) of the power estimates for a range of sample sizes at treatment efficacies of 60%, 70%, and 80%. The efficacy of an experimental treatment is, for example, 70% if on average 70% fewer active lesions are seen when a treatment is given than with placebo. Sample sizes for treatment efficacies below 60% can be calculated by adapting the computer program so that the user can enter any desired efficacy and run it with any data set (the data set on which this article is based is included as a demonstration set (table 2 and table 3)). However, treatment efficacies lower than 60% may need to be interpreted with more caution. Experience from trials of β-interferon and other immunomodulatory therapies suggests that a major impact on MRI may be associated with more modest clinical effects.22-24 Because the correlation between MRI and clinical disability in MS is modest, it may be prudent to require a relatively high level of efficacy on MRI activity in phase II trials before proceeding to phase III clinical outcome studies, especially if the therapy under investigation is expensive and has significant side effects.
The differences we have noted between the RR and SP subgroups are of practical importance when planning future treatment trials and warrant consideration in choosing the appropriate sample size and cohort. Based on the similar patterns of activity reported in small cohorts,5-11 some pilot MRI studies have combined RRMS and SPMS subgroups15 22; an obvious advantage of this approach is easier patient recruitment, especially now that disease modifying therapies are increasingly used, particularly in RR patients. Our present data suggest that such a combination may be problematic if only a single pretreatment scan is obtained: in this case a substantially larger cohort of SP patients is needed, and randomisation imbalances that leave uneven proportions of RR and SP patients in the treated and placebo groups could lead to spurious results. If an extra run-in scan is added, combining RR and SP cohorts may be more acceptable, as the differences in sample size are smaller.
When steroids were given to treat relapses during the natural history studies from which the present clinical and MRI data were taken, an attempt was often made to perform the enhanced MRI before starting treatment, to minimise any effect on MRI activity. A three day course of intravenous methylprednisolone (1 g/day) is the usual regimen for treating relapses at our centres. Uncontrolled studies suggest that intravenous methylprednisolone causes a temporary reduction in the number of enhancing lesions for periods ranging from one week to two months.25 26 However, in one of these studies there was a total of 53 new enhancing lesions among 10 patients one month after a course of 1 g/day intravenous methylprednisolone for three days.25 This suggests that the effect of this particular regimen on the formation of new lesions is likely to be modest and transient.
The results also have implications for our understanding of the pathophysiology of MS. As new lesions in RRMS and SPMS usually display an initial phase of gadolinium enhancement, the concept has emerged that breakdown of the blood-brain barrier is a necessary event in the development of new pathology and, by inference, of ongoing clinical deterioration.27 However, we found that half of our SP cohort had minimal enhancing lesion activity despite being in a phase of the disease with a poorer clinical prognosis, characterised by a steady accumulation of disability.28 29 This discordance between lack of enhancement and clinical progression is already well recognised in the smaller cohort of MS patients who have a progressive, non-relapsing illness from onset (primary progressive MS).7 Our findings suggest that the pathophysiology of secondary progression is not necessarily dependent on blood-brain barrier abnormality, at least as seen on standard dose gadolinium enhanced MRI.
To further optimise the design of MRI outcome treatment trials, more studies are needed to elucidate factors that may influence or predict MRI activity, for example the frequency of relapses before and during the study, Expanded Disability Status Scale score at entry, age, sex, disease duration, and pre-existing MRI activity. Such studies will need to examine larger cohorts of patients. Some work in this area has already been published for RR patients.21 We are currently performing such an analysis in SP patients.
We acknowledge the generous support of the MS Society of Great Britain and Northern Ireland and the Dutch MS Society. NT is supported by a grant from Athena Neurosciences.