Abstract
Objectives To evaluate candidate outcomes for disease-modifying trials in Huntington's disease (HD) over 6-month, 9-month and 15-month intervals, across multiple domains. To present guidelines on rapid efficacy readouts for disease-modifying trials.
Methods 40 controls and 61 patients with HD, recruited from four EU sites, underwent 3 T MRI and standard clinical and cognitive assessments at baseline, 6 and 15 months. Neuroimaging analysis included global and regional change in macrostructure (atrophy and cortical thinning), and microstructure (diffusion metrics). The main outcome was longitudinal effect size (ES) for each outcome. Such ESs can be used to calculate sample-size requirements for clinical trials for hypothesised treatment efficacies.
Results Longitudinal changes in macrostructural neuroimaging measures such as caudate atrophy and ventricular expansion were significantly larger in HD than in controls, giving rise to consistently large ESs over the 6-month, 9-month and 15-month intervals. Analogous ESs for cortical metrics were smaller with wide CIs. ESs for microstructural (diffusion) neuroimaging metrics were also typically smaller over the shorter intervals, although caudate diffusivity metrics performed strongly over 9 and 15 months. Clinical and cognitive outcomes exhibited small longitudinal ESs, particularly over 6-month and 9-month intervals, with wide CIs, indicating a lack of precision.
Conclusions To exploit the potential power of specific neuroimaging measures such as caudate atrophy in disease-modifying trials, we propose their use as (1) initial short-term readouts in early phase/proof-of-concept studies over 6 or 9 months, and (2) secondary end points in efficacy studies over longer periods such as 15 months.
- HUNTINGTON'S
- MRI
Introduction
Major efforts are being invested in the development of disease-modifying therapies for neurodegenerative disorders such as Huntington's disease (HD).1 Testing their efficacy in clinical trials is a long and expensive process, with low success rates compared with other branches of medicine.2 In HD, phase III studies of putative disease-modifying treatments have not been successful, despite many showing promise during early testing.
A wealth of observational data suggests that biomarkers of disease progression may facilitate the evaluation of disease-modifying therapies.3–6 MRI-derived neuroimaging measures appear particularly powerful, with data suggesting that substantially fewer patients would be required to detect a reduction in rate of change in MRI biomarkers, compared with clinical measures.3–9 However, many biomarkers have only been evaluated over intervals ≥12 months.
It may be advantageous for clinical trials to have efficacy readouts over short intervals such as 6 months, especially during the early phases, in order to provide data that build confidence that the trial should progress to a larger scale. However, the use of short-interval biomarkers in clinical trials is critically dependent on their validation in longitudinal observational studies over the same time frame.
Our objectives were to evaluate candidate outcomes for HD trials over 6-month, 9-month and 15-month intervals, across neuroimaging, clinical and cognitive domains. Based on our findings, we present guidelines on the selection of outcomes for rapid readouts in clinical trials. It is hoped these data will directly inform the design of HD trials, facilitating the evaluation of treatments designed to slow the course of this devastating disease.
Methods
Study design
This was a longitudinal, case–control observational study in HD. Assessments were performed at baseline, 6 and 15 months. The study was approved by the local ethical committees.
Participants
Between March and October 2011, 40 controls and 61 patients with HD were enrolled into Work Package 2 of the PADDINGTON study10 at Leiden (the Netherlands), London (the UK), Paris (France) and Ulm (Germany). Patients were recruited from research centres. Controls were spouses, partners or gene-negative siblings in order to match patients to controls as closely as possible in terms of age, education level, background and home life. Patients were ideally required to be at stage 1 of the disease,10 defined by a Unified Huntington's Disease Rating Scale (UHDRS)11 Total Functional Capacity (TFC) ≥11, indicating good capacity in functional realms; however, five patients were granted waivers for not fulfilling this TFC criterion, as described in the Results. Inclusion criteria included participants being 18–65 years of age, free from major psychiatric and concomitant neurological disorders, not currently participating in a clinical trial, and able to tolerate and safely undergo MRI. Written informed consent was obtained from each participant.
Procedures
Clinical features were assessed using the UHDRS V.99. This included the Total Motor Score (TMS), which measures a range of motor features characteristically impaired in HD including gait, tongue protrusion, ocular function and postural stability; and the TFC scale, which measures five components of daily living, including the capacity to work, manage finances and carry out domestic chores. The clinical examination was performed by raters certified by the European Huntington's Disease Network (EHDN) UHDRS-TMS online certification (http://www.euro-hd.net).
Cognitive features were assessed using the core EHDN cognitive battery, which consists of standard pencil and paper clinical neuropsychological tasks. All raters were trained on the battery and all tests were scripted. Each task is described in the online supplemental methods.
MRI acquisition
3 T MRI scans (T1-weighted, T2-weighted and diffusion-weighted) were acquired using protocols standardised for multisite use.6,10,12 Scan acquisition protocols have been described previously.10 Quality control was performed on all data sets in pseudo real-time and rescans were requested where necessary. Data were pseudoanonymised and archived on a secure web portal. To avoid potential bias, all image analyses were performed blinded to group membership.
MRI: macrostructural (volumetric) analysis
Predefined regions-of-interest (ROIs) for the volumetric analysis included the caudate, putamen, white matter, grey matter, whole brain, lateral ventricles and corpus callosum. Cortical thinning was also examined over each lobe (parietal, occipital, temporal and frontal).
The software package MIDAS13 was used to delineate the whole brain, caudate, corpus callosum and ventricles at baseline.10 Change in whole brain, caudate and ventricular volume over the scanning interval was estimated using the Boundary Shift Integral (BSI) technique,14 optimised for multisite data,15 within the MIDAS software. The BSI is a semiautomated tool that measures volume change over time (atrophy) directly from within-participant registered scan pairs. Change in corpus callosum and putamen volume was estimated by delineating the structures at both time points, either manually11 (for all corpus callosum measurements) or with BRAINS3 software6,16 (for all putamen measurements), and subtracting the volumes at each time point. Grey matter and white matter volume changes were computed using a fluid-registration approach.5,17,18
Cortical thickness measures were computed using FreeSurfer software (http://surfer.nmr.mgh.harvard.edu/; V.5.3.0). All scans were run through the longitudinal pipeline19 and thickness estimates (mm) were extracted from each region defined by the Desikan-Killiany Atlas and averaged within lobes.20
Full details of all volumetric image analysis are provided in the online supplemental methods.
MRI: microstructural (diffusion) analysis
Diffusion metrics of fractional anisotropy (FA), mean diffusivity (MD), axial diffusivity and radial diffusivity were generated over predefined ROIs (white matter, corpus callosum, caudate and putamen) for all three visits using a longitudinal registration pipeline. In brief, a common ROI mask was defined in a temporally unbiased ‘mid-space’ based on within-participant registration of T1 images, before being non-linearly registered to each individual's native FA images, for each visit. The mean values were then calculated across all included voxels for the four diffusion metrics. This analysis is described in detail in the online supplemental methods.
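For readers less familiar with the four diffusion metrics, the sketch below gives their standard definitions in terms of the three eigenvalues of the diffusion tensor, followed by a simple average over the voxels of an ROI. It is illustrative only: the function names and the example eigenvalues are hypothetical, and the registration and masking steps of the study's own pipeline (described in the online supplemental methods) are not reproduced.

```python
import math
from statistics import mean


def diffusion_metrics(l1, l2, l3):
    """Return FA, MD, AD and RD for one voxel from tensor eigenvalues (l1 >= l2 >= l3)."""
    md = (l1 + l2 + l3) / 3.0   # mean diffusivity: average of the eigenvalues
    ad = l1                     # axial diffusivity: largest eigenvalue
    rd = (l2 + l3) / 2.0        # radial diffusivity: mean of the two smaller eigenvalues
    num = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    # Fractional anisotropy lies between 0 (isotropic) and 1 (fully anisotropic).
    fa = math.sqrt(1.5 * num / den) if den > 0 else 0.0
    return {"FA": fa, "MD": md, "AD": ad, "RD": rd}


def roi_means(voxel_eigenvalues):
    """Average each metric over all voxels included in an ROI mask."""
    per_voxel = [diffusion_metrics(*ev) for ev in voxel_eigenvalues]
    return {name: mean(v[name] for v in per_voxel) for name in ("FA", "MD", "AD", "RD")}


# Hypothetical eigenvalues (mm^2/s) for a two-voxel ROI, for illustration only.
example_roi = roi_means([(1.7e-3, 0.4e-3, 0.3e-3), (1.0e-3, 1.0e-3, 1.0e-3)])
```

The second example voxel is isotropic, so it contributes an FA of zero to the ROI mean.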
All segmentations and registrations were visually inspected for accuracy by trained analysts blinded to diagnosis. Excluded data points are described in Supplemental End-point Quality Control data.
Statistical analysis
Statistical analysis was performed by an independent team according to a predefined analysis plan. The repeated measures of each outcome variable were analysed using generalised least squares regression models, with variances of the outcome (and correlations between pairs of measures) allowed to differ both by group and by visit. The models included a group factor (HD or control), calendar time from baseline (in days) and a quadratic term to allow non-linear change over the three visits to be modelled. The use of generalised least squares models that jointly model all available outcomes provides some additional protection against the impact of missing values. Data only require a ‘missing at random’ assumption, rather than the more restrictive ‘missing completely at random’ assumption, to give unbiased estimates.21 Where outcomes directly measured changes (such as whole-brain atrophy between two visits) the outcome variables in the statistical models were change between baseline and 6 months (ie, 6-month interval), change between baseline and 15 months (ie, 15-month interval), and change between 6 and 15 months (ie, 9-month interval). Otherwise, outcomes were measures made at baseline, 6 and 15 months. Linear and quadratic effects of time were included in all models with estimated between-group differences for the 6-month, 9-month and 15-month intervals calculated using appropriate linear combinations of model parameters. All analyses adjusted for baseline age, gender and study site, as well as interactions with the linear and quadratic effects of time. This was due to an a priori belief that age, gender and study site might affect slopes (and rates of change in slopes) as well as absolute levels of the outcomes. 
Models for non-imaging outcomes adjusted additionally for educational level (an ordered categorical variable treated as a continuous covariate) and its interactions with linear and quadratic effects of time, because education level may affect performance on such outcomes, and education levels were expected to differ systematically between HD and controls.
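To make the phrase "appropriate linear combinations of model parameters" concrete, the sketch below considers an outcome measured at each visit and assumes that the group effect on the trajectory enters through a group-by-time and a group-by-time-squared interaction. Under that assumption, the between-group difference in change over an interval is a weighted sum of the two interaction coefficients. The coefficient names, their values and the nominal visit days are hypothetical, covariate interactions are omitted, and the fitted models themselves are not reproduced here.

```python
def between_group_change(b_group_time, b_group_time2, start_day, end_day):
    """Difference (HD minus control) in change between two time points.

    b_group_time  : assumed coefficient of the group x time interaction
    b_group_time2 : assumed coefficient of the group x time^2 interaction
    """
    linear_part = b_group_time * (end_day - start_day)
    quadratic_part = b_group_time2 * (end_day ** 2 - start_day ** 2)
    return linear_part + quadratic_part


# Nominal visit times in days (an assumption; actual visit dates varied).
BASELINE, MONTH_6, MONTH_15 = 0, 183, 456

# Hypothetical coefficients, for illustration only.
b1, b2 = 0.010, 0.00001
diff_6_month = between_group_change(b1, b2, BASELINE, MONTH_6)
diff_9_month = between_group_change(b1, b2, MONTH_6, MONTH_15)
diff_15_month = between_group_change(b1, b2, BASELINE, MONTH_15)
```

By construction, the 6-month and 9-month differences sum to the 15-month difference, which is a useful sanity check on any such linear combination.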
Longitudinal effect sizes (ESs) with 95% CIs for the difference in change over each interval were calculated as the covariate-adjusted difference in the mean of the change between HD participants and controls, divided by the estimated residual SD of change in HD participants. Expression of results as (unit-free) ES permits comparison of changes measured using different metrics. The square of ES is inversely related to sample-size requirements for clinical trials under the assumption that a 100% effective treatment will reduce the mean rate of change in HD cases to that in healthy controls without affecting the variability in these rates.22 Ninety-five per cent CIs for the ES were calculated using bias corrected and accelerated (BCa) bootstrapping, with 2000 replications.23 An ES of two implies that the mean change in HD is two SD away from that in controls. No formal criteria were used to assess ‘size’ of ES. Since thresholds for such criteria could be argued to be arbitrary, the approach taken was to consider ESs in relation to each other at each time point, and to evaluate whether the estimated ES and 95% CIs translated into feasible sample size estimates for the specific context of HD clinical trials. No adjustment for multiple comparisons was made since there is independent scientific interest in each of the variables.24 Throughout, a cut-off of p=0.05 was used to establish formal statistical significance, with the actual p values also considered in the interpretation of results. All analyses were performed in STATA V.12.
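As a rough illustration of the ES definition above, the sketch below divides the difference in mean change between groups by the SD of change in the HD group, and attaches a bootstrap interval. It is a simplification in two labelled respects: the difference is not covariate-adjusted, and a plain percentile bootstrap is used in place of the BCa method reported in the paper. All function names and data values are hypothetical.

```python
import random
import statistics


def effect_size(hd_change, control_change):
    """Unadjusted difference in mean change (HD minus control) over the SD of change in HD."""
    diff = statistics.mean(hd_change) - statistics.mean(control_change)
    return diff / statistics.stdev(hd_change)


def percentile_bootstrap_ci(hd_change, control_change, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for the ES (a simpler stand-in for BCa)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample each group independently, with replacement.
        hd_star = rng.choices(hd_change, k=len(hd_change))
        ctrl_star = rng.choices(control_change, k=len(control_change))
        estimates.append(effect_size(hd_star, ctrl_star))
    estimates.sort()
    tail = (1.0 - level) / 2.0
    lower = estimates[int(tail * n_boot)]
    upper = estimates[int((1.0 - tail) * n_boot) - 1]
    return lower, upper


# Hypothetical change scores (arbitrary units), for illustration only.
hd = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5]
controls = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 1.0, 1.5]

es = effect_size(hd, controls)
ci_low, ci_high = percentile_bootstrap_ci(hd, controls)
```

Using the SD of the HD group alone as the denominator follows the paper's definition; it reflects the assumption that a treatment shifts the mean rate of change in patients without altering its variability.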
Results
Participants
At baseline, five HD participants were granted waivers for being outside disease stage 1; four were stage 2 and one was stage 3.10 All controls and 59/61 HD participants returned for the 6-month assessment; HD non-attendance was due to illness (n=2), and both participants returned for the 15-month visit. Thirty-seven of the 40 controls and 56 of the 61 HD participants returned for the 15-month assessment; HD dropout was due to disease-related burden (n=1), inability to tolerate scanning (n=1), treatment for cancer (n=1) and psychiatric burden resulting in the site investigator withdrawing the participant (n=2). Dropout in the control group was due to being the spouse of a withdrawn HD participant (n=1) or personal issues unrelated to the study (n=2).
Age and gender were well balanced between groups (table 1). Within the HD group, CAG, disease burden25 and TFC were well balanced between sites (see online supplemental table S1). The average intervals in months (mean (SD)) between assessments in the HD group were 5.76 (1.36), 9.12 (0.99) and 14.88 (1.33). In the control group, the intervals were 5.48 (1.08), 9.08 (0.88) and 14.50 (1.09).
Effect sizes
ESs for the difference in 6-month, 9-month and 15-month change between HD participants and controls are presented in table 2. Unadjusted baseline, 6-month and 15-month findings for each outcome, with the number of data points for each variable, are presented by group in online supplemental tables S3 and S4, with adjusted between-group differences in change over the 6-month, 9-month and 15-month intervals.
For clinical applicability, table 2 should be viewed in conjunction with figure 1, which depicts the relationship between ES and sample-size requirements for disease-modifying clinical trials (where the outcome is a single change measured between two time points) for varying assumed treatment efficacies.
Macrostructural neuroimaging measures
Longitudinal atrophy of the caudate, white matter, grey matter and whole brain, and expansion of the lateral ventricles, produced relatively large ESs over 6-month, 9-month and 15-month intervals (table 2), with all between-group differences statistically significant (p<0.05, see online supplemental table S4). ESs for these metrics were relatively consistent in that they tended to change in magnitude relative to the interval size. Caudate atrophy and ventricular expansion performed particularly strongly over the 6-month interval.
Putamen atrophy ESs were small and not statistically significant over the 6-month interval (ES 0.101; 95% CI −0.187 to 0.397). Putamen atrophy performed more strongly over 9 and 15 months, although ESs were smaller than for the caudate and the other more global atrophy metrics listed above (table 2).
Corpus callosal atrophy was not significantly higher in patients than in controls for any of the time intervals examined (see online supplemental table S3).
Cortical thinning ESs were small and between-group differences were only statistically significant for the occipital cortex over the 15-month interval (p=0.032, see online supplemental table S3); however, this ES was relatively small with a wide CI (0.512; 95% CI 0.011 to 0.997).
Microstructural neuroimaging measures
The microstructural (diffusion) metrics had typically smaller ESs than the macrostructural atrophy measurements, although the caudate diffusivity metrics performed strongly (table 2, see online supplemental table S3). In particular, caudate MD produced ES comparable to caudate atrophy over the 9-month and 15-month intervals.
FA ESs were small and there was little evidence of statistically significant between-group differences for any of the structures examined (caudate, putamen, global white matter and corpus callosum), particularly over short intervals (see online supplemental table S3).
Clinical measures
The standard clinical scales examined (TFC and TMS) performed relatively poorly. Between-group differences in TFC were not statistically significant over 6-month, 9-month or 15-month intervals (see online supplemental table S3) and corresponding ESs were small, with CIs spanning zero. TMS performed more strongly than TFC over the 9-month and 15-month intervals, with significant between-group differences and larger ES, although the CIs surrounding the ES estimates were wide (TMS over 15 months; ES 0.545 (95% CI 0.075 to 1.123)).
Cognitive measures
Changes in the majority of tasks in the cognitive battery did not differ significantly between HD and controls over all intervals examined (table 2, see online supplemental table S3). The Symbol Digit Modality Task (SDMT) was the most promising non-imaging measure with an ES of 0.799 (95% CI 0.344 to 1.254) over 15 months.
Discussion
Employing a multisite study design with variable, short-interval observational periods, we report 6-month, 9-month and 15-month ESs for a range of candidate biomarker outcomes for HD trials across multiple assessment modalities (macrostructural and microstructural neuroimaging, clinical and cognitive). Reported ESs can be used with a standard formula to calculate sample-size requirements for disease-modifying clinical trials22 (figure 1). This is the first time that ESs have been reported over the short intervals of 6 and 9 months. It is hoped that these data will be used to directly inform disease-modifying clinical trial design.
Key results
Longitudinal changes in macrostructural neuroimaging measures such as caudate atrophy and ventricular expansion in early HD participants were larger than those in controls, giving rise to consistently large ESs over the 6-month, 9-month and 15-month intervals, in agreement with previous multisite observational findings over periods of 12 months and longer.4,5,7 Analogous ESs for cortical metrics were smaller, particularly over the shorter intervals. Although cortical thinning was recently used as an outcome measure in the PRECREST trial over a 6-month interval,26 our findings suggest it has limited longitudinal sensitivity and would require substantially larger sample sizes than the other macrostructural metrics reported here. ESs for microstructural (diffusion) neuroimaging metrics were also typically smaller over the shorter intervals, although caudate diffusivity metrics performed strongly over 9 and 15 months, in line with the most promising atrophy measures. To our knowledge, this is the first longitudinal multisite study to examine change in diffusion metrics in HD. Findings are encouraging, particularly within the striatal grey matter, in accordance with a recent report over 18 months in a single-site study.3
Clinical and cognitive outcomes exhibited small longitudinal ESs, particularly over 6-month and 9-month intervals, with wide CIs, indicating a lack of precision. Of note, SDMT appeared particularly promising over the 6-month interval, producing ESs comparable with caudate atrophy, although with noticeably wider CIs. However, this result was not replicated over the 9-month interval, suggesting that it may have been a chance finding. Over 15 months, SDMT performed strongly, producing ESs comparable with putamen atrophy. These longer interval findings are in line with previous reports over 12 and 24 months, showing SDMT to be one of the most promising cognitive outcomes.4,5,8,27
Interpretation: clinical application
To interpret findings within the context of designing disease-modifying clinical trials in HD, we must consider that although certain neuroimaging measures appear to be particularly powerful, they would not be accepted as primary end points in trials since they do not provide a direct measure of how the patient feels, functions or survives (http://www.fda.gov). Hence, to exploit the potential of these neuroimaging measures, we propose their use: (1) as initial short-term readouts in early phase/proof-of-concept (PoC) studies over 6 or 9 months; (2) as interim or safety readouts over 6 or 9 months in longer, larger efficacy studies (eg, phase III); and (3) as secondary end points in efficacy studies over longer periods such as 15 months.
Short-term readouts
Macrostructural neuroimaging measures such as caudate atrophy and ventricular expansion may be able to provide early confidence-instilling readouts in phase II PoC studies over intervals such as 6 and 9 months, where the goal would be to assure safety and gather initial evidence that the therapy had promising properties. Encouraging findings from such readouts would facilitate the decision whether to further invest in the therapy, increasing participant numbers and trial duration. An adaptive approach such as this, based on early, meaningful data, could improve the viability of disease-modifying clinical trials in HD.
Interim readouts and secondary end points
In sufficiently powered, large-scale phase II/III efficacy studies of longer duration such as 15 months, disease modification could be demonstrated using approved clinical measures such as TMS as the primary end point, and specific neuroimaging metrics as secondary end points. Supportive data from a strong neuroimaging biomarker programme would be important in demonstrating disease modification.
Figure 2 provides an example of how the ES data presented in table 2 could be used to inform clinical trial design. Sample-size requirements are presented for the most promising outcomes from each assessment modality (table 2), based on a treatment hypothesised to reduce the rate of change in each outcome by 50% (90% power and 5% significance level). Based on these results, recommendations for selecting biomarkers for short PoC studies and longer term phase III trials are provided as ‘ticks’ (show potential), ‘crosses’ (unlikely to be suitable) and ‘question marks’ (further data are required due to wide CIs). An important caveat of this figure is that sample sizes are heavily dependent on the magnitude of the hypothesised treatment effect (figure 1). For example, requirements would be four times larger if the effect was reduced to 25%. Nevertheless, this approach does provide an estimate of sample-size requirements to sufficiently power trials, as well as a means of comparing the outcomes across assessment modalities.
For example, in order to detect therapeutic effects on ventricular expansion following treatment periods of 6, 9 or 15 months, sample-size requirements per treatment arm would be 134 (95% CI 64 to 495), 98 (95% CI 51 to 275) and 80 (95% CI 48 to 186), respectively, for 50% efficacy. Considering the magnitude of the sample sizes and the width of the CIs, ventricular expansion may be a suitable biomarker for use in short-term PoC studies, as well as trials over a longer duration (figure 2).
Conversely, to assess the effect of a therapy on motor progression, the commonly applied UHDRS-TMS may be suitable for use over 9-month and 15-month intervals, given a 50% treatment effect; however, the wide CIs around these sample sizes indicate a lack of precision (figure 2).
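The dependence of sample size on ES and on the hypothesised treatment efficacy can be sketched with the usual normal-approximation formula for comparing two means. Whether this matches the cited standard formula term for term is an assumption, and the function names and the example ES are hypothetical. The detectable standardised difference is taken as the ES multiplied by the assumed efficacy, so that a 100% effective treatment corresponds to the full ES.

```python
import math
from statistics import NormalDist


def raw_n_per_arm(effect_size, efficacy, power=0.90, alpha=0.05):
    """Unrounded participants per arm for a two-arm trial (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)
    detectable = effect_size * efficacy  # standardised difference the trial must detect
    return 2.0 * (z_alpha + z_power) ** 2 / detectable ** 2


def n_per_arm(effect_size, efficacy, power=0.90, alpha=0.05):
    """Participants per arm, rounded up to a whole number."""
    return math.ceil(raw_n_per_arm(effect_size, efficacy, power, alpha))


# Hypothetical ES of 1.0, for illustration only.
n_at_50_percent = n_per_arm(1.0, 0.50)
n_at_25_percent = n_per_arm(1.0, 0.25)
```

Because the requirement scales with the inverse square of both the ES and the efficacy, halving the assumed efficacy multiplies the unrounded requirement by four, and small ESs with wide CIs translate into large and imprecise sample-size estimates.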
Generalisability
It is important to note that observational data should only be used to inform clinical trials involving similar cohorts and observational periods. The current study focused predominantly on stage 1 HD, the very early clinical phase of the disease, since disease-modifying treatments are most likely to be efficacious in preserving function and quality of life when administered at this point. Therapies shown to be effective in these cohorts with an acceptable safety profile may be administered during the premanifest stages of the disease, prior to clinical onset. The observational PREDICT-HD study, which focuses on the premanifest stages of the disease, is ideally positioned to inform the design of such trials.8
Limitations
We must acknowledge the potential limitations of using neuroimaging biomarkers as efficacy readouts. It is possible that a positive macrostructural neuroimaging readout over 6 or 9 months may not be indicative of longer term clinical or functional improvement. Although associations between change in neuroimaging measures and functional decline have been reported in HD, causality is yet to be demonstrated.4,7 Furthermore, these readouts may not be suitable for all types of intervention; their utility may be dependent on the mechanism-of-action of the therapy, together with the time required for it to mediate an effect. Nevertheless, these neuroimaging measures are objective, reproducible across multiple sites and able to track the progression of pathological atrophy over short time intervals. They may provide valuable biomarkers in the assessment of disease-modifying compounds. Another limitation is the decision to focus on the corpus callosum as a whole, when there is evidence that each subregion of the corpus callosum projects to distinct cortical regions and is likely to be differentially implicated in the disease process. Future work should investigate these substructures independently, and whether the added complexity of delineating smaller, less well-defined regions is offset by a stronger atrophy signal.
None of the participants in the current study were enrolled in clinical trials; however, many were on medications that target the central nervous system (CNS; see online supplemental table S2). Mean dosages of CNS-targeting drugs were relatively low, with overlap in usage between groups. This study was not designed to examine the specific effects of medication on each outcome; however, we acknowledge medication usage as a potential confounder.
Conclusion
The short-interval observational data presented here are complementary to findings over longer intervals in other studies such as TRACK-HD and PREDICT-HD. Taken together, these studies can provide data to directly inform the design of clinical trials in HD, facilitating the evaluation of treatments designed to slow the course of this devastating disease. Since HD is often regarded as a model neurodegenerative disease, amenable to early intervention,1 research into this disorder may inform early-intervention strategies for more prevalent neurodegenerative diseases.
Acknowledgments
The authors would like to thank the patients and controls who took part in this study, along with all the PADDINGTON study Work Package 2 site staff at Paris, Leiden, Ulm and London, and Eileanoir Johnson for her assistance with the medication table. This work has been supported by the European Union—PADDINGTON project, contract no HEALTH-F2-2010-261358. RIS is supported by the CHDI/High Q Foundation, a not for profit organisation dedicated to finding treatments for Huntington's disease. This work was undertaken at UCLH/UCL, which received a proportion of funding from the Department of Health's NIHR Biomedical Research Centres funding scheme. SJT acknowledges support of the National Institute for Health Research through the Dementias and Neurodegenerative Research Network, DeNDRoN.
PADDINGTON Work Package 2 contributors: The Netherlands—Dr Ellen ‘t Hart, MSc; Verena Rödig, MSc; Anne Schoonderbeek, MSc (Leiden University of Medical Sciences). The UK—Victoria Perry, BSc; Nicola Robertson, BSc (UCL Institute of Neurology, London). France—Dr Perrine Charles, MD PhD; Dr Claire Ewenczyk, MD; Dr Stephan Klebe, MD (Assistance Publique-Hôpitaux de Paris, Paris); Dr Damien Justo, PhD (Université Pierre et Marie Curie, Paris). Germany—Sabrina Betz; Dr Jens Dreyhaupt, PhD; Carolin Eschenbach; Ms Jeton Iseni; Daniela Schwenk; Dr Michael Orth, Associate Professor MD; Sonja Trautmann, Nurse; Ms Karin Schiefele; Irina Blankin, BSc; Ms Theresia Kelm, BSc; Rosine Scherer Diploma; Felix Mudoh Tita, MSc; Katja Vitkin (Ulm University). Italy—Dr Giovanna Tripepi, PhD; Dr Giuseppe Pollio, PhD.
References
Supplementary materials
Supplementary Data
This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.
Files in this Data Supplement:
- Data supplement 1 - Online supplement
Footnotes
Contributors NZH contributed to the project design, data collection, image processing, interpretation of statistical analysis and drafting of the manuscript. REF contributed to the project design, execution and interpretation of statistical analysis and drafting of the manuscript. EMR contributed to data collection, image analysis and manuscript drafting and review. JHC contributed to the project design, image analysis and review of the manuscript. SH contributed to data collection and review of the manuscript. IBM and HJ contributed to image analysis and review of the manuscript. H-PM, SDS, RACR and AD contributed to project design, data collection and review of the manuscript. CF contributed to conceptualisation and design of the project, execution and interpretation of statistical analysis, and drafting and review of the manuscript. RIS contributed to the project design and review of the manuscript. BL contributed to the conceptualisation and design of the project, and review of the manuscript. SJT contributed to the conceptualisation and design of the project, and review of the manuscript, and is guarantor.
Competing interests None.
Ethics approval Central London REC 4.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement The authors and all partners in the PADDINGTON project are committed to data sharing to further the field of HD research. Baseline data have been published previously and this paper provides the analysis of the final two time points from this study, which finished data collection in 2013. Although this manuscript represents analysis of the whole data set, it is possible that other research teams may wish to utilise this data set to answer further questions using alternative techniques. Data would be released to them on submission of a detailed research proposal and approval by the PADDINGTON Steering Committee.