Research paper
Evaluation of longitudinal 12 and 24 month cognitive outcomes in premanifest and early Huntington's disease
  1. Julie C Stout1,
  2. Rebecca Jones2,
  3. Izelle Labuschagne1,
  4. Alison M O'Regan1,
  5. Miranda J Say3,
  6. Eve M Dumas4,
  7. Sarah Queller1,5,
  8. Damian Justo6,
  9. Rachelle Dar Santos7,
  10. Allison Coleman7,
  11. Ellen P Hart4,
  12. Alexandra Dürr6,
  13. Blair R Leavitt7,
  14. Raymund A Roos4,
  15. Doug R Langbehn8,
  16. Sarah J Tabrizi3,
  17. Chris Frost2
  1. 1School of Psychology and Psychiatry, Monash University, Melbourne, Victoria, Australia
  2. 2Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK
  3. 3UCL Institute of Neurology, University College London, London, UK
  4. 4Department of Neurology, Leiden University Medical Centre, The Netherlands
  5. 5Queller Consulting, Dunedin, Florida, USA
  6. 6Department of Genetics and Cytogenetics, and INSERM UMR S679, APHP Hôpital de la Salpêtrière, Paris, France
  7. 7Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
  8. 8Departments of Psychiatry and Biostatistics (Secondary), University of Iowa, Iowa City, Iowa, USA
  1. Correspondence to Professor J C Stout, School of Psychology and Psychiatry, Monash University, Wellington Road, Melbourne, Victoria 3800, Australia; julie.stout{at}


Background Deterioration of cognitive functioning is a debilitating symptom in many neurodegenerative diseases, such as Huntington's disease (HD). To date, there are no effective treatments for the cognitive problems associated with HD. Cognitive assessment outcomes will have a central role in the efforts to develop treatments to delay onset or slow the progression of the disease. The TRACK-HD study was designed to build a rational basis for the selection of cognitive outcomes for HD clinical trials.

Methods There were a total of 349 participants, including controls (n=116), premanifest HD (n=117) and early HD (n=116). A standardised cognitive assessment battery (including nine cognitive tests comprising 12 outcome measures) was administered at baseline, and at 12 and 24 months, and consisted of a combination of paper and pencil and computerised tasks selected to be sensitive to cortical-striatal damage or HD. Each cognitive outcome was analysed separately using a generalised least squares regression model. Results are expressed as effect sizes to permit comparisons between tasks.

Results 10 of the 12 cognitive outcomes showed evidence of deterioration in the early HD group, relative to controls, over 24 months, with greatest sensitivity in Symbol Digit, Circle Tracing direct and indirect, and Stroop word reading. In contrast, there was very little evidence of deterioration in the premanifest HD group relative to controls.

Conclusions The findings describe tests that are sensitive to longitudinal cognitive change in HD and elucidate important considerations for selecting cognitive outcomes for clinical trials of compounds aimed at ameliorating cognitive decline in HD.


Cognitive decline is a serious debilitating symptom in neurodegenerative diseases, resulting in untold suffering and huge financial costs. Thus treatments for cognitive decline are urgently needed. These potential treatments fall into two broad categories: (a) disease modifying treatments, which are aimed at changing the neuropathological progression (eg, halting, slowing); and (b) symptom focused treatments, which are aimed at enhancing the function of compromised neural systems. Although symptom focused treatments, such as the use of cholinesterase inhibitors in Alzheimer's and other diseases, have met with a moderate degree of success, there are, as of yet, no disease modifying treatments for any neurodegenerative disease.

Huntington's disease (HD) is a fully penetrant, autosomal dominant neurodegenerative disease. Unlike Alzheimer's disease or Parkinson's disease, for which the genetic risk factors are far less predictive, it is possible to know with certainty who will develop HD far in advance of the symptoms and signs of disease. As such, HD has emerged from the neurodegenerative diseases as a potential opportunity for the development of the first disease modifying intervention strategies. People who have the HD CAG expansion usually start life functioning normally and then begin to gradually develop involuntary movements, psychiatric symptoms and cognitive decline, eventually leading to death typically 15–20 years following diagnosis.1,2 As potential interventional compounds are identified, it will theoretically be possible to identify people at risk who can be treated preventatively, in the premanifest period, to impede the development of disease signs and symptoms. However, because it is essential to be able to test intervention strategies in trials of reasonably limited duration, the slowness with which HD progresses in the premanifest period is prohibitive, and instead it will be necessary to test for drug effects in already diagnosed patients when progression may be rapid enough to get efficient readouts from clinical trials.

For any disease with progressive cognitive decline, success in finding treatments to prevent or slow cognitive deterioration rests on the availability of cognitive outcomes that are tolerable in the clinical trial setting and are responsive to treatment. Generally, cost considerations mean that clinical trial duration is limited to 1 or 2 years at most, and sample sizes must be in the low hundreds rather than in the thousands. Thus clinical trials for cognitive interventions are fully reliant on the availability of cognitive outcome measures that can reveal change within this interval. At present, there is no accepted battery of cognitive tests that is ready for clinical trials in either diagnosed or premanifest HD.

This is the first report to examine longitudinal 12 and 24 month progression in late premanifest and early HD compared with controls, with respect to feasibility and methodology for these cognitive measures for HD trials, although a subset of the 12 and 24 month data was previously reported.3,4 The cognitive battery was administered in the context of TRACK-HD, a multisite, observational, longitudinal study aimed at identifying biological and clinical markers in premanifest and early HD individuals, across domains of cognition, psychiatry, quantitative motor and neuroimaging. The aims of the analyses reported here were to: (a) determine whether progression of cognitive decline could be detected at 12 or 24 month intervals (to approximate feasible timelines of future clinical trials); (b) quantify the effect sizes (ES) for rates of change in cognition in order to facilitate power calculations for future trials; and (c) determine whether particular cognitive measures show statistically significant superiority over other measures in the ability to detect change over time. For future clinical trials, cognitive measures that require the smallest sample sizes for any chosen treatment effect will be those with the largest ES. For this reason, and also to better understand how the ubiquitous practice effects present in cognitive assessment are exhibited in premanifest and diagnosed groups compared with disease-free participants, we included a disease-free comparison group in our analyses. For the purposes of sample size calculations, we consider a 100% effective treatment to be one where the mean change in a treated group is the same as that in the disease-free group.



Methods

Participants

Briefly, participants were recruited from four sites (Vancouver, Paris, Leiden and London) as part of the TRACK-HD study.5 Participants were 18–65 years old, able to tolerate and safely undergo magnetic resonance imaging (MRI), were not participants in a clinical drug trial and were free of other major concomitant neurological, psychiatric or medical illnesses (including significant head injury and drug/alcohol abuse). Inclusion in the premanifest group was defined at study entry by a disease burden score of >250 (ref 6) and a total motor score ≤5, as assessed by the motor examination of the Unified Huntington's Disease Rating Scale (UHDRS-99).7 The early HD group included individuals at stages 1 or 2 according to the UHDRS Total Functional Capacity score at the baseline assessment. Controls were primarily spouses or partners and gene negative siblings to maximise consistency of environments. Where possible, groups were frequency matched (ie, having similar distributions) on age, sex and education; as expected, given the progressive nature of HD, the early HD group was slightly older than the premanifest and control groups (see table 1). A total of 366 participants were enrolled at baseline. Here we report on a total of 349 participants (see supplement for more detail, available online only).

Table 1

Summary of participant characteristics

Cognitive assessment

Table 2 provides a list of the cognitive tasks and the variables analysed for this study, with details of the cognitive methods for each test presented in the supplement (available online only). Briefly, examiners were trained in person by the first author for standardised test administration of a set of paper and pencil and computerised tasks, and then they tested participants in the language spoken locally at each site (French, Dutch and English) as part of an annual TRACK-HD visit. Here we report on nine tests (12 primary outcomes) that were administered at all three visits (0, 12 and 24 months).

Table 2

Cognitive test battery information

Statistical methods

Cognitive outcomes were analysed separately using a generalised least squares regression model for repeated measures of the outcome at baseline, and at 12 and 24 months (additional details in the supplement, available online only). For a given outcome, participants were excluded from data analysis if they had data at only one of the three visits. ES for differences in the rate of change observed over both 12 and 24 months for each task were calculated as the estimated difference in longitudinal change in each disease group relative to controls, divided by the residual SD of change in the disease group. To compare ES magnitudes between the 12 cognitive outcomes, we calculated differences between ES for each pair of tasks for both the 12 and 24 month change. We estimated 95% CIs for the ES and pairwise ES differences using the bias corrected and accelerated bootstrap method with 2000 replications.9 All analyses were performed using SAS V.9.2. (Stata Corporation).
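As a rough illustration of the effect size definition above, the following Python sketch computes an unadjusted ES from per-participant change scores, together with a simple percentile bootstrap interval. It is a simplification under stated assumptions: the study derived ES from a covariate adjusted generalised least squares model and used the bias corrected and accelerated bootstrap, neither of which is reproduced here. The function names, the sign convention (positive values indicate relative deterioration when higher scores are better) and any input data are hypothetical.

```python
import random
import statistics


def effect_size(disease_changes, control_changes):
    """Unadjusted ES: difference in mean change between controls and the
    disease group, divided by the SD of change in the disease group.
    Assumed sign convention: positive = disease group deteriorates
    relative to controls when higher scores mean better performance."""
    diff = statistics.mean(control_changes) - statistics.mean(disease_changes)
    return diff / statistics.stdev(disease_changes)


def bootstrap_ci(disease_changes, control_changes,
                 n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the ES (not the BCa method).
    Participants are resampled with replacement within each group."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        d = rng.choices(disease_changes, k=len(disease_changes))
        c = rng.choices(control_changes, k=len(control_changes))
        if len(set(d)) < 2:
            # SD of change would be zero; skip this degenerate resample.
            continue
        estimates.append(effect_size(d, c))
    estimates.sort()
    lower = estimates[int(alpha / 2 * len(estimates))]
    upper = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lower, upper
```

Given two lists of change scores, `effect_size(disease, controls)` returns the point estimate and `bootstrap_ci(disease, controls)` returns an approximate 95% interval; fixing `seed` makes the interval reproducible.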


Results

All 12 of the cognitive outcomes showed evidence of deterioration in the early HD group, relative to controls, over 24 months. Differences were statistically significant (p<0.05) for all measures except Trails B and 1.8 Hz Paced Tapping, which were borderline statistically significant (0.05<p<0.1). In contrast, very little evidence of decline was detectable in the premanifest group. Table 3 presents the unadjusted means at baseline, and at 12 and 24 months, and table 4 displays the adjusted mean between group differences in longitudinal change.

Table 3

Summary of performance on cognitive assessments at baseline, and after 12 months and 24 months of follow-up

Table 4

Between group differences in annualised rate of longitudinal change adjusted for age, sex, centre and education

Despite the consistent evidence of deterioration in the early HD group, the way this deterioration was expressed varied. For example, in some tasks, the early HD group showed a decline in cognitive performance at subsequent visits whereas the control group showed improvements (ie, practice effects), resulting in large longitudinal differences between groups in rates of change. The Symbol Digit Modalities Test (SDMT) showed this pattern; in early HD there was a decline from 33.9 at baseline to 31.0 at 24 months compared with controls who improved from 52.4 at baseline to 54.4 at 24 months. After adjustment for demographic factors, the early HD group relative to controls declined by 2.63 points (95% CI 1.91 to 3.34) per year over 24 months. A similar pattern was observed on the Stroop Test Word Reading condition, with the early HD group declining 4.21 points (95% CI 2.78 to 5.65) per year more than controls over 24 months. On other tests, such as the Circle Tracing indirect condition, both controls and the early HD group exhibited practice effects but this effect was markedly greater in controls, indicating a relative deterioration in the early HD group. For example, for Circle Tracing indirect, controls improved from 5.59 at baseline to 6.16 at 24 months whereas the early HD group improved only from 5.14 at baseline to 5.38 at 24 months. Circle Tracing direct showed a similar pattern. Finally, in some tasks, such as the University of Pennsylvania Smell Identification Test (UPSIT), the early HD group declined while the control group's performance stayed stable; controls scored 17.16 at baseline and 17.13 on average at 24 months whereas early HD scored 13.51 at baseline and 12.59 at 24 months. After adjustment, performance of the early HD group compared with controls decreased by 0.52 points (95% CI 0.21 to 0.83; p=0.001) per year over 24 months.

The strongest and most consistent evidence of differences in longitudinal rates of change in the early HD group compared with controls, as indicated by large standardised ES, were in three outcomes which showed significant effects at 12 and 24 months (all p's<0.0005). As an illustration of these, 24 month ES for differences from controls were SDMT=1.00 (95% CI 0.70 to 1.30), Circle Tracing Indirect=0.85 (95% CI 0.58 to 1.18) and Stroop Word Reading=0.73 (95% CI 0.48 to 1.03). In contrast, for other cognitive outcome measures, such as Negative Emotion Recognition and Spot the Change (visual working memory), we observed strong evidence only at 24 months (emotion ES: 0.49; 95% CI 0.21 to 0.77; p=0.0003; spot ES: 0.40; 95% CI 0.16 to 0.68; p=0.0025) whereas at 12 months evidence of faster rates of decline in early HD was only weak (emotion ES: 0.27; 95% CI 0.03 to 0.52; p=0.034; spot ES: 0.23; 95% CI −0.03 to 0.46; p=0.070). Finally, some tasks, including 1.8 Hz Paced Tapping and Trails B, showed no statistically significant deterioration over 12 months (p>0.50) and only weak evidence of decline over 24 months (1.8 Hz Tapping ES: 0.32; 95% CI −0.02 to 0.74; p=0.070; Trails B ES: 0.19; 95% CI 0.00 to 0.39; p=0.067). See table 5 for full details of ES for all outcomes.

Table 5

Standardised effect sizes of between group differences in change adjusted for age, sex, centre and education

In contrast with the clear evidence of decline in early HD, we found very little evidence of measurable deterioration in the premanifest group relative to controls over either 12 or 24 months. The strongest suggestions of longitudinal decline in the premanifest group came from the Circle Tracing indirect condition and SDMT, with ES of 0.23 (95% CI −0.05 to 0.51) and 0.20 (95% CI −0.03 to 0.43), respectively, over 12 months and 0.19 (95% CI −0.10 to 0.48) and 0.14 (95% CI −0.11 to 0.38) over 24 months. None of these longitudinal effects reached the statistical significance threshold of p<0.05.

To facilitate more robust comparisons between tasks, we examined whether some tasks were statistically superior to other tasks in detecting longitudinal changes. Results of these analyses indicated that, whereas in absolute terms the SDMT had larger ES at both 12 and 24 months compared with other cognitive tasks, the SDMT ES were not statistically significantly larger than many other tests. More specifically, SDMT was not significantly better at detecting longitudinal differences between early HD patients and controls than the Circle Tracing indirect condition, Stroop Word Reading or 3 Hz Paced Tapping, for either the 12 or 24 month time periods. Neither was there any evidence that Trails B, the task with the smallest absolute ES, was significantly worse than Negative Emotion Recognition, Spot the Change, Trails A or Paced Tapping at either 1.8 or 3 Hz. We were thus unable to distinguish either a single ‘best’ or a ‘worst’ performing test within the cognitive battery on the basis of ES differences. See table 6 for full results of comparisons of ES between outcomes.
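The pairwise comparison of ES described above can be sketched as follows. This is a minimal illustration, assuming unadjusted change scores and a percentile bootstrap rather than the adjusted model and bias corrected and accelerated intervals used in the study; the function names and the row layout are hypothetical. Each participant contributes one row holding their change on outcome A and on outcome B, and rows are resampled whole so that the within-person correlation between the two outcomes is preserved.

```python
import random
import statistics


def _es(disease, control):
    # Unadjusted ES: difference in mean change scaled by the disease group SD.
    return (statistics.mean(control) - statistics.mean(disease)) / statistics.stdev(disease)


def es_difference_ci(disease_rows, control_rows,
                     n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for ES(outcome A) minus ES(outcome B).
    Each row is a (change_on_A, change_on_B) pair for one participant."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        d = rng.choices(disease_rows, k=len(disease_rows))
        c = rng.choices(control_rows, k=len(control_rows))
        d_a, d_b = zip(*d)
        c_a, c_b = zip(*c)
        if len(set(d_a)) < 2 or len(set(d_b)) < 2:
            # A zero SD makes the ES undefined; skip this resample.
            continue
        diffs.append(_es(d_a, c_a) - _es(d_b, c_b))
    diffs.sort()
    lower = diffs[int(alpha / 2 * len(diffs))]
    upper = diffs[int((1 - alpha / 2) * len(diffs)) - 1]
    return lower, upper
```

Under this sketch, an interval that includes zero would correspond to the situation reported here, where one outcome has the larger ES in absolute terms but is not statistically distinguishable from the other.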

Table 6

Differences in standardised effect sizes of between group differences in longitudinal change over 12 and 24 months for pairs of variables adjusted for age, sex, centre and education

An important caveat for reconciling the results presented here with our previous reports on Circle Tracing tasks at the 12 month time point is that in the current analyses we have taken care to avoid inflation of longitudinal ES that arises from a combination of large baseline differences between groups and an association between change and baseline performance. Specifically, because changes tend to be smaller in cases with lower baseline levels (ie, HD cases), it is implausible that even a 100% effective treatment will render the mean change in outcome in the HD group to be as great as that in the control group, and hence the ES will be unrealistically large for the purposes of estimating sample sizes for clinical trials. For this reason, we logarithmically transformed the Circle Tracing data as this removed any dependency of change on baseline, as assessed by testing for associations between change and mean levels.10
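The reasoning behind the logarithmic transformation can be illustrated with a small Python sketch. It assumes an idealised, hypothetical data set in which every participant improves by the same proportion (10%), so that raw change grows with the level of performance; the least squares slope of change on the mean of the two visits is used here as one possible check for such an association, and is not necessarily the test applied in the study.

```python
import math


def slope_of_change_on_level(baseline, followup, transform=None):
    """Least squares slope of change (follow-up minus baseline) regressed
    on the mean of the two visits, optionally after transforming scores."""
    f = transform if transform is not None else (lambda x: x)
    changes = [f(y) - f(x) for x, y in zip(baseline, followup)]
    levels = [(f(x) + f(y)) / 2 for x, y in zip(baseline, followup)]
    mean_c = sum(changes) / len(changes)
    mean_l = sum(levels) / len(levels)
    cov = sum((l - mean_l) * (c - mean_c) for l, c in zip(levels, changes))
    var = sum((l - mean_l) ** 2 for l in levels)
    return cov / var


# Hypothetical scores: everyone improves by 10% of their baseline value.
baseline = [2.0, 4.0, 6.0, 8.0]
followup = [b * 1.1 for b in baseline]

raw_slope = slope_of_change_on_level(baseline, followup)
log_slope = slope_of_change_on_level(baseline, followup, transform=math.log)
```

On the raw scale the slope is clearly positive, because higher scorers change more in absolute terms. On the log scale every participant has the same change, log(1.1), so the slope is zero up to rounding error and change no longer depends on level.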


Discussion

In this study, we found highly consistent evidence that longitudinal cognitive decline is detectable across a 24 month interval in early HD. Changes in 10 of the 12 cognitive outcome measures, which were derived from nine distinct cognitive tests, were statistically significant compared with controls, with medium to large ES. About half of the cognitive measures also showed statistically significant (small to medium) effects after only 12 months of follow-up. In contrast with the early HD findings, we did not detect statistically significant longitudinal decline at either 12 or 24 months in the premanifest sample relative to change in controls. Because we studied sample sizes and an overall duration of follow-up relevant to clinical trials, as well as including both premanifest and early HD participants in the study, the ES results from this study are useful for clinical trial planning in HD. Thus these results identify a range of cognitive outcomes that are sensitive to change over 12 or 24 months in early HD, indicating that it is now possible to conduct treatment trials aimed at slowing cognitive deterioration in early stage patients.

In contrast with the findings in early HD, our results indicate that for premanifest HD, rates of progression of these cognitive outcomes appear to be too slow to detect with a reasonable sample size in a time period reasonable for a clinical trial. Importantly, the lack of significant findings in premanifest HD does not mean an absence of progressive cognitive decline throughout the premanifest HD period. Rather, it seems more likely that the magnitude of this decline is too small and/or the rate of progression is too slow to be detected over 24 months in a premanifest sample of 117 individuals. This is important to note given that this premanifest sample had reasonably high levels of disease burden (mean=293.8), which yielded a median estimate of 10.8 years to onset. However, the sample also did not have significant motor signs indicative of HD at the time of study entry. This sample was designed to be a relatively pure sample which was unequivocally premanifest at the start of the study despite the disease burden scores indicating that they were in the latter premanifest stages. We anticipate that a premanifest sample that was closer to estimated disease onset or displaying significant motor signs could be expected to show greater degrees of cognitive change and that perhaps such changes would be detectable in a 24 month interval in a sample of about 120 participants. Indeed, we did find evidence for this in a partial examination of the cognitive battery within a smaller subsample of the TRACK-HD cohort.4 A test battery with a higher level of difficulty, designed specifically to challenge cognition in the premanifest period, might also be more likely to reveal decline over time. The Predict-HD cohort is also of great interest with regard to understanding the progression of cognitive change in the premanifest period in relation to disease burden and motor signs, and hopefully a longitudinal report of these data will be made available in the near future. 
Regardless, our findings challenge the plausibility of clinical trials for cognition in premanifest HD, and they highlight important issues for consideration of sample selection for such a trial.

This study makes several important contributions that will facilitate clinical trials to ameliorate cognitive decline in HD. First, to our knowledge, this is the only study of longitudinal cognitive assessment involving a battery of cognitive tests that has reported on both premanifest and HD groups, thus providing unique evidence of the relative sensitivity of tests to each other and across these stages of progression. Second, there are few longitudinal reports in premanifest or diagnosed HD, and of these, none has as extensive a cognitive battery or as many participants or participant groups as TRACK-HD.11–15 Further, previous longitudinal cognitive studies used sample sizes too small to detect anything but large effects (n<25), and/or batteries were strictly limited to one or to only a couple of cognitive tests. Finally, this study highlights the observation of differential practice effects across the groups as evidence of cognitive decline. Thus this report makes available, for the first time, a description of changes in cognition across a wide range of cognitive domains known to be affected in HD, across both premanifest and early HD, and across two annual follow-up time points.

Clinical trialists, because of the time restrictions they face in collection of data for clinical trials, must evaluate the relative sensitivity of outcome measures to select what they believe are the most sensitive measures. Provided that a putative treatment has the same proportionate effect on changes in all potential outcome measures (over and above the changes in healthy controls), outcome measures can be selected by comparing ES across measures. However, the fact that one ES is larger than another does not guarantee that the difference in the two ES is statistically significant, even if one ES is itself statistically significant and the other is not. For this reason, we coupled the construction of league tables of ES with pairwise comparisons to establish where there is evidence that particular ES are superior to others. Such an approach has significant benefits in the context of clinical trial planning because it provides an empirical basis with which to prioritise tests for inclusion in a clinical trial battery. The results showed us that there were no clear ‘best’ or ‘worst’ tests, and that instead, despite some differences in the magnitudes of the ES, many of the ES for the cognitive outcomes were not statistically significantly different from one another. For example, for both 12 and 24 month intervals, the SDMT had the largest ES in absolute terms. However, neither the 12 nor 24 month ES for SDMT was statistically significantly larger than the estimates for Stroop Word Reading, the indirect condition of Circle Tracing or the 3 Hz condition of Paced Tapping. Trails B had the smallest ES, but this ES was not significantly smaller than those for Negative Emotion Recognition, Spot the Change, Trails A or either of the Paced Tapping conditions.

Cognitive tests with the highest ES are likely to be the most statistically significant in a clinical trial of a disease modifying therapy, provided that such a therapy has a similar proportional effect on each test—that is, a drug that reduces the rate of decline in one test by 50% will also reduce the rate of decline in other tests by 50%. Of course, a more statistically significant effect does not necessarily translate into a more clinically significant effect but in the absence of information about which of the cognitive tasks considered here are most clinically important, this seems a reasonable criterion on which to base the selection of outcome variables for clinical trials.

A composite cognitive score may yield larger ES than those from individual cognitive tests but at present there is no well recognised cognitive combination that is used in practice. A number of statistical and non-statistical approaches could be used to derive such a score but there can be no certainty that a combination of cognitive outcomes with an increased ES will necessarily translate into a more statistically efficient outcome for clinical trials. Specifically, if a treatment has non-proportional effects on the various test scores that make up a composite, then that composite may be less efficient than a composite score which emphasises the more responsive of the individual tests.

A clear understanding of where statistically significant differences in ES are and are not present also has implications for power analyses. For example, for the three tests with the largest ES at 12 months for early HD (SDMT, indirect Circle Tracing and Stroop Word Reading), sample size estimates for a 50% effective treatment, 90% power and two tailed p<0.05 group comparisons would be 150 (95% CI 75 to 374), 289 (95% CI 135 to 934) and 337 (95% CI 135 to 875), respectively, in each arm of a 1 year treatment trial with no dropouts. The results suggest that estimating sample sizes across the range of ES for equally best outcomes (in this case SDMT, indirect Circle Tracing and Stroop Word Reading) would help to avoid underestimating the sample needed. Given that ES are reduced by low reliability, and that cognitive outcomes tend to be relatively noisy measures, the findings also highlight the need to minimise noise wherever possible in measuring cognition. Thus careful control over standardised test administration and scoring is essential, as is minimisation of participant related variability linked to such factors as fatigue.
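The link between ES and sample size in the paragraph above can be sketched with the standard normal approximation formula for comparing two group means, n per arm = 2(z₁₋α/₂ + z_power)² / (e × ES)², where e is the assumed treatment effectiveness. That this is the exact calculation used in the study is an assumption; the function name and parameters below are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_arm(effect_size, effectiveness=0.5, power=0.90, alpha=0.05):
    """Participants needed in each arm of a two arm trial with no dropouts.
    effect_size: standardised ES of the disease group relative to controls.
    effectiveness: fraction of that ES the treatment is assumed to remove."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two tailed significance
    z_power = NormalDist().inv_cdf(power)
    treated_es = effectiveness * effect_size
    return math.ceil(2 * (z_alpha + z_power) ** 2 / treated_es ** 2)


# 24 month SDMT ES of 1.00 (as reported in the Results), 50% effective treatment
print(n_per_arm(1.00))  # prints 85
```

Under these assumptions, inputs of 0.75, 0.54 and 0.50 return 150, 289 and 337, matching the point estimates quoted above; those three ES values are back-calculated here for illustration and are not taken from table 5. Because n scales with the inverse square of the ES, halving the ES roughly quadruples the required sample, which is why the width of the ES confidence interval matters so much for planning.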

Due to the paucity of longitudinal studies, researchers must frequently rely on cross sectional results when selecting the most promising outcome measures. Yet cross sectional comparisons of participants stratified along a continuum of progression may lead to gross overestimates of longitudinal effects over short time periods, such as those seen in clinical trials. For example, we previously reported cross sectional TRACK-HD findings5 for three cognitive outcomes that revealed significant group effects even though longitudinally we now show these measures to be among the least sensitive. Similarly, results from the very large cohort of premanifest HD participants in Predict-HD, which uses a set of cognitive measures that overlap with TRACK-HD, also show cross sectional sensitivity.16 Thus, wherever possible, ES for estimating sample sizes for clinical trials should be based on longitudinal observations, such as those reported in this paper.

Importantly, in diseases that affect cognition, ES estimates for rates of change conflate practice effects and deterioration. The possible impacts of this conflation must be carefully considered before using change rates to determine clinical trial sample sizes. In designing future studies or trials, attention should be given to using multiple baseline designs to help disentangle the contribution of practice to the observable changes from deterioration or treatment. ES from many tests also conflate motor deterioration and cognitive deterioration although the battery of tests we report here includes tests that can be argued specifically to be free of such confounds. Specifically, Spot the Change, Emotion Recognition and the UPSIT do not require rapid responding or precise movements, nor are their outcomes measured in terms of response speed. Thus deteriorations in performance in these tests, which were statistically significant in early HD, can be plausibly interpreted as indicating cognitive, but not motor, decline. When using the ES from cognitive batteries to determine power for clinical trials, it is important to keep in mind the potential interplay of cognition and motor function in order to select tests that are most suitable for the goals of a particular trial.

In conclusion, the findings from this study illustrate several considerations that are of general importance for designing cognitive outcome batteries for clinical trials, including the length of the follow-up needed, sensitivity of cognitive measures and the need to make careful assessments of whether ES are statistically different from each other. They also illustrate the limitations of using cross sectional findings to inform longitudinal designs.


Acknowledgements

The authors offer their gratitude to the volunteers who participated and to their carers and companions who helped make their participation possible.



  • Funding TRACK-HD is supported by the CHDI/High Q Foundation Inc, a not for profit organisation dedicated to finding treatments for Huntington's disease.

  • Competing interests None.

  • Ethics approval Ethics approval was provided by University College London, Monash University, University of British Columbia, Leiden University Medical Centre and Hôpital de la Pitié-Salpêtrière.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement TRACK-HD is not an open access study, but CHDI and the study investigators are committed to ensuring that TRACK-HD data are used to define and validate the most promising endpoints for clinical trials in Huntington's disease using the best analytical, state of the art approaches available. The study includes 36 month longitudinal 3T MRI, clinical, cognitive, quantitative motor, oculomotor and neuropsychiatric measures, and a standardised plasma collection protocol. Requests for access to unpublished data should be sent to the study coordinators (coordination{at} and will be considered by the Study Steering Committee on a case by case basis.