J Neurol Neurosurg Psychiatry 76:1348-1354 doi:10.1136/jnnp.2004.047720

Do MCI criteria in drug trials accurately identify subjects with predementia Alzheimer’s disease?

  1. P J Visser1,
  2. P Scheltens2,
  3. F R J Verhey1
  1. 1Department of Psychiatry, University Hospital Maastricht and Alzheimer Centre Limburg, Maastricht, Netherlands
  2. 2Department of Neurology, Alzheimer Centre, VU Medisch Centrum, Amsterdam, Netherlands
  1. Correspondence to:
 Dr Pieter Jelle Visser
 Department of Psychiatry and Neuropsychology, University of Maastricht, PO Box 616, 6200 MD Maastricht, Netherlands; pj.visser@np.unimaas.nl
  • Received 4 July 2004
  • Accepted 30 January 2005
  • Revised 29 October 2004

Abstract

Background: Drugs effective in Alzheimer-type dementia have been tested in subjects with mild cognitive impairment (MCI) because such subjects are presumed to have Alzheimer’s disease in the predementia stage.

Objectives: To investigate whether MCI criteria used in these drug trials can accurately diagnose subjects with predementia Alzheimer’s disease.

Methods: MCI criteria of the Gal-Int 11 study, InDDEx study, ADCS memory impairment study, ampakine CX 516 study, piracetam study, and Merck rofecoxib study were applied retrospectively in a cohort of 150 non-demented subjects from a memory clinic. Forty two had progressed to Alzheimer type dementia during a five year follow up period and were considered to have predementia Alzheimer’s disease at baseline. Outcome measures were the odds ratio, sensitivity, specificity, and positive and negative predictive value.

Results: The odds ratio of the MCI criteria for predementia Alzheimer’s disease varied between 0.84 and 11. Sensitivity varied between 0.46 and 0.83 and positive predictive value between 0.43 and 0.76. None of the criteria combined a high sensitivity with a high positive predictive value. Exclusion criteria for depression led to an increase in positive predictive value and specificity at the cost of sensitivity. In subjects older than 65 years the positive predictive value was higher than in younger subjects.

Conclusions: The diagnostic accuracy of MCI criteria used in trials for predementia Alzheimer’s disease is low to moderate. Their use may lead to inclusion of many patients who do not have predementia Alzheimer’s disease or to exclusion of many who do. Subjects with moderately severe depression should not be excluded from trials, as excluding them reduces sensitivity.

Subjects with mild cognitive impairment (MCI) often suffer from Alzheimer’s disease in the predementia stage.1 For this reason, people with MCI have been enrolled in trials with drugs that were effective in patients with Alzheimer-type dementia. However, previous studies have shown that not all MCI subjects have predementia Alzheimer’s disease.2,3 In view of this well recognised heterogeneity of MCI, it is important to determine the extent to which drug trials have included subjects with MCI who indeed had predementia Alzheimer’s disease. Our aim in the present study was to investigate whether MCI criteria used in previous and ongoing drug trials could accurately diagnose subjects with predementia Alzheimer’s disease. We applied these criteria to a well characterised cohort of MCI patients whose cognitive outcome after five years of follow up was known and in whom the presence of predementia Alzheimer’s disease at baseline could be inferred. We also investigated how exclusion criteria for depression and vascular comorbidity affected the diagnostic accuracy. MCI trials applied exclusion criteria for depression or vascular comorbidity because these conditions may cause MCI. However, depressive symptomatology and vascular comorbidity often occur in subjects with predementia Alzheimer’s disease, and exclusion of subjects with these conditions may also decrease the diagnostic accuracy. Finally, because the prevalence of predementia Alzheimer’s disease increases with age, we investigated whether the diagnostic accuracy was dependent on age.

METHODS

Subjects

The diagnostic accuracy of MCI definitions was investigated retrospectively in a cohort of newly referred consecutive non-demented patients from the Maastricht Memory Clinic with no apparent cause for their cognitive impairment such as cerebrovascular disorders, brain trauma, or severe psychiatric disorders.4 Subjects were referred to the memory clinic by general practitioners (70%), psychiatrists (16%), neurologists (6%), or other physicians (8%). Subjects were reinvestigated two, five, and 10 years after the baseline visit. For the present study we selected all subjects older than 50 years who had been eligible for the five year follow up assessment (n = 178). They were considered to have predementia Alzheimer’s disease at baseline if they met criteria for Alzheimer-type dementia during the five year follow up period. Outcome with respect to cognitive functioning was available for 150 subjects (84.3%). Of these, 42 (28%) had predementia Alzheimer’s disease at baseline. The group of subjects without predementia Alzheimer’s disease (n = 108) included one with frontotemporal dementia, one with dementia from other causes, three with vascular cognitive impairment, and one with Parkinson’s disease at follow up. Cognitive outcome was not available for 28 subjects because they had died (n = 5), could not be traced (n = 12), had refused follow up (n = 9), or were not contacted for other reasons (n = 2). Subjects without cognitive outcome were older than those with cognitive outcome (68.3 years v 61.8 years), while sex, years of education, and scores on the global deterioration scale (GDS),5 mini-mental state examination (MMSE),6 Hamilton depression rating scale (HDRS),7 and the delayed recall of the Rey auditory verbal learning task (RAVLT) did not differ.

Subjects gave informed consent before inclusion in the study. The study was approved by the medical ethics committee of the University Hospital Maastricht, Netherlands.

Clinical assessment and clinical diagnosis at baseline and follow up

At baseline subjects underwent a standardised assessment which included a history provided by the patient and a significant other; a psychiatric, neurological, and physical examination; clinical rating scales (MMSE,6 GDS,5 HDRS-17 items,7 the Blessed dementia rating scale (BDRS) part I,8 and the Hachinski ischaemic scale (HIS)9); appropriate laboratory tests; a neuropsychological assessment; and neuroimaging as described elsewhere.10 The diagnosis of dementia and Alzheimer’s disease was made according to the DSM-IV and NINCDS-ADRDA criteria.11,12 The follow up assessment consisted of a standardised questionnaire about medical history and cognitive complaints, the MMSE, the GDS, the HDRS, the BDRS, and a neuropsychological test protocol.13,14 If a subject declined to come for the follow up assessment, a telephone interview was conducted. This included a standardised questionnaire about medical history and cognitive complaints, and the Telephone Interview for Cognitive Status.15 The diagnosis of dementia and Alzheimer’s disease at follow up was made by a neuropsychiatrist and a neuropsychologist who were unaware of the results of the baseline assessment and who made their diagnosis independently of each other. If there was disagreement about the clinical diagnosis, a consensus meeting was held and if no agreement was reached the subject was considered not demented.

Definition of MCI according to criteria from trials

MCI trials were selected from the review by Petersen1 and included the galantamine international 11 (Gal-Int-11) study, the investigation into delay to diagnosis of Alzheimer’s disease with Exelon (InDDEx) study,16 the Alzheimer’s disease cooperative study–memory impairment study (ADCS-MIS),17 the ampakine CX 516 study,18 the piracetam study, and the Merck rofecoxib MCI study.19,20 Table 1 shows the eligibility criteria, inclusion criteria for MCI, and exclusion criteria for depression and vascular comorbidity of these studies. The criteria were applied retrospectively in the subjects from the Maastricht Memory Clinic. If scales and tests used in the trials had not been used in the memory clinic we chose equivalent scales or tests if possible, as described below.

Table 1

 Eligibility criteria, mild cognitive impairment inclusion criteria, and exclusion criteria for depression and vascular comorbidity

Modification of eligibility criteria

We did not apply the requirement in the piracetam study that subjects should score at least one point above the lowest possible score on seven of eight tests from a cognitive battery as we did not use this battery or an equivalent battery.

Modification of MCI inclusion criteria

Instead of a score of 0.5 on the clinical dementia rating scale, we used a score of 3 on the GDS or a score of 2 on the GDS together with a score of at least 0.5 on the first eight items of the BDRS as a measure of functional impairment. Instead of the New York University (NYU) paragraph test used in the Gal-Int-11 and InDDEx studies and the logical memory (LM) test of the Wechsler memory scale used in the ADCS-MIS, ampakine, and piracetam studies we used a Dutch story recall test which consists of a 20 item story.21 So that we could use a similar cut off for the Dutch story recall test, we estimated the centile score relative to healthy control subjects to which the cut off scores on the NYU test and LM test corresponded and used this score as the cut off on the Dutch story recall test. Centile scores of the NYU paragraph test were based on control subjects from the NYU (Kluger A, personal communication). The cut off score used in the Gal-Int-11 study corresponded to a score below the 53rd centile, and the cut off score used in the InDDEx study to a score below the 31st centile. As no normative data have been published for story A of the LM test used in the ADCS-MIS and ampakine studies, we multiplied the cut off by 2 and calculated centile scores based on published age norms for story A and story B together.22

In the ADCS-MIS trial, the centile scores were dependent on age and education and varied between the 38th and 62nd centile for subjects with more than 15 years of education, between the 4th and 24th centile for subjects with 8–15 years of education, and between the 1st and 10th centile for subjects with less than 8 years of education. In the ampakine study, the centile scores were also dependent on age and education and varied between the 76th and 90th centile for subjects with more than 15 years of education, between the 24th and 50th centile for subjects with 8–15 years of education, and between the 4th and 24th centile for subjects with less than 8 years of education. The cut off on the immediate recall of the LM test used in the piracetam study corresponded with a centile score between the 1st and 9th centile depending on the age of the subject.23 The cut off on the difference between the immediate recall and delayed recall of the LM test used in the piracetam trial corresponded approximately to a score below the 50th centile.23
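The centile matching described above can be sketched as follows. This is a minimal illustration only: the normative samples, the cut off, and the function names `centile_of_cutoff` and `score_at_centile` are hypothetical and are not taken from the study or from any published norms.

```python
import math
from bisect import bisect_left


def centile_of_cutoff(norm_scores, cutoff):
    """Percentage of the normative sample scoring strictly below the cutoff."""
    ordered = sorted(norm_scores)
    return 100.0 * bisect_left(ordered, cutoff) / len(ordered)


def score_at_centile(norm_scores, centile):
    """Normative score at the given centile (simple rank-based lookup)."""
    ordered = sorted(norm_scores)
    index = math.ceil(centile / 100.0 * len(ordered))
    index = min(index, len(ordered) - 1)  # clamp for centiles near 100
    return ordered[index]


# Hypothetical healthy-control scores on the trial test and the substitute test.
trial_test_norms = list(range(0, 20))               # 20 scores: 0..19
substitute_test_norms = [2 * i for i in range(1, 21)]  # 20 scores: 2..40

# Step 1: find the centile that the trial cut off corresponds to.
trial_cutoff = 10  # hypothetical cut off: "score below 10"
centile = centile_of_cutoff(trial_test_norms, trial_cutoff)

# Step 2: find the score on the substitute test at that same centile.
equivalent_cutoff = score_at_centile(substitute_test_norms, centile)

print(centile)            # 50.0
print(equivalent_cutoff)  # 22
```

In this toy example a score below 10 on the trial test and a score below 22 on the substitute test both identify the lower half of their respective control samples.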

Modification of exclusion criteria

Instead of a score >6 on the geriatric depression rating scale we used a score >12 on the HDRS as an exclusion criterion in the ampakine study. Instead of a score >17 on the HDRS-21 items scale used in the piracetam study, we used a score >17 on the HDRS-17 items scale. The exclusion criterion of moderately severe depression used in the Gal-Int-11 study was operationalised as a diagnosis of major depression according to DSM-IV criteria together with a score on the HDRS >17.

Missing data

Of the 150 subjects with known cognitive outcome, several had missing data for one or more tests or rating scales. The HIS was not available for 27 subjects. As these subjects did not display major vascular pathology on clinical examination none of these subjects was excluded. The Dutch story recall test was not administered to 12 subjects. Subjects with missing data for this test were not included in the eligible sample for MCI criteria that made use of this test. Compared with subjects in whom the Dutch story recall test was administered, those without this test scored higher on the HDRS (14.4 v 10.2). Age, sex, educational level, MMSE score, delayed recall score on the RAVLT, or prevalence of predementia Alzheimer’s disease did not differ. The number of subjects who were excluded from the eligible sample because of missing data for the Dutch story recall test was 12 for the Gal-Int-11 study and piracetam studies, 11 for the InDDEx study, nine for the ADCS-MIS study, and eight for the ampakine study. The BDRS was not scored in one subject and this individual was excluded from the eligible sample of the rofecoxib study.

Statistical analysis

For each MCI definition we first determined the sample that would be eligible for the study. This sample consisted of all subjects in the age range of the study who did not meet the criteria for dementia at baseline. Next we applied the inclusion criteria for MCI. Then we applied the exclusion criteria for depression or vascular comorbidity. Three sets of analyses were performed. First, we calculated the diagnostic accuracy of the MCI definition using both inclusion criteria and exclusion criteria in the subjects who were eligible for the study. Second, we calculated the diagnostic accuracy of the MCI definition using only inclusion criteria. Third, we calculated the diagnostic accuracy using both inclusion criteria and exclusion criteria in a young subgroup (age 50 or 55 to 64) and an old subgroup (age 65 or above). For each analysis we defined four groups: subjects eligible for the study without predementia Alzheimer’s disease who did not meet the criteria of the MCI definition (group A); subjects eligible for the study without predementia Alzheimer’s disease who met the criteria of the MCI definition (group B); subjects eligible for the study with predementia Alzheimer’s disease who did not meet the criteria of the MCI definition (group C); and subjects eligible for the study with predementia Alzheimer’s disease who met the criteria of the MCI definition (group D). Outcome measures were the odds ratio ((A*D)/(B*C)), sensitivity (D/(C+D)), specificity (A/(A+B)), positive predictive value (D/(B+D)), and negative predictive value (A/(A+C)). Data are presented as means and 95% confidence intervals (CI).
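As an illustration of the outcome measures defined above, the following sketch computes them from the four group counts A to D. The class name `TwoByTwo` and the example counts are hypothetical and serve only to show the arithmetic; they are not data from the study, and confidence intervals are not computed here.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts of eligible subjects, using the group labels from the text."""
    a: int  # no predementia AD, MCI definition not met
    b: int  # no predementia AD, MCI definition met
    c: int  # predementia AD, MCI definition not met
    d: int  # predementia AD, MCI definition met

    def odds_ratio(self) -> float:
        return (self.a * self.d) / (self.b * self.c)

    def sensitivity(self) -> float:
        return self.d / (self.c + self.d)

    def specificity(self) -> float:
        return self.a / (self.a + self.b)

    def ppv(self) -> float:
        return self.d / (self.b + self.d)

    def npv(self) -> float:
        return self.a / (self.a + self.c)


# Hypothetical counts for illustration only.
example = TwoByTwo(a=80, b=20, c=10, d=30)

print(example.odds_ratio())        # 12.0
print(example.sensitivity())       # 0.75
print(example.specificity())       # 0.8
print(example.ppv())               # 0.6
print(round(example.npv(), 3))     # 0.889
```

The sketch does not guard against empty cells; a count of zero in B or C would make the odds ratio undefined.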

RESULTS

Diagnostic accuracy of MCI definitions

Table 2 shows the number of subjects who were eligible for the study, the number of eligible subjects who met the inclusion criteria for MCI, the number of subjects with MCI who were excluded because of depression or vascular comorbidity, and the number who met both the MCI inclusion and exclusion criteria. The baseline characteristics of subjects meeting inclusion and exclusion criteria are shown in table 3. The odds ratio, sensitivity, specificity, positive predictive value, and negative predictive value for predementia Alzheimer’s disease of each of the MCI definitions are shown in table 4.

Table 2

 Number of subjects meeting eligibility criteria and inclusion and exclusion criteria for mild cognitive impairment

Table 3

 Baseline characteristics of subjects according to different definitions of mild cognitive impairment

Table 4

 Diagnostic accuracy of definitions of mild cognitive impairment for predementia Alzheimer’s disease

Effect of exclusion criteria for depression and vascular comorbidity on diagnostic accuracy

Table 4 shows that after application of the exclusion criteria the odds ratios of the criteria in the Gal-Int-11, ADCS-MIS, and piracetam studies increased and those of the criteria in the InDDEx, ampakine, and rofecoxib studies decreased. The sensitivity and negative predictive value decreased and the specificity increased for all MCI criteria. The positive predictive value of all criteria except those in the rofecoxib study increased.

Effect of age on diagnostic accuracy

The effect of age on diagnostic accuracy of the criteria in the rofecoxib study was not tested as that study included only subjects older than 65 years. Table 5 shows the effect of age on the diagnostic accuracy. The positive predictive value was higher and the specificity and negative predictive value were lower in the older than in the younger subgroup for all criteria. For most criteria, the odds ratio was lower and the sensitivity higher in the older than in the younger subgroup.

Table 5

 Effect of age on diagnostic accuracy of mild cognitive impairment definitions for predementia Alzheimer’s disease

DISCUSSION

MCI criteria used in recent drug trials have a low to moderate diagnostic accuracy for predementia Alzheimer’s disease. Application of exclusion criteria for depression and vascular comorbidity led to an increase in the positive predictive value and specificity at the cost of the sensitivity. The diagnostic accuracy of the MCI criteria for predementia Alzheimer’s disease was dependent on age.

Diagnostic accuracy of MCI criteria for predementia Alzheimer’s disease

There was marked heterogeneity in diagnostic accuracy for predementia Alzheimer’s disease between the different MCI criteria. The highest odds ratios were seen for MCI criteria from the ampakine study (OR = 11) and ADCS-MIS study (OR = 8.8). Nevertheless, these odds ratios are below the observed minimum for a good diagnostic test (25). The odds ratios of the other MCI criteria were much lower (between 0.84 and 5.8). The variability in diagnostic accuracy may depend on the variability in the definition of cognitive impairment. Although all studies used the same definition of functional impairment (a score of 0.5 on the CDR), there were marked differences for the definition of impairment on memory tests. Memory impairment was defined as a score below the 53rd centile in the Gal-Int-11 study, below the 31st centile in the InDDEx study, below the 9th centile in the ADCS-MIS study (this is the average centile score given the age and education distribution in our sample), below the 31st centile in the ampakine study (this is also the average centile score given the age and education distribution in our sample), below the 50th centile in the piracetam study, and below the 46th centile in the rofecoxib study (the centile score was estimated from data of healthy subjects older than 65 years from the Maastricht aging study).21 As can be seen in table 4, studies with a strict cut off score for memory impairment had a higher positive predictive value than those with a lenient cut off score. It is remarkable that the InDDEx and Gal-Int-11 studies used a permissive definition of memory impairment on the NYU paragraph test, as it has been shown that the best score on this test to identify subjects with predementia Alzheimer’s disease among those with MCI is a score below 6, which is well below the cut off scores used in the InDDEx and Gal-Int-11 studies.24

Effect of exclusion criteria for depression and vascular comorbidity on diagnostic accuracy

The application of exclusion criteria for depression and vascular comorbidity led to an increase in the odds ratio for predementia Alzheimer’s disease (indicating an improvement in overall diagnostic accuracy) for the MCI criteria of the Gal-Int-11, ADCS-MIS, ampakine, and piracetam studies, and a decrease in odds ratio for the MCI criteria of the InDDEx and rofecoxib studies. The specificity and positive predictive value typically increased, while the sensitivity and negative predictive value decreased. These effects mainly reflected the exclusion criteria for depression because 88–100% of the excluded subjects were excluded because of depression. The small number of exclusions because of vascular comorbidity is probably because we had already excluded at baseline those subjects in whom the cognitive impairment was thought to be due to vascular lesions. The decrease in odds ratio of the MCI criteria of the InDDEx and rofecoxib studies was probably the result of the relatively large number of excluded subjects who had predementia Alzheimer’s disease. The effect of the exclusion criteria on the sensitivity correlated well with the severity of the threshold for depression: the fall in sensitivity was small for the MCI criteria of the Gal-Int-11 and piracetam studies, which used a high threshold for depression, and large for the MCI criteria of the InDDEx study, which used a low threshold. The large effect of depression cut off on sensitivity is consistent with previous studies that showed a high prevalence of mild depressive disorders in subjects with predementia Alzheimer’s disease.14,25,26

Effect of age on diagnostic accuracy

The positive predictive value was higher in the older than in the younger age group, which is probably because the prevalence of predementia Alzheimer’s disease was higher in the older age group. The odds ratio was higher in the younger than in the older subgroup for the MCI criteria of the InDDEx, ADCS-MIS, ampakine, and piracetam studies. One possible explanation for this observation is that in the younger subgroup the exclusion criteria for depression more often excluded subjects without than with predementia Alzheimer’s disease, while in the older age group the exclusion criteria for depression more often excluded subjects with than without predementia Alzheimer’s disease.
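The dependence of the positive predictive value on prevalence follows from Bayes’ rule and can be illustrated with a short sketch. The sensitivity, specificity, and prevalence values below are hypothetical and chosen only to show the direction of the effect; the function name `ppv_from_prevalence` is likewise an assumption for this example.

```python
def ppv_from_prevalence(sensitivity: float, specificity: float,
                        prevalence: float) -> float:
    """Positive predictive value implied by sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Same hypothetical test characteristics, two hypothetical prevalences.
ppv_low_prevalence = ppv_from_prevalence(0.70, 0.80, 0.20)
ppv_high_prevalence = ppv_from_prevalence(0.70, 0.80, 0.40)

print(round(ppv_low_prevalence, 2))   # 0.47
print(round(ppv_high_prevalence, 2))  # 0.7
```

With identical sensitivity and specificity, the higher prevalence yields the higher positive predictive value, which is the pattern suggested above for the older subgroup.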

Limitations

A limitation of the study is that we defined predementia Alzheimer’s disease as conversion to Alzheimer’s disease-type dementia after a five year follow up period, because MCI subjects may convert to Alzheimer’s disease-type dementia at longer follow up intervals. Thus we may have made a false negative diagnosis of predementia Alzheimer’s disease in some cases, which may have led to an underestimation of the positive predictive value and to an overestimation of the negative predictive value. However, the number of misclassified subjects is likely to be small as preliminary data from the 10 year follow up of a subset of patients indicate that only 10% of the non-demented subjects at the five year follow up converted to dementia during the next five years.

A second limitation is the relatively small sample size for several of the MCI definitions and age subgroups. This reduced statistical power and resulted in large 95% confidence intervals.

Another limitation is that we could not always use the same tests and rating scales that were used in the trials, which may limit the comparability of our results with those of the trials. Baseline characteristics of subjects from the ADCS-MIS and InDDEx trials nevertheless suggest that we included a similar population, because subjects included in the ADCS-MIS trial had similar scores on the MMSE, GDS, delayed recall, and verbal fluency as subjects who met criteria of the ADCS-MIS study in our sample17; and subjects included in the InDDEx trial had similar scores on the MMSE and GDS as subjects who met the criteria of the InDDEx study in our sample (table 1).16 In addition, the finding that the positive predictive value (which is the same as the conversion rate) was low for subjects meeting the MCI criteria of the InDDEx and rofecoxib studies is consistent with the fact that the follow up of these studies had to be extended because of a low conversion rate.1

Another difference between the present study and the trials is that we selected subjects from a clinical setting at the time of the baseline assessment, while in the trials subjects could also have been recruited at follow up assessments and from other settings (for example, they could have been recruited by advertisements). This difference in subject selection may have affected the diagnostic accuracy because of referral and follow up bias. Nevertheless, the way we selected subjects is representative of the setting in which drugs would be prescribed.

Finally, our memory clinic is located at a psychiatry outpatient clinic and the effect of the exclusion criteria for depression on diagnostic accuracy may be less in settings in which the prevalence of depression or the awareness of the diagnosis is lower.

Implications for interpretation of present and ongoing trials

The findings may have important implications for the interpretation of present and ongoing trials. The overall diagnostic accuracy was low to moderate, which means that many patients will be treated who do not have predementia Alzheimer’s disease, or that many patients with predementia Alzheimer’s disease will not be included. It seems useful to plan subanalyses in subjects older than 65 years as these are more likely to have predementia Alzheimer’s disease. For studies that used a permissive definition of memory impairment, it seems advisable to plan an analysis in a subsample of subjects with more severe memory impairment.

Implications for design of future trials

As the use of MCI criteria alone will not lead to an accurate selection of subjects with predementia Alzheimer’s disease, a major challenge is to find other ways to identify such individuals. An alternative approach would be to combine a number of markers of predementia Alzheimer’s disease such as age, the MMSE score, degree of functional impairment, impairment on neuropsychological tests, medial temporal lobe atrophy, and the apolipoprotein E ε4 allele.4 Another challenge is to exclude subjects with depression related cognitive impairment without excluding depressed subjects with predementia Alzheimer’s disease. One option would be to exclude only subjects with severe depression, as people with predementia Alzheimer’s disease typically have mild to moderate depression and score below 21 on the HDRS.14 Another possibility is to exclude only young depressed subjects, as people with depression related cognitive impairment are younger than those with predementia Alzheimer’s disease.14

NOTE ADDED IN PROOF

After submission of the final version we obtained normative data for story A of the logical memory (LM) test from the Wechsler memory scale from the Chicago health and aging project (Bennett D, personal communication). As these normative data may provide better estimates of the centile scores than those we used in the paper we have recalculated the diagnostic accuracy of the MCI definitions of the ADCS-MIS and ampakine studies using the new data. The cut off points for story A of the LM test used in the ADCS-MIS study now corresponded with an age and education corrected centile score between the 17th and 56th centile (on average, the 23rd centile), and the cut off points used in the ampakine study with a centile score between the 39th and 63rd centile (on average, the 45th centile). This indicates that according to the new normative data the severity of the memory impairment was less than with the normative data used in the main text. Thirty two subjects now met the inclusion and exclusion criteria of the ADCS-MIS study and 44 met the inclusion and exclusion criteria of the ampakine study. The odds ratio of the ADCS-MIS MCI definition for predementia Alzheimer’s disease was 7.5, the sensitivity 0.58, the specificity 0.83, the positive predictive value 0.69, and the negative predictive value 0.75. The odds ratio of the ampakine MCI definition for predementia Alzheimer’s disease was 3.9, the sensitivity 0.70, the specificity 0.63, the positive predictive value 0.52, and the negative predictive value 0.78. The effect of the exclusion criteria on the diagnostic accuracy of the MCI definition of the ADCS-MIS study was similar to the effect described in table 4. For the MCI definition of the ampakine study, however, the odds ratio decreased after application of the exclusion criteria, from 6.8 to 3.9. The odds ratio of the ADCS-MIS MCI definition for predementia Alzheimer’s disease was 12 for subjects younger than 65 and 3.4 for those older than 65. The odds ratio of the ampakine MCI definition for predementia Alzheimer’s disease was 4.4 for subjects younger than 65 and 1.8 for subjects older than 65.

In conclusion, with the new normative data, the estimates of the specificity, positive predictive accuracy, and overall diagnostic accuracy were lower than the estimates based on the normative data that were used in the main text. This effect was largest for the MCI definition of the ampakine study. The use of the new normative data would not change the conclusions of our paper.

Acknowledgments

We would like to thank Paula Hemdal (UCB group) for providing additional information on the MCI definition used in the piracetam study and Cristopher Lines (Merck Research Laboratories) for providing additional information regarding the MCI definition used in the Merck rofecoxib MCI study.

Footnotes

  • Competing interests: PS is an associate editor of JNNP but had no role in the reviewing or acceptance of this paper.

REFERENCES
