Research paper
Measuring cognitive change in subjects with prodromal Alzheimer's disease
  1. T Mura1,2,3,4,
  2. C Proust-Lima5,6,
  3. H Jacqmin-Gadda5,6,
  4. T N Akbaraly1,2,7,
  5. J Touchon1,2,8,
  6. B Dubois9,
  7. C Berr1,2,8
  1. 1INSERM, U1061, Neuropsychiatrie: Recherche Epidémiologique et Clinique, Montpellier, Cedex, France
  2. 2Université Montpellier I, Montpellier, Cedex, France
  3. 3Département d'Information Médicale & Centre d'Investigation Clinique, CHRU Montpellier, Montpellier, France
  4. 4INSERM, CIC 1001, Montpellier, France
  5. 5INSERM U897, Equipe de Biostatistique, Centre de Recherche en Epidémiologie et Biostatistique, Bordeaux, France
  6. 6ISPED, Université Bordeaux Segalen,  Bordeaux, France
  7. 7Department of Epidemiology and Public Health, University College London, London, UK
  8. 8CMRR Languedoc Roussillon, service de Neurologie, CHRU Montpellier, Montpellier, France
  9. 9INSERM-UPMC UMRS 975, Institut de la Mémoire et de la Maladie d'Alzheimer, ICM, APHP, Salpétrière Hospital, University Paris 6, Paris, France
  1. Correspondence to Dr Thibault Mura, Inserm U1061, Hôpital La Colombière, Pavillon 42 39 av. Charles Flahault, 34493 Montpellier, Cedex 5, France; t-mura{at}


Objective To investigate the sensitivity of a large set of neuropsychological tests to detect cognitive changes due to prodromal Alzheimer's disease (AD); to compare their metrological properties in order to select a restricted number of these tests for the longitudinal follow-up of subjects with prodromal AD.

Participants 212 patients with mild cognitive impairment were tested at baseline by a standardised neuropsychological battery, which included: the Free and Cued Selective Reminding test (FCSRT), the Benton Visual Retention test, the Deno100, verbal fluency, a serial digit learning test, the double task of Baddeley, the Wechsler Adult Intelligence Scale (WAIS) similarities, the Trail-Making Test and the WAIS digit symbol test. Patients were monitored every 6 months for up to 3 years in order to identify those who converted to AD (retrospectively classified as prodromal AD). Statistical analyses were performed using a nonlinear multivariate mixed model involving a latent process. This model assumes that the psychometric tests are nonlinear transformations of a common latent cognitive process, and it captures the metrological properties of tests.

Results 57 patients converted to AD. The most sensitive tests in the detection of cognitive changes due to prodromal AD were the FCSRT, the semantic verbal fluency and the Deno100. Some tests exhibited a higher sensitivity to cognitive changes for subjects with high levels of cognition, such as the free recall, delayed free recall scores of the FCSRT and the semantic verbal fluency, whereas others showed a higher sensitivity at low levels of cognition, such as the total recall score of the FCSRT.

Conclusions Tests used for the follow-up of prodromal AD subjects should be chosen among those that actually decline in this stage of the disease and should be selected according to the subject’s initial scores.

  • Alzheimer's Disease
  • Cognitive Neuropsychology
  • Dementia
  • Statistics



To describe patients with cognitive difficulties who do not meet the diagnostic criteria for dementia, Flicker et al1 coined the term Mild Cognitive Impairment (MCI) with the idea that this concept could help in the early prediction of the development of dementia. However, MCI defines a syndrome (whose definition has been revised and changed several times since 19912–4) and may be the consequence of different diseases with distinct aetiologies. Consequently, some authors have emphasised the need to clearly characterise the underlying disorder (ie, Alzheimer's disease (AD), vascular dementia, etc).5,6 This characterisation could facilitate predicting the clinical progression of the disease and promote the development of specific therapeutic approaches at an early stage of the disease.6

In this context, the measurement of cognitive changes has two main uses. First, this measure could help to identify subjects with prodromal AD (ie, symptomatic predementia phase of AD or ‘MCI due to AD’6). Indeed, a working group has recently highlighted the importance of measuring cognitive change for the diagnosis of ‘MCI due to AD’ (ie, prodromal AD) stating as a diagnostic criterion that it ‘provides evidence of longitudinal decline in cognition’.7 The second use is to enable the follow-up of patients with prodromal AD, either in a clinical care context or in intervention trials with cognitive change as a primary outcome to assess the effects of drugs at an early stage of AD.8–10

Yet, in all of these situations, there is currently no recommendation concerning the choice of neuropsychological tests to be used. In order to be informative for the identification or the follow-up of subjects with prodromal AD, a neuropsychological test must fulfill some requirements. First, selected tests have to be markers of the evolution of the disease, and thus, have to explore the cognitive domains that are affected in the course of the different stages of the disease. Second, the tests have to be able to measure cognitive changes in the range of cognitive levels of the targeted population.8 This capacity can be limited by the metrological properties of neuropsychological tests (ie, floor and ceiling effects, curvilinearity: sensitivity to detect cognitive changes varying according to the initial cognitive level). These points can be analysed using a recently published statistical model that allows the study of metrological properties by modelling a latent cognitive factor, which is the common cognitive part measured by the neuropsychological tests.11

This work aims to investigate the sensitivity of a large set of neuropsychological tests frequently used in subjects with MCI, to measure cognitive changes due to prodromal AD; and to describe and compare the metrological properties of these tests. This comparison will aid in selecting a restricted number of psychometric tests for the longitudinal identification and follow-up of subjects with prodromal AD.


Study population

Data came from the PRE-AL Study (PREdiction of AD).12 Briefly, in 2001 and 2002, 251 patients with MCI2 were recruited and followed up semiannually over a period of 3 years. Subjects were recruited from memory clinics of 14 expert centres in the field of dementia across France (see Acknowledgments). Patients were enrolled according to the following criteria: (1) a subjective memory complaint; (2) an objective memory impairment documented by at least one word missing in the three-word recall of the Mini-Mental State Examination13 (MMSE) or a score under 29 on the Isaacs-set test,14 or both; (3) a preservation of general cognitive functioning (MMSE between 25 and 29/30); (4) a normal score or only one item impaired at the first level in the four Instrumental Activities of Daily Living (IADL)15; and (5) the absence of the Diagnostic and Statistical Manual of Mental Disorders, 3rd edition, revised (DSM-III-R) criteria for dementia.16 Patients with focal lesions in cerebral imaging or documented depressive symptoms or with other medical conditions, which could interfere with memory performance or follow-up, were excluded. Each subject signed an informed consent form once the nature of the procedures had been fully explained. The study was approved by the Ethics Committee of the ‘La Salpêtrière’ Hospital.
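
The five enrolment criteria and the exclusion rule above can be read as a simple screening check. The following is a minimal Python sketch, not the study's actual screening procedure; the `Screening` record, its field names and the `is_eligible` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Screening:
    """Hypothetical screening record; field names are illustrative only."""
    memory_complaint: bool         # criterion 1: subjective memory complaint
    mmse_total: int                # MMSE total score (out of 30)
    mmse_words_recalled: int       # words recalled in the MMSE three-word recall (0-3)
    isaacs_set_test: int           # Isaacs set test score
    iadl_items_impaired: int       # IADL items (out of four) impaired at the first level
    meets_dsm3r_dementia: bool     # DSM-III-R criteria for dementia are met
    has_exclusion_condition: bool  # focal lesion, depressive symptoms or interfering condition


def is_eligible(s: Screening) -> bool:
    # Criterion 2: at least one word missing in the three-word recall,
    # or an Isaacs set test score under 29, or both.
    objective_impairment = s.mmse_words_recalled < 3 or s.isaacs_set_test < 29
    # Criterion 3: MMSE between 25 and 29 out of 30.
    preserved_cognition = 25 <= s.mmse_total <= 29
    # Criterion 4: normal IADL or only one item impaired at the first level.
    preserved_iadl = s.iadl_items_impaired <= 1
    return (
        s.memory_complaint
        and objective_impairment
        and preserved_cognition
        and preserved_iadl
        and not s.meets_dsm3r_dementia      # criterion 5
        and not s.has_exclusion_condition   # exclusion rule
    )
```

For instance, a patient with a memory complaint, an MMSE of 27 with two of the three words recalled, normal IADL and no dementia or exclusion condition would pass this check.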

Out of the 251 participants included in the PREAL study, patients who had had no follow-up (n=17), those with only one visit at 6 months (n=13), and patients with no measurements for any of the neuropsychological tests during their follow-up (n=3), were excluded from our analyses. As AD was the primary outcome of the study, patients who converted to a non-AD dementia (n=6) were excluded from further statistical analyses. Finally, the present analyses were carried out on a total of 212 participants.
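
As a quick check of the participant flow described above, the exclusions can be subtracted from the enrolled sample. The counts come from the text; the variable names are illustrative.

```python
# Participant flow, using the counts reported in the text.
ENROLLED = 251

EXCLUSIONS = {
    "no follow-up": 17,
    "only one visit at 6 months": 13,
    "no neuropsychological measurements during follow-up": 3,
    "converted to a non-AD dementia": 6,
}

excluded = sum(EXCLUSIONS.values())
analysed = ENROLLED - excluded

print(f"excluded: {excluded}, analysed: {analysed}")  # excluded: 39, analysed: 212
```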

Data collection

Dementia diagnosis and definition of prodromal-AD

Patients were seen every 6 months for 3 years. At each visit, clinical assessment included the recording of all medical events, current treatment and a complete neurological examination. Activities of daily living were rated with the IADL scale during an interview with the patient and a knowledgeable collateral source (a spouse or a child).15 Memory complaint was assessed by a specific questionnaire.17 The Clinical Dementia Rating (CDR) scale was completed at each visit during follow-up.18 During the follow-up, a patient who met the clinical DSM-III-R16 criteria for dementia and/or had a score of 1 on the CDR scale was considered as ‘converted’ to dementia. In order to minimise intercentre variability, all clinical records were reviewed by an Expert Committee composed of neurologists (n=3), neuropsychologists (n=3), geriatricians (n=3) and psychiatrists (n=3). They re-examined whether clinical criteria for dementia were satisfied using DSM-III-R criteria16 and classified the patient as AD using the NINCDS-ADRDA criteria.19 Patients who converted to AD during the 3-year follow-up were retrospectively classified as patients with prodromal AD; other subjects were designated as MCI non-AD.
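
The retrospective labelling rule can be sketched as follows. This is an illustration under stated assumptions rather than the Expert Committee's procedure: the function name and the visit representation are hypothetical, and a CDR score of 1 or more is treated as meeting the CDR condition, whereas the text only mentions a score of 1.

```python
from typing import Iterable, Tuple


def classify_patient(
    visits: Iterable[Tuple[bool, float]],
    committee_says_ad: bool,
) -> str:
    """Label a patient from follow-up visits.

    Each visit is a pair (meets_dsm3r_dementia, cdr_score).
    `committee_says_ad` stands for the Expert Committee's NINCDS-ADRDA
    classification and is only used when the patient converted.
    """
    # Conversion: DSM-III-R dementia criteria and/or CDR >= 1 (assumption) at any visit.
    converted = any(meets_dsm3r or cdr >= 1.0 for meets_dsm3r, cdr in visits)
    if not converted:
        return "MCI non-AD"
    if committee_says_ad:
        return "prodromal AD"
    # Converters to a non-AD dementia were excluded from the analyses.
    return "non-AD dementia (excluded)"
```

A patient whose CDR stays at 0.5 throughout follow-up without meeting DSM-III-R criteria would be labelled "MCI non-AD" by this sketch.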

Assessment of neuropsychological performance

All subjects were tested at inclusion and annually using a standardised neuropsychological battery. In the event of a suspected conversion, the patient underwent an additional neuropsychological evaluation 6 months later. Six cognitive domains commonly affected by ageing and AD were assessed using 13 scores20–27 described in table 1. The mean duration of the administration of the complete predictive battery was 91 min (SD=15).

Table 1

Assessment of neuropsychological performances


We used a nonlinear mixed model for multivariate longitudinal data involving a latent process (illustrated in figure 1) to analyse the cognitive trajectory over time of the MCI subjects during the follow-up. The statistical model assumes that the correlation between the neuropsychological tests is induced by a latent cognitive process (LCP) representing the common cognitive part measured by the neuropsychological tests. The model is divided into two parts: (a) a linear mixed model describes the change over time in the LCP and evaluates the common effects of covariates on this latent cognitive trajectory, and (b) test-specific nonlinear measurement models relate each administration of the neuropsychological tests with the LCP by describing and accounting for the metrological properties of the tests, and evaluating test-specific associations with covariates (here specifically prodromal AD status). Therefore the impact of a given covariate can be explored on the LCP and on each test.

  1. Effect of the covariates on latent cognitive change with time The LCP trajectory was modelled using a linear mixed model,28 which evaluates changes of a repeated outcome over time (ie, the LCP) and accounts for correlations between the repeated measures of each subject. The linear mixed model included a subject-specific random intercept and a slope for time after inclusion (in years of follow-up), as well as the following covariates: age, sex, educational level, prodromal AD status and their interaction with the time elapsed since inclusion. To define the LCP dimension, we assumed it followed a Gaussian distribution N(0,1) at baseline for the reference state of the linear mixed regression model (ie, women, with a low baseline level, MCI non-AD of 55 years old).

  2. Metrological properties and test-specific effects The test-specific measurement models consisted of flexible transformations linking the neuropsychological tests with the LCP. These transformations, which capture the metrological properties of the tests,29 are covariate- and time-invariant parametric functions depending on parameters that are estimated simultaneously with the other parameters of the model. Beta cumulative distribution functions were chosen as flexible and parsimonious transformations. Indeed, these functions offer a large variety of shapes (concave, convex, sigmoid or simply linear) and thus model the curvilinearity of the tests. The complete methodology is described elsewhere.11 ,30 This part of the model also evaluated the test-specific effects of covariates on each test after adjustment for their effect on the LCP.

  3. Mean annual change for each neuropsychological test according to the occurrence of AD during the follow-up (in LCP units) This was calculated by adding together (a) the common effect of time on the LCP, (b) the test-specific effect of time on each test and the test-specific effect of the interaction between time and prodromal AD status on each test.
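
To illustrate the Beta cumulative distribution function transformations mentioned in item 2, here is a minimal Python sketch of how such a link can make equal point losses correspond to unequal changes on a latent scale. It is not the HETMIXSURV implementation: the function names, the 0 to 48 score range and the shape parameters are hypothetical, and the CDF is computed by simple midpoint integration, which assumes both shape parameters are at least 1.

```python
import math


def beta_cdf(x: float, a: float, b: float, steps: int = 2000) -> float:
    """Beta(a, b) cumulative distribution function at x in [0, 1].

    Midpoint-rule integration of the Beta density; assumes a >= 1 and b >= 1
    so that the density is bounded on [0, 1].
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    h = x / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += t ** (a - 1) * (1.0 - t) ** (b - 1)
    return norm * total * h


def to_latent_scale(score, score_min, score_max, a, b, intercept=0.0, slope=1.0):
    """Map a raw test score to a latent scale through a Beta CDF link.

    The score is first rescaled to [0, 1]; intercept and slope are
    hypothetical location and scale parameters of the link.
    """
    rescaled = (score - score_min) / (score_max - score_min)
    return intercept + slope * beta_cdf(rescaled, a, b)


# Hypothetical test scored from 0 to 48 with shape parameters a=2, b=1.
# The same 5-point loss is compared at a high and at a low starting score.
drop_high = to_latent_scale(40, 0, 48, 2, 1) - to_latent_scale(35, 0, 48, 2, 1)
drop_low = to_latent_scale(10, 0, 48, 2, 1) - to_latent_scale(5, 0, 48, 2, 1)
print(f"latent change, 40 -> 35: {drop_high:.3f}")
print(f"latent change, 10 -> 5:  {drop_low:.3f}")
```

With a=b=1 the link reduces to a straight line, which corresponds to the "simply linear" case; other parameter values give the concave, convex or sigmoid shapes.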

Statistical tests were performed at the conventional two-tailed α level of 0.05. Data were analysed using SAS Enterprise Guide V.4.3 (SAS Institute, Cary, North Carolina, USA) and a FORTRAN90 executable for the nonlinear mixed model with the latent process (program HETMIXSURV available at

Figure 1

Conceptualisation of the nonlinear mixed model involving a latent process to model cognition from several neuropsychological tests.


Out of the 212 MCI participants included in the present analyses, 57 converted to AD during the 3-year follow-up (figure 2) and were retrospectively classified as patients with prodromal AD. The characteristics of these 212 participants are detailed in table 2.

Table 2

Baseline characteristics of the 212 mild cognitive impairment patients included in the analysis

Figure 2

Diagram mapping the administration of the neuropsychological tests and the occurrence of Alzheimer's disease during the 3-year follow-up of the study.

Sensitivity of the tests to detect cognitive changes due to prodromal AD

As shown in table 3, older age was associated with both a lower latent cognitive level at baseline (β=−0.081 units of LCP at baseline per year of age, p<0.01) and a steeper latent cognitive decline (β=−0.010 units of LCP per year of follow-up per year of age, p<0.01). A higher educational level was associated with a higher latent cognitive level at baseline, but not with a slower latent cognitive decline. Gender was associated neither with latent cognitive level at baseline nor with the rate of latent cognitive decline. As expected, cognitive change over time differed between prodromal AD and MCI non-AD (steeper decline of β=−0.615 units of LCP per year of follow-up in prodromal AD, p<0.01). Possibly due to a practice effect, MCI non-AD patients showed an improvement in their scores during the study (β=0.25 units of LCP per year of follow-up, p<0.01), whereas the scores of prodromal AD patients declined during the follow-up (β=−0.35 units of LCP per year, p<0.01).
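
To make the age coefficients concrete, the short Python sketch below converts them into expected shifts for a given age. It assumes that age enters the model centred on the 55-year reference state and that the effects are linear in age; the constant and function names are illustrative, and 71.8 years is the age used for figure 3.

```python
# Age coefficients reported in table 3 (LCP units).
BASELINE_PER_YEAR_OF_AGE = -0.081  # baseline level, per year of age
SLOPE_PER_YEAR_OF_AGE = -0.010     # annual change, per year of age
REFERENCE_AGE = 55.0               # assumed centring age (reference state of the model)


def age_effects(age: float):
    """Return (shift in baseline LCP, shift in annual LCP change) relative to the reference age."""
    years_above_reference = age - REFERENCE_AGE
    return (
        years_above_reference * BASELINE_PER_YEAR_OF_AGE,
        years_above_reference * SLOPE_PER_YEAR_OF_AGE,
    )


baseline_shift, slope_shift = age_effects(71.8)
print(f"baseline shift: {baseline_shift:.3f} LCP units")
print(f"slope shift: {slope_shift:.3f} LCP units per year of follow-up")
```

Under these assumptions, a subject aged 71.8 years starts about 1.36 LCP units lower than a 55-year-old with the same other covariates and declines about 0.17 LCP units per year faster.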

Table 3

Parameter estimates of the linear mixed model for the latent cognitive process: effect of age, sex, educational status and prodromal AD status and their interaction with time during the follow-up

Figure 3 presents the mean test-specific annual change in both groups (for the reference state of the regression model (ie, women, with a low baseline level, MCI non-AD) at the age of 71.8 years old). In subjects with prodromal AD, the three Free and Cued Selective Reminding test (FCSRT) scores, the Deno100, the verbal fluencies and the Serial Digit Ordering Test (SDOT) significantly decreased over time. For MCI non-AD patients, none of the scores significantly decreased during the follow-up, and four tests showed a significant improvement over time (FCSRT free recall and delayed free recall, Benton Visual Retention Test (BVRT), Deno100 and Wechsler Adult Intelligence Scale-Digit Substitution Test (WAIS-DST)).

Finally, when comparing the annual variations of tests between patients with prodromal AD and MCI non-AD (figure 3), change over time in all of the tests except three (Baddeley, WAIS similarities and Trail-Making Test, Part B (TMT-B)) differed between the two groups. Tests displaying the greatest differences between groups were the three FCSRT scores, the Deno100 and the verbal fluencies.

Figure 3

Mean annual change for each neuropsychological test according to the occurrence of Alzheimer's disease during the follow-up (in latent cognitive process units).

Comparison of the metrological properties of the neuropsychological tests

The estimated transformations between each test and the common LCP are shown in figure 4. Except for the verbal fluency (category) and the WAIS-DST, all the tests showed a nonlinear transformation. This indicates that a given loss of points for a given cognitive test does not correspond to the same intensity of decline of the LCP over the whole range of the test. Different curve shapes could be identified between the transformations: (1) concave for the FCSRT (total recall), the Deno100, the SDOT and the WAIS similarities; (2) convex for verbal fluency (letter-R) and the TMT-A and TMT-B; (3) sigmoid for the other FCSRT scores (free recall and delayed free recall) and the BVRT; (4) close to horizontal for the Baddeley; (5) almost linear for the verbal fluency (category) and the WAIS-DST.

Figure 4

Metrological properties of the 13 neuropsychological scores used in the study.

Each of these shapes highlighted a different sensitivity pattern for cognitive change over time. Tests with a concave shape are more sensitive to cognitive change at low rather than at high levels of cognition. For example, a change of 1.0 unit of LCP in the medium-high cognitive range (0.0; 5.0) represents a loss of 2.8 points for the SDOT, whereas the same change in the medium-low range (−5.0; 0.0) represents a loss of 7.4 points. Similarly, tests with a convex shape are more sensitive to cognitive change at high rather than at low cognitive levels. Tests with a sigmoid shape are more sensitive to cognitive change in the medium range of the LCP. The horizontal shape of the Baddeley-Mü indicates that this test does not vary with changes in the LCP, and therefore, does not reflect any cognitive change in this population. A linear shape indicates that a loss of points in the scoring scale used in the test corresponds to the same intensity of decline of the LCP over the whole range of the test.
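
The SDOT figures quoted above can be turned into a small worked example showing why the same point loss does not carry the same meaning across the cognitive range. This Python sketch assumes the test behaves approximately linearly within each of the two ranges, which is a simplification of the estimated curve; the 5-point loss and the names used are hypothetical.

```python
# SDOT points lost per 1.0 unit of LCP, as reported in the text for each range.
SDOT_POINTS_PER_LCP_UNIT = {
    "medium-high (0.0; 5.0)": 2.8,
    "medium-low (-5.0; 0.0)": 7.4,
}


def lcp_change_for_point_loss(points_lost: float, points_per_unit: float) -> float:
    """LCP units implied by a point loss, assuming local linearity within the range."""
    return points_lost / points_per_unit


POINTS_LOST = 5.0  # hypothetical observed loss on the SDOT

implied = {
    label: lcp_change_for_point_loss(POINTS_LOST, rate)
    for label, rate in SDOT_POINTS_PER_LCP_UNIT.items()
}
for label, units in implied.items():
    print(f"{label}: a {POINTS_LOST:.0f}-point loss is about {units:.2f} LCP units")
```

The same 5-point loss therefore reflects a larger latent decline when it starts from the medium-high range than from the medium-low range, where the test is more sensitive.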

Exploring the low levels of cognition and the floor effect

Despite their high sensitivity to cognitive change at low levels of cognition, the four tests with a concave shape do not present the same ability to detect changes at low levels of cognition. While the FCSRT (total recall score) can explore low and very low levels of cognition, the shape of the SDOT is quite vertical below −5 units of LCP, revealing the inability of the SDOT to discriminate major cognitive impairment. Among other tests, the two TMTs showed horizontal shapes in the lowest range of the LCP. This illustrates the floor effect of these tests: minimum values of the tests are reached for a medium LCP value.

Exploring the high levels of cognition and the ceiling effect

The horizontal shape of total recall (FCSRT) in high cognitive levels underlines the ceiling effect of this test and suggests that it is not appropriate to assess cognitive changes in subjects with medium and high cognitive levels. Among the other tests, only the free recall (FCSRT) and the verbal fluency (category) cover the highest values of the LCP (+5.0–+10.0) and can actually discriminate changes in these high cognitive levels.


In this clinical prospective study of 212 MCI individuals, we provide rational explanations for selecting neuropsychological tests for the longitudinal identification and follow-up of subjects with prodromal AD. These explanations are based on the analysis of the sensitivity of neuropsychological tests to detect cognitive changes due to prodromal AD and on the description of the metrological properties of these tests (ie, variable sensitivity to cognitive change, floor and ceiling effect).

A major result of this study is the identification of tests that are sensitive to the cognitive change due to prodromal AD. Some tests present a poor sensitivity to cognitive change and seem, therefore, of limited interest in the context of research or clinical care for a longitudinal follow-up, namely, Baddeley's double task, the TMT-B and the WAIS similarities. Conversely, among the 13 scores analysed, six showed a significant decrease over time in the prodromal AD patients, and a substantial difference in their evolution compared with MCI non-AD subjects: the 3 FCSRT scores, the semantic verbal fluency, the Deno100 and the SDOT. These results are consistent with the neuroanatomic distribution of histopathological abnormalities reported in the mildest stages of AD, which overlaps, at least partially, with the regions implicated in these tests. For example, the neuropathological changes in the early stage of AD begin primarily in medial temporal regions (hippocampus, entorhinal cortex),31 which are known to be critical for episodic memory function. Consequently, an impaired ability to learn and retain new information (ie, an episodic memory deficit) is usually the earliest and most prominent feature of AD.32 The decrease over time of both free recall and total recall scores (FCSRT) is therefore consistent with the aggravation of the amnesic syndrome of the medial temporal type.12 Among other tests, the observed decline of the verbal fluency (category) test and the Deno100 illustrates the early impairment of access to semantic memory due to the temporal neocortical damage that occurs in AD.33 The greater impairment in semantic fluency rather than in letter fluency has been previously reported in mild AD.34 ,35

A potential limitation of this study is that it included subjects with MCI defined according to criteria dating from 1999.2 The use of other criteria to define MCI or the carrying out of a similar study in the general population could have led to different results. Another point is that subjects were followed over a limited period of 3 years, and it is possible that some cases of AD occurred after the third year of follow-up. However, considering the decreasing incidence of AD during the follow-up, with only five cases diagnosed during the last visit (figure 2), it is likely that this number is low and that this information bias is minimal.

With regard to the metrological properties of the tests, our results suggest that it may be worthwhile to select neuropsychological tests for a longitudinal follow-up according to the initial cognitive level of the target population (in a research setting) or of the patient (in a clinical setting). Some tests should be chosen for evaluating cognitive changes at high levels of cognition (FCSRT free recall score/delayed free recall score and verbal fluency ‘category’), whereas others should be avoided because of their ceiling effect (FCSRT total recall, Deno100). Conversely, some tests seem more useful for evaluating cognitive changes at low levels of cognition (FCSRT total recall score), while some should be avoided (WAIS-DST, TMT-A, SDOT). Another result of this study is that a majority of the scores exhibited strong curvilinearity. This curvilinearity could have consequences on the results of intervention trials or in epidemiological research,36 and thus, has to be taken into account in the analysis. In addition, this curvilinearity partly explains the remaining difficulties in providing a cut-off value for cognitive decline in the diagnosis of prodromal AD. Indeed, for most of the tests, a decrease of a given number of points will not have the same meaning depending on the initial value of the score. Thus it could be interesting to create an algorithm that would include different scores and/or different cut-off values to provide a prediction of the probability of developing AD rather than use a single test with a single cut-off value.

The improvement observed for some tests in MCI non-AD patients (FCSRT, BVRT, Deno100) is probably due to practice and/or learning effects.37 This kind of effect has been extensively documented in healthy participants for tasks assessing different cognitive functions including verbal episodic memory.38 Contrary to some studies,37 ,39 the observed improvement was not limited to the second testing but was observed during the entire study (data not shown). This effect is generally considered as an interfering variable complicating the interpretation of results.40 However, it could also be considered as an interesting property that could help to differentiate the normal ability of an individual to learn and adjust with practice from the pathological process of an individual with prodromal AD.

Our statistical methodology has several advantages. It can handle unbalanced repeated measurements and bounded quantitative outcomes, and it can take into account and describe the metrological properties of neuropsychological tests.11 However, it is worth noting that the LCP in this model is necessarily defined according to the pool of psychometric tests used in the analysis. In this study, we used a large battery of neuropsychological tests that assessed several cognitive domains. Therefore, the modelled LCP could reasonably be interpreted as a global cognitive factor. However, a limitation of this study is that it was not designed to compare numerous different measures of a specific cognitive domain (like episodic memory), and hence, cannot reasonably highlight one measure over others that were not tested.

Our study provides rational explanations for the selection of neuropsychological tests for the cognitive follow-up of patients with MCI in a clinical care context (for diagnostic or prognostic purposes of prodromal AD) or research context (to identify a target population or as an outcome for the effect of an early intervention in prodromal AD). The tests that can be recommended are those that actually demonstrate a decline at the stage of prodromal AD (the three FCSRT scores, the semantic verbal fluency, the SDOT and the Deno100), and which are able to measure cognitive changes in the range of cognitive levels of the targeted population. In the current study, the free recall, the delayed free recall score (FCSRT) and the semantic verbal fluency test cover a wide cognitive range and seem to be adapted for the follow-up of subjects with an initial medium or high level of cognition. Conversely, the total recall score (FCSRT) suffered from a very considerable ceiling effect, but appeared to be the best score for following up subjects with an initial low cognitive level. For future research, comparing several neuropsychological tests (or scores) within a specific cognitive domain (particularly verbal episodic memory and language) using the same methodology could be of great interest for refining the choice of cognitive tests to be used.


The authors thank all of the investigators of the PreAl study – Serge Belliard (Rennes), Didier Hannequin (Rouen), Marie Pierre Hervy (Kremlin Bicetre), Bernard Laurent (Saint-Etienne), Sylvie Legrain (Paris), Bernard Michel (Marseille), Florence Pasquier (Lille), Michelle Puel (Toulouse), Anne Sophie Rigaud (Paris), Philippe Robert (Nice), Martine Vercelletto (Nantes), Marc Verny (Paris) – for clinical evaluations and their help during this study. Thanks to Joana Norton for proofreading.



  • Contributors All authors have made substantial contribution to this work: TM performed statistical analyses and interpretation of the data and wrote the first draft of the manuscript. JT, BD and CB were involved in data acquisition, and revised the article for important intellectual content. CP-L, HJ-G and TNA have made substantial contributions to the conception of the article, helped to interpret the statistical analysis, and revised the article for important intellectual content. All authors read and approved the final version of the article. This article contains original data. TM had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

  • Funding This work was funded by the ‘French Ministry of Health’ (grant PHRC national 2000). TNA is funded by the ‘Chercheur d’Avenir’ grant from the Languedoc-Roussillon region (France).

  • Competing interests None.

  • Ethics approval The study was approved by the Ethics Committee of the ‘La Salpêtrière’ Hospital.

  • Provenance and peer review Not commissioned; externally peer reviewed.
