Neuropsychological prediction of conversion to dementia from questionable dementia: statistically significant but not yet clinically useful
J Tian1, R S Bucks2, J Haworth3, G Wilcock3

1 Department of Care of the Elderly, Beijing University of Chinese Medicine, Dongzhimen Hospital, Beijing, China
2 Department of Psychology, University of Southampton, Southampton, UK
3 Department of Care of the Elderly, University of Bristol, Frenchay Hospital, Bristol, UK

Correspondence to: Professor G Wilcock, Department of Care of the Elderly, Frenchay Hospital, Bristol BS16 1LE, UK


Background: Verbal memory impairment, one of the earliest signs of Alzheimer’s disease (AD), may help identify which people with cognitive impairment insufficient for a diagnosis of dementia (questionable dementia: QD) are at risk of developing AD. Other cognitive parameters have been found that may indicate which people with QD will go on to develop dementia. Nevertheless, some researchers have reported only partial success in differentiating between mild AD and age related cognitive impairment.

Objectives: To discover if there are early, pre-clinical cognitive markers that could help identify patients attending our memory clinic who were at risk of developing dementia.

Methods: Multidisciplinary assessment of a consecutive sample of 195 patients with QD seen in a National Health Service hospital outpatient clinic; 135 seen for a mean follow up of 24.5 months.

Results: Conversion rate to dementia was 27.4% (37 of 135). A diagnosis of probable or possible AD was made in 15.6% (21 of 135) of cases. Despite statistically significant differences in some cognitive tasks between those who did and those who did not go on to dement, Cox regression analyses failed to improve prediction rates markedly above base rates and were unstable.

Conclusion: A large number of studies claim good prediction of conversion to dementia using cognitive test scores. Although this study produced similarly good sensitivity and specificity values, proper consideration of the statistical analyses and their clinical significance suggested that these prediction methods are currently too imprecise for clinical use. Use of cognitive indicators combined with neuroradiological, neuropathological, and genetic factors for predicting conversion to dementia might prove more reliable but may be beyond the scope of many geriatric services.

Keywords: prediction; dementia; neuropsychological tests

Abbreviations: AD, Alzheimer’s disease; QD, questionable dementia

The potential to identify people at risk of developing dementia, particularly Alzheimer’s disease (AD), has important implications for diagnosis and treatment of early disease,1 although whether mild cognitive impairment represents early stage AD remains controversial.2–5 There is considerable evidence to support the concept of a preclinical phase in AD.4,6–8 Community studies have reported that between 17% and 30% of people over the age of 65 years have cognitive impairment without meeting diagnostic criteria for dementia,9,10 many of whom meet diagnostic criteria for dementia after follow up,2 with the proportion converting increasing with increasing length of follow up.11,12 It has also been suggested that older people convert from a diagnosis of questionable dementia (QD) to dementia more rapidly than those who are younger.7 Finally, both the patients who subsequently dement and their informants seem to be aware of declines in function up to five years before diagnosis.13 The difficulty, however, is in determining which of these people are in the early stages of a dementing disorder and which are suffering non-progressive cognitive impairment related to some other cause.

A number of longitudinal studies have attempted to predict who will convert to dementia using neuropsychological parameters (see Collie and Maruff5 for a review). Verbal and visuospatial memory impairments have been reported both as among the earliest signs14–17 and as being associated with increased risk of developing dementia.8,18–22 Other cognitive parameters include tests of naming, abstraction, and verbal fluency,2,7,19,20,23–26 and visuospatial and executive functioning.26 Less consistent is the relation between measures of attention or strategic control and final diagnosis, with some studies finding significant differences19,25,27,28 and others not.29 In contrast, other researchers have reported little or no success in predicting the development of dementia13 or in differentiating between mild AD and age related cognitive impairment.30,31

A number of published studies have attempted to draw distinctions between people with QD who will and will not dement. These can be divided into those that have recruited healthy community living participants7,8,18,19,21,25–28,32–34 and those that have recruited people with minimal cognitive impairment.2,5,17,35–38 Percentage correct prediction values for these studies vary greatly: sensitivity 50.0%–91.4%, specificity 5.9%–99.0% for community studies; sensitivity 74.0%–83.3%, specificity 76.9%–94.0% for minimal cognitive impairment studies. Criteria used to diagnose people in the minimal cognitive impairment studies are also diverse. Some have included all people who were not normal but not yet demented (QD; for example, Devanand et al2); others recruited people with age associated memory impairment (AAMI39; for example, Hanninen et al17). Thus, differences in the people recruited for these studies have led to large differences in rates for conversion to dementia (9.1% to 41.9%; see also Ritchie and Touchon40 and Collie and Maruff41).

In addition, there is the issue of censored cases; that is, people who are lost to the study before a diagnosis is made. This leads to a risk of bias if any of the explanatory variables are, by chance, correlated with time to follow up or diagnosis. Cox regression (survival analysis) is a method for modelling time to event data that can account for censored cases. Only three studies have used Cox regression, and all of these were community studies.8,18,21
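To make the handling of censored cases concrete, the following is a minimal Python sketch of the Kaplan-Meier product-limit estimator. This is a simpler relative of Cox regression, swapped in here for illustration, and is not an analysis used in this study. The function name and the follow-up data are hypothetical. A censored case stays in the risk set until the visit at which it is lost, so it contributes information without being counted as either a converter or a non-converter.

```python
def kaplan_meier(times, events):
    """Return [(time, proportion still dementia-free)] at each event time.

    times  -- follow-up in months for each person
    events -- True if the person converted at that time, False if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        converted = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            converted += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if converted:
            survival *= 1 - converted / n_at_risk
            curve.append((t, survival))
        # Converters and censored cases both leave the risk set after t.
        n_at_risk -= leaving
    return curve


# Hypothetical follow-up data, invented for illustration only.
times = [6, 12, 12, 18, 24, 30]
events = [True, False, True, False, True, False]
curve = kaplan_meier(times, events)
```

Treating the three censored people as non-converters would instead give a dementia-free proportion of 3 in 6, which overstates survival relative to the estimate above.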

To be of practical utility, we need a method of accurately predicting outcome for the types of patients attending an outpatient assessment clinic: that is, people who are neither quite normal nor clearly demented (QD). Thus, this study aimed to determine whether the predictive results offered by Devanand and colleagues were replicable in a UK sample using different neuropsychological tests. As such, it was a test of the general principle of predicting subsequent conversion to dementia from neuropsychological measures. However, we made two key changes to the method: we used Cox regression, and we varied the data type and models used to determine how stable the predictive solutions might be.

Methods

Participants were 568 consecutive new patients attending the Bristol Memory Disorders Clinic between 7 June 1993 and 28 August 1997. During this period 195 (34.3%) had QD, 370 (65.0%) were diagnosed with a dementia, and three (0.5%) were found to have no cognitive impairment at all. Of the 195 QD patients, 135 (69.2%) were seen for at least one follow up visit.

Patients were referred to the Bristol Memory Disorders Clinic (BMDC) by their GP or other healthcare professional. Each patient underwent medical, psychiatric, and psychological screening to exclude any other treatable disease. Particular attention was paid to presenting symptoms, onset (sudden or insidious), progression (static, stepwise, or gradual), and presence of memory and other cognitive problems, as well as affective or behavioural difficulties. Past medical history was also evaluated, emphasising conditions that might be associated with cognitive impairment, medications, and substance misuse. Family history of depression and organic or neurological disease was also noted. A depression rating scale was used (Cornell Scale for Depression in Dementia).42 Patients were referred to a psychiatrist if there was any clinical suspicion of affective or psychotic illness or if they scored above the cut off on the depression rating scale.

Behavioural and functional deficits were measured in interview with a knowledgeable collateral source (generally a spouse or adult child) using the Stockton Geriatric Rating Scale.43 The standardised Hachinski Ischaemic Scale44,45 was administered. A comprehensive physical examination was undertaken including neurological examination. Laboratory blood testing, CT brain scans, and where clinically indicated single photon emission computed tomography or magnetic resonance imaging were also carried out.

The neuropsychological assessment used in the BMDC was designed and validated specifically for the clinic.46 The assessment included measures of higher order cognitive function assessing short-term and working memory (Digit Span, forwards and backwards), abstract thinking (Similarities), and general non-verbal problem solving (Picture Completion), all from the Wechsler Adult Intelligence Scale-Revised (WAIS-R47). Episodic memory was assessed with immediate and delayed story recall (Adult Memory and Information Processing Battery48); list recall, verbal learning, and verbal recognition (Hopkins Verbal Learning Test: HVLT49); and visual recognition (Middlesex Elderly Assessment of Mental State: MEAMS50). Other tests assessed language expression, reception, and executive function using the Frenchay Aphasia Screening Test,51 letter fluency,52 and Weigl’s Colour Form Sorting Test53; visuospatial function (Cube analysis, Visual Object Space Perception Battery54); and psychomotor speed (Digit copying55).

Each patient’s assessments were discussed in a multi-disciplinary case conference by trained clinicians. The diagnosis was based on a history of cognitive impairment relative to the patient’s premorbid abilities. Impairment in any cognitive function was defined as performance 1.5 or more SD below expected level of functioning using age adjusted means. For AD, this history was of gradual onset and progressive course. Diagnosis of dementia was made according to the Diagnostic and Statistical Manual of Mental Disorders-Revised IV Edition.56 A diagnosis of probable AD was made according to the NINCDS-ADRDA criteria.57 A diagnosis of QD was made if a patient was (a) experiencing memory impairment that affected their social or occupational functioning but without other cognitive impairment, or (b) experiencing cognitive impairment in one or more areas of functioning but this impairment did not result in significant change in social or occupational functioning. Thus, a diagnosis of QD indicated that there was objective evidence of cognitive impairment but that these impairments did not satisfy the criteria for dementia. The outcome measure was the final diagnosis at follow up.

One hundred and thirty five participants were followed up for 4.8 to 95.7 months (mean 24.5, SD 19.7, range 1–7 follow up visits, mode 1, median 2). At final follow up their diagnoses were reviewed, at which time 37 (27.4%) had gone on to develop dementia and 92 (68.1%) continued to suffer QD. Additionally, six (4.4%) had recovered from all cognitive deficits and were deemed to be normal. Table 1 gives the descriptive characteristics of those who converted to dementia and those who continued to suffer from QD.

Table 1

Descriptive statistics

Statistical analyses

Univariate Mann-Whitney U and χ2 statistics were used to compare groups. Prediction of outcome was evaluated using Cox regression proportional hazards analysis. Variables were entered either as continuous covariates or, for the neuropsychological variables, recoded categorically according to whether patients scored above or below the 5th centile cut off for age adjusted means. With the exception of the HVLT learning index these means were taken from published normative data. The Learning Index was based on a similar index developed by Bucks et al58 (see footnote*). All analyses were performed with SPSS for Windows version 11.059 using an α level of 0.05.
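The categorical recoding can be sketched in a few lines of Python. This is an illustration only: it assumes that normative scores are approximately normally distributed, so that the 5th centile can be derived from a normative mean and SD, whereas published norms may instead supply centile tables directly. The function name and the normative values are hypothetical.

```python
from statistics import NormalDist


def below_fifth_centile(score, norm_mean, norm_sd):
    """Return 1 if score falls below the 5th centile of the norms, else 0.

    Assumes normally distributed norms (an assumption of this sketch,
    not a statement about the normative data used in the study).
    """
    cutoff = NormalDist(norm_mean, norm_sd).inv_cdf(0.05)
    return 1 if score < cutoff else 0


# Hypothetical age adjusted norms for a list recall test.
norm_mean, norm_sd = 24.0, 4.0
cutoff = NormalDist(norm_mean, norm_sd).inv_cdf(0.05)  # about 1.645 SD below the mean
low_scorer = below_fifth_centile(17, norm_mean, norm_sd)
typical_scorer = below_fifth_centile(23, norm_mean, norm_sd)
```

Recoding in this way discards information about how far a score lies from the cut off, which is one reason continuous and categorical models can disagree.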
Results

There were no significant differences between the groups in premorbid IQ, years of education, dependency, ischaemic score, or depression, nor was there a significant sex difference. Those who developed dementia reported a significantly shorter mean history of cognitive impairment, had significantly lower mean MMSE scores, and were significantly older than those who did not develop dementia (see table 1). Table 2 shows the distribution of diagnoses. In those who went on to develop dementia, a diagnosis of probable or possible AD was made in over half the cases. Of those who continued with QD a specific aetiology was identified in less than half.

Table 2

Causes of dementia and questionable dementia

Table 3 shows mean neuropsychological test performance at the initial visit. There were significant group differences in Similarities, all the Memory measures, and Verbal Fluency.

Table 3

Performance of groups on cognitive tests at the initial evaluation by final diagnosis

Cox regression analyses were performed with progression to dementia as the event, and time to diagnosis or survival as the time variable. Only those variables on which the two groups performed significantly differently in univariate comparisons were entered into the equations. Because of small amounts of missing data, only those cases with complete data were entered into the analyses (QD n=80, D n=30). Three variants of the Cox regression analysis were conducted (Enter, Forwards Stepwise, and Backwards Stepwise) for both continuous and categorical predictors. This permitted comparison with the same sorts of models found in the literature as well as an exploration of the stability of the analyses. Table 4 shows the prediction results for each of the different analyses performed.
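For readers unfamiliar with the method, the following minimal Python sketch fits a Cox proportional hazards model with a single binary covariate by Newton-Raphson maximisation of the partial likelihood. It is not the SPSS procedure used here, it omits stepwise selection, and it assumes no tied event times. The function name and all data are hypothetical; the covariate is imagined as a flag for scoring below a cut off on a memory test.

```python
import math


def fit_cox_one_covariate(times, events, x, max_iter=50, tol=1e-10):
    """Estimate the log hazard ratio for one covariate.

    times  -- follow-up time for each person
    events -- 1 if converted to dementia at that time, 0 if censored
    x      -- covariate value for each person
    """
    beta = 0.0
    for _ in range(max_iter):
        grad = 0.0
        hess = 0.0
        for i, (t_i, e_i) in enumerate(zip(times, events)):
            if not e_i:
                continue  # censored cases enter only through risk sets
            s0 = s1 = s2 = 0.0
            for t_j, x_j in zip(times, x):
                if t_j >= t_i:  # person j is still at risk at time t_i
                    w = math.exp(beta * x_j)
                    s0 += w
                    s1 += w * x_j
                    s2 += w * x_j * x_j
            mean_x = s1 / s0
            grad += x[i] - mean_x
            hess -= s2 / s0 - mean_x * mean_x
        if hess == 0.0:
            break
        step = grad / hess
        beta -= step  # Newton-Raphson update
        if abs(step) < tol:
            break
    return beta


# Hypothetical data, invented for illustration only.
times = [5, 8, 12, 15, 20, 24, 30, 36]
events = [1, 1, 0, 1, 1, 0, 1, 0]
low_recall = [1, 0, 1, 1, 0, 0, 1, 0]

beta = fit_cox_one_covariate(times, events, low_recall)
hazard_ratio = math.exp(beta)
```

With several correlated covariates the same likelihood is maximised over a vector of coefficients, and it is at that point that the choice between Enter and stepwise entry begins to affect which variables appear significant.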

Table 4

Predictive validity and proportion variance explained by the Cox regression models, showing odds ratios and 95% confidence intervals for significant predictors

A number of the models compared favourably with published findings. However, using the neuropsychological test scores as continuous predictor variables produced problems because of colinearity of the predictor measures: that is, the tendency for test results to vary systematically with each other as a function of disease severity. This can be seen in the degree to which Models 1 to 3 differed in the variables that were shown to contribute significantly to the likelihood of converting to dementia. In Model 1, all variables were forced into the equation together. Only two were found to be potentially significant explanatory variables: Verbal Recognition and Verbal Fluency. In Model 2, a forwards stepwise procedure resulted in two different variables, HVLT Total List Recall and MMSE, being left in the equation. In Model 3, using continuous covariates, a backwards stepwise procedure resulted in four variables being left in the equation, two of which were the same as the significant explanatory variables in Model 1, and none of which was the same as in Model 2. These were Age, Visual Recognition, Verbal Recognition, and Verbal Fluency. In the models that used categorical covariates (Models 4–6) there was consistency across methods in the variables that were identified as significant predictors. In all three the significant covariates were Age and HVLT Total List Recall.
Discussion

As in previous studies,8,14–17,24,35,36 a number of neuropsychological parameters seem to offer potential to distinguish between those who will and those who will not go on to dement. However, despite significant group differences in these measures, many did not prove to be consistently significant predictors of conversion to dementia. For example, despite a significant difference in mean MMSE score, this was not consistently a significant predictor of final diagnosis of dementia; indeed, it was a significant predictor in only one model. This finding reflects the generally poor status of the MMSE as a significant predictor noted in other studies.1,2,38,60

As has previously been found, there were significant differences between the QD and D groups on univariate analysis of all memory tests.7,16,18,23,30,60 However, the predictive power of memory test performance was less convincing. Recall performance, whether Immediate or Delayed, was not found to be significantly predictive of final diagnosis, although Verbal Recognition was found to be a significant predictor in Models 1 and 3. Other neuropsychological test variables that appeared as significant predictors in some models were Verbal Fluency (Models 1 and 3) and Visual Recognition (Model 3).

In all analyses using categorical scores (Models 4–6) only low Total List Recall performance was significantly associated with increased risk of converting to dementia. Otherwise, only increasing Age was consistently found to be moderately positively associated with increased risk of converting to dementia: a finding consistent with a number of other studies.18,38,61

By comparison with the Devanand results,2 a smaller proportion of our sample converted to dementia (27.4% compared with 41.3%). Although this may relate to the slightly shorter mean interval of follow up (25 months compared with 30 months) between the studies, there was no significant difference between our groups in the length of follow up to diagnosis or survival, making this explanation unlikely. There was also a smaller proportion of subjects with diagnoses of probable or possible AD: 56.6% in our sample compared with 87.1% in Devanand et al.

Despite this difference in base rates of conversion, in terms of predictive accuracy, these results seem to replicate Devanand et al. Indeed, the sensitivity, specificity, positive and negative predictive values appeared quite good. Sensitivity values ranged from 53.4% to 64.4%. Specificity values ranged from 66.7% to 83.3%. Positive predictive values (PPV) were equally good, although the best was associated with only modest sensitivity (Model 3) and should, therefore, be discounted. Good sensitivity is often gained at the expense of poor specificity, with the same trade off occurring for PPV and negative predictive values (NPV). For example, the best PPV (Model 1: 87.0%) was associated with poor NPV (46.9%). Thus, for every two people who were predicted to continue with QD, this would be incorrect for one of them. In addition, only two thirds of the people who actually did convert to a diagnosis of dementia were correctly identified (sensitivity 64.4%) and one in four of those who did not were misidentified (specificity 76.7%). Thus, a technique that at first sight looks promising is really rather less than satisfying.
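The relation between these four indices, and their dependence on the base rate of conversion, can be illustrated with a short Python sketch. The counts below are hypothetical and are not taken from table 4; the sketch also assumes that a "positive" prediction means predicted conversion to dementia. The function names are our own.

```python
def diagnostic_indices(tp, fp, fn, tn):
    """Compute the four indices from a 2x2 table of predicted v actual outcome.

    tp -- predicted to convert, did convert
    fp -- predicted to convert, did not convert
    fn -- predicted not to convert, did convert
    tn -- predicted not to convert, did not convert
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def predictive_values(sensitivity, specificity, base_rate):
    """Derive PPV and NPV for a given base rate using Bayes' theorem."""
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    true_neg = specificity * (1 - base_rate)
    false_neg = (1 - sensitivity) * base_rate
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Hypothetical counts for 30 converters and 80 non-converters.
indices = diagnostic_indices(tp=20, fp=16, fn=10, tn=64)

# The same test applied where only 10% of patients convert.
ppv_low_base, npv_low_base = predictive_values(
    indices["sensitivity"], indices["specificity"], base_rate=0.10
)
```

Sensitivity and specificity are properties of the test, whereas PPV and NPV shift with the base rate: in this sketch the same sensitivity and specificity yield a lower PPV when fewer patients convert, which is why prediction rates need to be judged against base rates rather than in isolation.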

Review of the literature reveals a number of explanations for this general difficulty with predictive studies. These include issues of test sensitivity,1,6 as well as the problem that many of the tests are highly intercorrelated. This colinearity between independent variables is one of the primary reasons for poor regression models. Unfortunately, few studies confront this issue. For example, despite reporting a failure to find significant associations attributable to colinearity in their predictor variables, Devanand et al2 went on to conduct a discriminant function analysis. Discriminant function analysis uses another type of regression model that is no less susceptible to the effects of colinearity than their original analysis. Many other studies simply force all variables into their regression models (Enter method) rather than allowing them to be placed into or out of the final equation according to a statistical criterion as in a stepwise model. With highly intercorrelated data, what ends up in the equation in a stepwise model will therefore depend on the order in which the variables were put into or taken out of the equation. This is demonstrated clearly by the instability shown between Models 1 to 3.
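One simple way to gauge colinearity before fitting a model is the variance inflation factor (VIF). The Python sketch below covers the two-predictor case, where the VIF reduces to 1/(1 − r²) with r the Pearson correlation between the predictors. It is an illustration only and was not part of the analyses reported here; the function names and the test scores are hypothetical.

```python
import math


def pearson_r(a, b):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def vif_two_predictors(a, b):
    """Variance inflation factor when a model contains exactly two predictors."""
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r * r)


# Hypothetical scores for six patients on two memory measures.
list_recall = [12, 15, 18, 20, 22, 25]
delayed_recall = [3, 4, 6, 6, 8, 9]

r = pearson_r(list_recall, delayed_recall)
vif = vif_two_predictors(list_recall, delayed_recall)
```

A commonly quoted rule of thumb treats a VIF above about 10 as a warning sign. When two measures are this strongly correlated, either can stand in for the other, so which one a stepwise procedure retains can turn on small differences in the sample.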

Some researchers have sought to resolve this correlation problem by conducting principal components analysis (PCA) to reduce their dataset to a smaller number of uncorrelated components.24,25,28,62 Fabrigoule and colleagues24,25,28 isolated a component that they interpreted as reflecting a disturbance of control processes. Indeed, a number of researchers have argued that studies should be focusing on attentional control processes in their search for a specific marker of incipient dementia.24,25,28,62 Findings regarding the extremely specific nature of the central executive deficits found in mild AD patients, particularly their marked difficulty with dual task performance,63,64 would tend to support this view. However, reported inconsistency regarding the components identified by PCA,28,62 and the difficulty of applying PCA to routine clinical data, make such approaches currently unsuitable.

An additional problem with these types of studies is that many of the groups have different mean years of education or predicted premorbid IQ scores. A number of studies have controlled for this difference by regressing out years of education or IQ before analysing the predictive power of the test scores.7,8,25,32,34 Unfortunately, analysis of covariance is often not an appropriate method for providing statistical control over differences between groups in neuropsychological research, as it can remove the variance of interest that is associated with age or education factors.65 One final problem with these types of studies is that while some exclude people with depression, others do not. However, evidence relating to the additional effect of depression on risk of developing dementia is equivocal.61,66 Moreover, in this study, mean IQ, years of education and mean number of depressive symptoms did not differ between the two groups.

It could be argued that any study predicting conversion from an outpatient population, such as that taken from a memory clinic, will be problematic given the lack of generalisability to the general population of older adults. However, a large number of studies using both community and clinic based samples have failed consistently to identify a class or classes of neuropsychological measures that produce satisfactory prediction rates. Thus, the issue seems less to be one of generalisability of the findings and more of generalisability of the method, which we feel has yet to be established.

Perhaps it is currently ambitious to try to predict long term outcome from assessment at a single time point. Indeed, a number of studies have attempted to evaluate the predictive power of change scores.33,62,67 While some studies have found evidence of significant changes in test performance up to five years before diagnosis,67,68 others have shown marked changes in test performance only shortly before the diagnosis is made.26,33 Furthermore, at least one set of researchers has questioned the validity of any predictive studies.62

A range of alternative avenues for predicting conversion to dementia is currently being explored in the literature (see Almkvist and Winblad69 for a review). These include volumetric analyses of medial temporal lobe and hippocampal structures,20,70–75 functional neuroimaging analyses of blood flow,5 or of cortical glucose metabolism,37 genetic risk factors (for example, apoE4),15,35,36,76–78 biological markers in cerebrospinal fluid,1 and nerve growth factor.32 Indeed, Collie and Maruff5 and Reischies and Hellweg78 recommended a convergent approach, combining measures (see, for example, Albert74 and Daly et al36). While a convergent approach may improve predictive validity, it makes predictions of this type even further beyond the reach of the average geriatric outpatient service. In addition, it does not avoid the serious difficulties identified with determining what is normal for cognitive or neuroanatomical measures for any particular age group.

While it is tempting to seek a combination of neuropsychological and clinical measures that can predict the development of dementia, analysis of our clinical data suggests that it would be more fruitful to expend our energies on searching for a new, specific and unique cognitive, biological or anatomical marker, than to pursue these types of predictive models. Unfortunately, statistical significance does not, as yet, offer clinical utility.
Acknowledgments

We would like to thank Anthony Hughes and Lucie Byrne for statistical and methodological advice, and Maggie Agg for database support. Romola Bucks was supported by the charity, Bristol Research into Alzheimer’s and Care of the Elderly (BRACE). The Chinese Government sponsored Professor Jinzhou Tian. We are grateful to the reviewers for their helpful comments on an earlier version of this article.



  • * The index quantifies the mean proportion of 12 words learned on trials 2 and 3 as a function of how much information the participant has left to learn after trials 1 and 2. The equation for this index is


    Normative data were generated from a sample of healthy controls n = 37 aged 57–69 and n = 46 aged 70+ and are available from the second author.
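The equation for the Learning Index is not reproduced in this copy of the footnote. One reading that is consistent with the verbal description is given below as a reconstruction, not as the authors' own formula. The symbols are our notation: T_1, T_2, and T_3 denote the number of the 12 words recalled on trials 1, 2, and 3, and LI denotes the Learning Index.

```latex
\[
\mathrm{LI} \;=\; \frac{1}{2}\left[\frac{T_2 - T_1}{12 - T_1} \;+\; \frac{T_3 - T_2}{12 - T_2}\right]
\]
```

Under this reading each term is the gain on a trial expressed as a proportion of the words still unlearned after the preceding trial, and a term is undefined when all 12 words were recalled on the preceding trial. How the original index handles that case is not stated here.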

  • Competing interests: none declared.

  • See Editorial Commentary 413
