We sought to identify the cognitive tests that best discriminate between Alzheimer’s disease (AD) and frontotemporal dementia (FTD). A comprehensive search of all studies examining the cognitive performance of persons diagnosed with AD and FTD, published between 1980 and 2006, was conducted. Ninety-four studies were identified, comprising 2936 AD participants and 1748 FTD participants. Weighted Cohen’s d effect sizes, percentage overlap statistics, confidence intervals and fail-safe Ns were calculated for each cognitive test that was used by two or more studies. The most discriminating cognitive tests were measures of orientation, memory, language, visuomotor function and general cognitive ability. Although there were large and significant differences between groups on these measures, there was substantial overlap in the scores of the AD and FTD groups. Age, education, years since diagnosis and diagnostic criteria did not significantly contribute to the group differences. Given the large overlap in the test performance of persons diagnosed with AD and FTD, cognitive tests should be used cautiously and in conjunction with a medical history, behavioural observations, imaging and information from relatives when making differential diagnoses.
- ACE, Addenbrooke’s Cognitive Examination
- AD, Alzheimer’s disease
- AVLT, Auditory Verbal Learning Test
- FTD, frontotemporal dementia
- MMSE, Mini-Mental State Examination
- Nfs, fail safe N
- %OL, percentage overlap
- WAB, Western Aphasia Battery
- WLS, weighted least squares
- WMS, Wechsler Memory Scale
Dementia is an important social and health issue in our ageing population, with the prevalence of Alzheimer’s disease (AD) in the USA predicted to increase by approximately 300% by 2050.1 Two forms of dementia, AD and frontotemporal dementia (FTD), and their differential diagnosis using cognitive tests, are addressed in the current study.
AD is the most prevalent and well researched type of dementia.2 People with AD usually exhibit a progressive decline in general cognitive functioning,3 together with acalculia, apraxia, visuospatial problems, poor orientation and impaired language comprehension, attention and memory.2,4–6 Memory problems are one of the earliest symptoms, with the ability to learn and retrieve information being affected initially and impairments in recognition memory developing later.2,3,7
FTD, on the other hand, is variously referred to as Pick disease, dementia of the frontal type, frontal lobe dementia of the non-Alzheimer’s type, frontotemporal lobar degeneration, semantic dementia and progressive non-fluent aphasia.8,9 FTD refers to a group of degenerative dementias that are characterised by atrophy of the frontotemporal cortex.5,8,10 Although people with FTD may also experience memory problems, these deficits are more frequently associated with a tendency to respond impulsively and a failure to monitor performance.5,9 As with AD, retrieval of information from memory is affected. However, unlike AD, the provision of cues can assist performance.9 Persons with FTD also have more impairments in executive functioning2–6,9,11,12 and language (eg, inability to repeat, stereotyped phrases, reduced speech).5,7,9 Finally, although general cognitive ability may be unaffected in the early stages of FTD, it does decline over time.4,12
While definitive diagnoses of AD and FTD can only be made at autopsy, interim diagnoses usually take into account the base rates of the disorder, clinical criteria, medical history, physical examination, brain imaging and neuropsychological assessments.6,10,12 Unfortunately, FTD can be difficult to diagnose because of its insidious and gradual onset,8,9 and it can be misdiagnosed as AD.13,14 However, accurate differential diagnosis has become increasingly important because of the recent availability of pharmacological treatments that temporarily improve the cognitive and functional abilities of people with AD.15–21 Moreover, delays in the commencement of these treatments significantly reduce their benefits,19,22 highlighting the importance of early and accurate diagnosis. In contrast, no specific pharmacological treatments currently exist for FTD, although research is continuing.23
If neuropsychological assessments are to make a valuable contribution to the diagnosis of dementia, clinicians need to know what cognitive tests best discriminate between the different types of dementia. Despite the differing clinical characteristics of these two disorders and the wide variety of tests that have been researched with AD and FTD samples, research has yielded inconsistent findings, both in terms of the cognitive constructs (eg, memory, language) and the specific tests that appear to differentiate between AD and FTD.14,24–33 These inconsistencies may, in part, be due to methodological differences in the research literature, such as variations in the participants’ age, time since diagnosis, stage of dementia and diagnostic criteria. A recent narrative review of the literature concluded that there are no tests that accurately discriminate between AD and FTD.34 However, a review of this type cannot adequately consolidate and analyse the data or reconcile differences in methodology.35 A meta-analysis, in contrast, provides an objective and quantitative means by which to directly compare existing findings, thereby providing an important addition to the existing literature.36,37 The current study therefore undertook a meta-analytic review of research that has compared the cognitive performance of AD and FTD samples in order to identify the tests that best discriminate between AD and FTD and, therefore, are most likely to be of use when making a differential diagnosis.
Literature search and inclusion criteria
A comprehensive search of the PubMed and PsycINFO electronic databases from January 1980 to November 2006 was undertaken to identify all published studies that assessed the cognitive functioning of AD and FTD samples. The key search terms that were used to capture relevant articles are shown in table 1. The bibliographies of relevant papers were also examined for additional references. To be selected for the current meta-analysis, a study had to meet the following inclusion criteria: (1) it examined groups of participants with AD and FTD (excludes case studies), (2) cognitive tests were administered to both groups (excludes rating scales and questionnaires), (3) the cognitive tests were not used for the diagnosis and classification of participants into the AD and FTD groups (ie, a test could not be used as both a dependent and independent variable), (4) statistical data enabling the calculation of Cohen’s d effect sizes38 were provided (eg, mean and SD, t tests or one way F tests) and (5) it was published in English.
This literature search yielded 2785 potentially relevant studies, 93 of which met all of the inclusion criteria. These 93 studies used a total of 136 different cognitive tests which, in turn, yielded results for 1019 different test scores (ie, some tests yielded multiple scores). A total of 114 of these test scores were used by more than one study and were, therefore, included in the current meta-analysis. Of the studies that did not meet one or more of the inclusion criteria, 2687 did not undertake cognitive testing with the AD and FTD samples, 67 did not provide data that would enable the calculation of effect sizes and 96 studies only used cognitive tests for the diagnosis and classification of participants. An additional paper39 was excluded because it presented data that had been reported in another publication that was to be included in this study.40 This was done to ensure that the data were independent, as meta-analytic techniques assume that a single study only contributes one score to the calculation of a particular effect size.41 Finally, one publication was treated as two studies because data were provided for separate samples in the one article.42 Thus 94 independent studies were effectively included in the final meta-analysis.
Data collection and preparation
Consistent with the most commonly used diagnostic criteria for AD,43 the Mini-Mental State Examination (MMSE) was frequently used when diagnosing dementia. The MMSE was therefore only included as a dependent variable in the current meta-analysis when groups were formed on the basis of a post-mortem diagnosis of AD or FTD. However, where provided, MMSE scores are reported as descriptive data for the AD and FTD samples (refer to table 2). In addition, whenever it was not clear whether a particular cognitive test was used to diagnose dementia and classify participants, this test was excluded from the meta-analysis in order to ensure that a measure was not used as both a dependent and independent variable.
Where test scores were obtained from different editions of the same test (eg, Wechsler Adult Intelligence Scales), these were combined for the purposes of calculating mean effect sizes. Care was taken to ensure that the scores for a given test provided independent measures of performance. Thus subscores and total scores from a test could not both be used in the calculation of an effect size. Similarly, where a study provided multiple scores for the same test (eg, easy and hard versions of a geometric figure task44), effect sizes were calculated for each score and then averaged to provide a single effect size for that study; thereby ensuring that a study only contributed one score to the calculation of a mean effect size. This was not necessary when effect sizes for different scores from the same test were reported separately (eg, speed and accuracy).
All tests were broadly grouped into eight cognitive categories in order to organise the findings of this meta-analysis. Lezak et al45 and Spreen and Strauss46 were consulted for this purpose. These categories are: orientation and attention, perception, memory, construction, verbal abilities and language, concept formation and reasoning, executive functioning and motor tasks, and general ability and intelligence.
Effect size calculations and analyses
Cohen’s d38 effect sizes were calculated to measure the extent of the difference between the scores of the AD and FTD samples, where a small effect size is defined as d = 0.2, a medium effect as d = 0.5 and a large effect as d = 0.8. To put this into perspective, an effect size of 0.5 indicates that the scores for the two groups differ by half of a pooled standard deviation. The percentage overlap between the scores of the two groups can also be calculated from an effect size47; d = 0 equates to 100% overlap (ie, the groups are indistinguishable), d = 1.0 equates to 45% overlap and d = 3.0 equates to less than 5% overlap in the groups’ scores. These statistics are provided for all effect sizes in order to facilitate their interpretation, especially with regard to their potential for assisting with differential diagnosis.
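As a rough illustration of how an effect size maps onto a percentage overlap, the following sketch assumes that %OL is the complement of Cohen's U1 non-overlap statistic for two normal distributions with equal variance. That assumption reproduces the 100% and 45% figures quoted above, but it is not necessarily the exact table the authors consulted and may diverge from it at larger values of d. The function names are hypothetical.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def percent_overlap(d: float) -> float:
    """Approximate percentage overlap between two groups for effect size d.

    Assumption: overlap = 100 * (1 - U1), where U1 is Cohen's non-overlap
    statistic for two equal-variance normal distributions.
    """
    u2 = normal_cdf(abs(d) / 2.0)
    u1 = (2.0 * u2 - 1.0) / u2
    return 100.0 * (1.0 - u1)


for d in (0.0, 0.5, 0.8, 1.0):
    print(f"d = {d:.1f}: {percent_overlap(d):.0f}% overlap")
```

Because only the magnitude of d enters the calculation, the same overlap is obtained whichever group performs more poorly.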
Effect sizes were calculated in a multi-stage process. The first stage involved calculating effect sizes for each score of every test that was used by each individual study. When calculating effect sizes, FTD scores were always subtracted from AD scores. In most cases, higher test scores indicated better performance. Therefore, a positive effect size indicates that FTD participants were more impaired on a given measure than AD participants. In cases where a higher score indicated greater impairment than a lower score (eg, errors, speed), the direction of the effect sizes for these scores was transformed so that a positive effect size still indicated greater impairment in the FTD group.
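The sign convention described above can be sketched as follows. This is a minimal illustration that assumes the usual sample-size-weighted pooled standard deviation; the function name, the `higher_is_worse` flag and the summary statistics are hypothetical and are not taken from any included study.

```python
import math


def cohens_d(mean_ad, sd_ad, n_ad, mean_ftd, sd_ftd, n_ftd,
             higher_is_worse=False):
    """Cohen's d for AD minus FTD, using a pooled standard deviation.

    A positive value means that the FTD group was more impaired. When a
    higher score indicates worse performance (eg, an error count), the
    sign is reversed so that this interpretation still holds.
    """
    pooled_var = (((n_ad - 1) * sd_ad ** 2 + (n_ftd - 1) * sd_ftd ** 2)
                  / (n_ad + n_ftd - 2))
    d = (mean_ad - mean_ftd) / math.sqrt(pooled_var)
    return -d if higher_is_worse else d


# Hypothetical delayed recall score (higher = better): AD recalls less.
recall = cohens_d(4.0, 2.0, 30, 6.0, 2.0, 20)
# Hypothetical error count (higher = worse): AD makes more errors.
errors = cohens_d(12.0, 4.0, 30, 8.0, 4.0, 20, higher_is_worse=True)
print(recall, errors)
```

In both hypothetical cases the AD group is the more impaired one, so both effect sizes come out negative despite the raw AD mean being lower in one case and higher in the other.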
The effect sizes for all studies that used a particular test score were aggregated to calculate a mean effect size (and SD). Whereas mean effect sizes measure the extent (and direction) of the difference between AD and FTD, the SD indicates the variation in effect sizes between studies. Before aggregating the scores from individual studies, each effect size was weighted to take into account the fact that the reliability of an effect size is affected by the size of the sample from which it is derived. According to Lipsey and Wilson,48 the inverse of the variance provides a better measure of the precision of an effect size than sample size. Consequently, the inverse of the variance was used to weight all effect sizes (dw) using the formulae below.
$$w = \frac{1}{SE^{2}}, \qquad d_w = \frac{\sum (w \times ES)}{\sum w}$$

where ES = effect size and SE = standard error.
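A minimal sketch of inverse-variance weighting is given here for illustration. It assumes the standard large-sample approximation for the standard error of a standardised mean difference and a fixed-effect weighted mean; whether the authors used exactly these expressions cannot be confirmed from the text. The study values are hypothetical.

```python
import math


def se_of_d(d, n1, n2):
    """Approximate standard error of a standardised mean difference."""
    return math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))


def weighted_mean_effect(effects, ses):
    """Inverse-variance weighted mean effect size and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    mean = sum(w * es for w, es in zip(weights, effects)) / total
    return mean, math.sqrt(1.0 / total)


# Hypothetical studies: (d, n_AD, n_FTD)
studies = [(-1.2, 20, 15), (-0.7, 60, 40), (-0.9, 12, 10)]
effects = [d for d, _, _ in studies]
ses = [se_of_d(d, n1, n2) for d, n1, n2 in studies]
d_w, se_w = weighted_mean_effect(effects, ses)
print(f"weighted mean d = {d_w:.2f}, SE = {se_w:.2f}")
```

The study with the largest samples has the smallest standard error and therefore pulls the weighted mean towards its own effect size.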
Effect sizes are only reported for tests that were used by two or more studies because effect sizes that are based on a single study are not thought to provide a reliable measure of group differences.41 Ninety-five per cent confidence intervals (CIs) for these mean effect sizes were additionally calculated in order to determine their statistical significance. If a CI does not span zero, this indicates that the true population effect size differs significantly from zero, indicating that there is a significant difference between the performance of the AD and FTD groups.
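The significance check described above can be illustrated with a short sketch. It assumes a fixed-effect standard error for the weighted mean (the square root of the reciprocal of the summed weights) and a normal critical value of 1.96; the source does not state how its intervals were derived, so this is one plausible reading rather than the authors' procedure. All names and values are hypothetical.

```python
import math


def confidence_interval(effects, ses, z=1.96):
    """CI around an inverse-variance weighted mean effect size."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    mean = sum(w * es for w, es in zip(weights, effects)) / total
    se_mean = math.sqrt(1.0 / total)
    return mean - z * se_mean, mean + z * se_mean


def spans_zero(ci):
    """True when the interval includes zero (no significant difference)."""
    lower, upper = ci
    return lower <= 0.0 <= upper


clear = confidence_interval([-0.9, -1.1], [0.3, 0.3])
unclear = confidence_interval([0.1, -0.1], [0.3, 0.3])
print(clear, spans_zero(clear))
print(unclear, spans_zero(unclear))
```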
One problem facing all meta-analyses is that studies with statistically significant results are more likely to be published and, therefore, more likely to be included in a meta-analysis. Failure to include unpublished studies with non-significant results increases the risk of a type 1 error, which may result in an effect size being overestimated.49 A fail safe N (Nfs) was therefore calculated, using the method described by Rosenthal,41 to address this problem. This statistic estimates the number of unpublished studies with non-significant results (ie, small effects) that would be required in order to call the current findings into question.41 The larger the number, the more confident we can be in a particular finding.
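For illustration only, one common effect-size-based formulation of a fail safe N is sketched here. It is the variant usually credited to Orwin, chosen because the text speaks of reducing a finding to a small effect; the source attributes its own calculation to Rosenthal, so this should not be read as the authors' exact formula. The criterion value of 0.2 and the assumption that unpublished studies average an effect of zero are both assumptions made for this sketch.

```python
def fail_safe_n(n_studies, mean_d, criterion_d=0.2):
    """Number of unpublished null studies needed to pull the mean effect
    down to criterion_d, assuming those studies average an effect of zero.
    """
    extra = n_studies * (abs(mean_d) - criterion_d) / criterion_d
    # A mean effect already at or below the criterion needs no extra studies.
    return max(0.0, extra)


# Hypothetical measure: five studies with a weighted mean effect of -1.0.
n_fs = fail_safe_n(5, -1.0)
print(round(n_fs))
```

Under these assumptions a measure used by five studies with a mean effect of magnitude 1.0 would need about 20 unpublished null results before its mean fell to 0.2, which comfortably exceeds the number of published studies.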
An additional problem that faces meta-analytic studies (and qualitative reviews) is that there are methodological differences between studies that may affect the findings and, therefore, the effect sizes. The impact of diagnostic certainty was therefore examined in order to determine whether the research findings were affected by the criteria that were used to establish dementia type. Every study was given a score according to the diagnostic criteria that were used to define each of its samples. As post-mortem neuropathological confirmation of AD or FTD provides the gold standard for a diagnosis of dementia, this was given a score of 3 (NAD = 5, NFTD = 5 studies), while studies using the National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer’s Disease and Related Disorders Association (NINCDS-ADRDA) diagnostic criteria for AD43 and the Lund and Manchester criteria for FTD50 were given a score of 2 (NAD = 82, NFTD = 74 studies). Studies that used revised or similar criteria, particularly prior to the publication of these guidelines, were also given a score of 2. Finally, a score of 1 was given to studies whose diagnostic criteria were less rigorous or not clearly specified (NAD = 7, NFTD = 15 studies). Summed across its AD and FTD samples, the diagnostic score for a study could therefore vary between 2 and 6.
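The scoring scheme can be expressed compactly as below. The sketch assumes that a study's diagnostic score is the sum of the scores for its AD and FTD samples, which is inferred from the stated range of 2 to 6; the category labels are hypothetical.

```python
# Hypothetical labels for the three levels of diagnostic certainty.
CRITERIA_SCORES = {
    "pathological": 3,  # post-mortem neuropathological confirmation
    "published": 2,     # NINCDS-ADRDA, Lund and Manchester, or similar
    "other": 1,         # less rigorous or not clearly specified
}


def diagnostic_score(ad_criteria, ftd_criteria):
    """Study-level diagnostic score: AD sample score plus FTD sample score."""
    return CRITERIA_SCORES[ad_criteria] + CRITERIA_SCORES[ftd_criteria]


print(diagnostic_score("published", "other"))
```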
The potential influence of three other methodological variables (participant age, years since diagnosis and education) was also examined. Mean age, years since diagnosis and years of education were calculated for each study (eg, MAD+FTDage) for this purpose. This was done by combining the data from the AD and FTD groups for that study and weighting it by the sample sizes of the AD and FTD groups.
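The combined study-level mean described above is a sample-size-weighted average of the two group means. A minimal sketch, with a hypothetical function name and hypothetical values, is:

```python
def combined_mean(mean_ad, n_ad, mean_ftd, n_ftd):
    """Sample-size-weighted mean of a variable across the AD and FTD groups."""
    return (mean_ad * n_ad + mean_ftd * n_ftd) / (n_ad + n_ftd)


# Hypothetical study: 30 AD participants (mean age 72), 10 FTD (mean age 64).
study_age = combined_mean(72.0, 30, 64.0, 10)
print(study_age)
```

The larger AD group dominates the result, so the combined mean sits closer to the AD mean than a simple average of the two group means would.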
While there is considerable confusion within the literature regarding the best way to analyse the influence of methodological variables on effect sizes, Steel and Kammeyer-Mueller51 have demonstrated that a weighted least squares (WLS) multiple regression (using the inverse of the sampling error variance to weight the multiple regression) is less affected by the problems of heteroscedasticity and multicollinearity, which can affect the accuracy of these analyses. Heteroscedasticity occurs when the distribution of study sample sizes is skewed (eg, a large number of studies with small samples and few with very large samples) and multicollinearity occurs when predictor variables are correlated with one another.51,52 A WLS multiple regression was therefore conducted, in addition to Pearson r correlation coefficients, in order to examine these methodological variables. Standard errors and confidence intervals for the results of the WLS multiple regression were adjusted according to the method outlined by Hedges.53
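To make the weighting step concrete, the sketch below fits a weighted least squares regression by solving the weighted normal equations in plain Python. It is an illustration under stated assumptions, not the authors' analysis: it returns coefficients only, does not apply the adjustment to standard errors attributed to Hedges, does not guard against a singular design matrix, and uses hypothetical data and function names.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def wls_fit(rows, y, weights):
    """Weighted least squares coefficients (intercept first).

    rows    : predictor values for each study
    y       : study-level mean effect sizes
    weights : inverse sampling variances, one per study
    """
    x = [[1.0] + list(r) for r in rows]
    p = len(x[0])
    xtwx = [[sum(w * xi[j] * xi[k] for xi, w in zip(x, weights))
             for k in range(p)] for j in range(p)]
    xtwy = [sum(w * xi[j] * yi for xi, yi, w in zip(x, y, weights))
            for j in range(p)]
    return solve_linear(xtwx, xtwy)


# Hypothetical moderators per study: (mean age, years since diagnosis).
rows = [(70.0, 3.0), (65.0, 2.0), (72.0, 4.5), (68.0, 1.0), (75.0, 5.0)]
# Effect sizes constructed to follow an exact linear rule for the demo.
y = [0.5 + 0.01 * age - 0.1 * years for age, years in rows]
weights = [12.0, 30.0, 8.0, 20.0, 15.0]
coefficients = wls_fit(rows, y, weights)
print(coefficients)
```

Studies with larger weights (smaller sampling variance) contribute more to the fitted coefficients, which is the property that motivates using WLS rather than an unweighted regression when sample sizes vary widely.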
The demographic characteristics for the participants in the 94 studies that were included in the current meta-analysis are shown in table 2. Overall, 4684 participants were included in this study. Gender was only reported in 69 studies, providing data for 3480 cases (males: NAD = 917, NFTD = 737; females: NAD = 1253, NFTD = 573). When the demographic characteristics of the AD and FTD participants that were included in this meta-analysis were compared, the FTD participants were found to be significantly younger (t = 12.59, df = 86, p<0.001), more educated (t = −2.51, df = 60, p = 0.015) and had significantly higher MMSE scores (t = −5.41, df = 74, p<0.001) than the participants with AD. However, the two groups did not differ significantly in terms of the time since their diagnosis (t = 0.38, df = 36, p = 0.709) or Clinical Dementia Rating scores (t = 0.25, df = 18, p = 0.807).
Five studies had pathological confirmation of AD,54–58 82 used published criteria,14,25,27–31,40,42,44,59–129 and seven did not specify diagnostic criteria.130–136 For the diagnosis of FTD, five studies had pathological confirmation,54–58 74 used published criteria,14,25,27,28,30,31,40,42,44,59–64,66–73,76–105,108–116,118,120–124,126–129,131 14 did not clearly specify the diagnostic criteria29,65,74,75,106,107,117,125,130,132–136 and one study used a combination of clinical observations and published criteria.119
There were not enough studies reporting data for premorbid intelligence (n = 6) or scores on the Dementia Rating Scale (n = 6) to report or analyse this information. Similarly, too few studies reported information about participants’ current medications (n = 19), much of which was not comparable, to meaningfully examine this information. Thirty-five studies reported information regarding the inclusion or exclusion of depressed participants. However, only six of these provided scores from depression rating scales.
The weighted effect sizes (dw) for all measures (mean, SD, 95% CI, minimum, maximum), grouped according to test category (orientation and attention, perception, memory, verbal abilities and language, construction, concept formation and reasoning, executive functioning and motor tasks, and general ability and intelligence) and rank ordered by size, are provided in tables 3–10. Nfs and the percentage overlap between groups (%OL) are also provided, as are the number of studies (N) that used each measure, the number of participants that were assessed and the study references. The conclusions drawn from this study are based on the combined interpretation of these statistics. From a neuropsychological perspective, differential diagnoses are likely to be more accurate when there are large group differences (d) and there is limited overlap in the cognitive profiles (%OL). A clinician would also be more confident in their decision if the CI did not span zero (ie, groups differ significantly) and if unpublished findings were unlikely to call the result into question (Nfs). Thus in order for a measure to be useful for differentiating between AD and FTD, it had to meet the following criteria: (i) a large effect size in either direction (ie, dw ⩾ 0.8) and, consequently, a limited degree of group overlap (%OL <50), (ii) a confidence interval that did not span zero (CIs are affected by N and SD/range) and (iii) an Nfs score that was large enough to make it unlikely that there were that number of unpublished studies with non-significant findings in existence. As different tests were used with varying frequency, it was decided that the Nfs should be greater than the number of published studies that had used the test (ie, Nfs >Nstudies).
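The three criteria can be combined into a single check, sketched below. The function name and example values are hypothetical, and the use of the absolute effect size is an assumption based on the fact that measures with large negative effects (AD more impaired) are also treated as meeting the criteria.

```python
def meets_criteria(d_w, ci_lower, ci_upper, percent_overlap, n_fs, n_studies):
    """Apply the three study criteria to one measure."""
    large_effect = abs(d_w) >= 0.8 and percent_overlap < 50
    significant = ci_lower > 0 or ci_upper < 0  # CI does not span zero
    robust = n_fs > n_studies
    return large_effect and significant and robust


# Hypothetical measure with a large negative effect (AD more impaired).
print(meets_criteria(-1.0, -1.4, -0.6, 45, 20, 5))
# Hypothetical measure with a moderate effect and too much overlap.
print(meets_criteria(0.5, 0.2, 0.8, 67, 30, 12))
```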
Overall, the effect sizes ranged from a minimum of −0.05 for the Judgement of Line Orientation Test to a maximum of 1.39 for the Graded Naming Test. Thus there was considerable variation in the extent to which the AD and FTD groups differed on the measures used by the 94 studies that were included in the current meta-analysis. This is also reflected in the per cent overlap statistics, which indicate that there was 100% overlap between the AD and FTD groups on the least discriminating measure (Judgement of Line Orientation Test) and 32% overlap between the two groups for the most discriminating measure (Graded Naming Test). Also interesting is the fact that the cognitive tests reported here were used by between two (eg, Addenbrooke’s Cognitive Examination (ACE) subtests, Western Aphasia Battery (WAB) Comprehension) and 43 studies (Category Fluency). Thus while some measures have been used infrequently, others have been studied very extensively (eg, Category Fluency, Letter Fluency, Rey Complex Figure, Boston Naming Test). In terms of the statistical significance of the effect sizes, as indicated by CIs that do not span zero, there were 69 effect sizes that differed significantly from zero.
Of the 18 measures of orientation and attention, only the ACE orientation score met the study criteria (see table 3). This measure had a large and significant effect size, reflecting 41% overlap in the scores of the AD and FTD groups, with the AD group performing more poorly than the FTD group on this measure. The Nfs for this measure was also well in excess of the number of published studies using this measure, suggesting a high degree of confidence in this finding.
Only five measures of perception were used by two or more studies and these were all taken from the Visual Object and Space Perception battery (see table 4). None of these measures produced large effects, although position discrimination and dot counting revealed moderate group differences and CIs that did not span zero. However, these measures were associated with more than 50% overlap between the two groups and both had modest Nfs, indicating that the effect sizes could be reduced to a small effect size if there were 5–7 unpublished studies that had found small group differences. Therefore, none of the measures of perception accurately discriminated between persons with AD and FTD.
Memory was one of the most commonly assessed cognitive domains. In particular, the Rey Complex Figure Test and the Logical Memory subtest from the Wechsler Memory Scale (WMS) were very commonly used (see table 5). The recognition and delayed recall scores of the Auditory Verbal Learning Test (AVLT), the delayed recall trial of the Rey Complex Figure task, the Recognition Memory Test (words), the retention and delayed recall scores of the Logical Memory subtest from the WMS, the memory score from the ACE, the delayed recall score of the Visual Reproduction subtest from the WMS and the total recall score from the Selective Reminding Test all had large and significant effect sizes and less than 50% overlap between groups. Persons with AD performed more poorly than those with FTD on all of these measures. The delayed recall of the Rey Complex Figure task, in particular, has been used widely (21 studies) and the Nfs suggest that it is unlikely that this result would be negated by non-significant unpublished findings. The recognition score of the AVLT, the Recognition Memory Test for words, the retention score of the Logical Memory subtest from the WMS and the memory score of the ACE have been less widely used and, while acceptable, their Nfs were substantially lower.
Regarding tests of verbal abilities and language, there were more tests that fell into this category than any other. Eight of these test scores yielded large effect sizes, 95% CIs that did not span zero and large Nfs statistics, indicating that there were large and significant group differences on these measures and that these findings are unlikely to be negated by unpublished findings (see table 6). Specifically, these measures were: the Graded Naming Test, Word–Picture Matching, WAB Spontaneous Speech (fluency), the Pyramids and Palm Trees (word score, picture score), other picture naming tasks, WAB Comprehension and WAB Repetition. For all eight verbal measures, the FTD groups performed more poorly than the AD group. In addition to these eight measures, there were a number of other commonly used measures, such as the Pyramids and Palm Trees (score not specified) and letter fluency tasks, which showed moderate effect sizes, had 95% CIs that did not span zero and large Nfs statistics. However, because the effect sizes for these measures were relatively low (indicating small to medium differences in the AD and FTD means), there was an unacceptably high degree of overlap (53% and 67%, respectively) between the scores for the AD and FTD groups.
Of the seven measures of construction that were used by two or more studies, only the Beery Developmental Test of Visual Motor Integration demonstrated a large group difference, less than 50% overlap in scores, 95% CIs that did not span zero and a satisfactory Nfs (see table 7). As with the abovementioned memory tests, the effect size was negative, indicating that the AD groups performed more poorly on this task than the FTD groups.
Despite researchers using a number of different measures of concept formation and reasoning, none of them showed large group differences (ie, d >0.8) (see table 8). Indeed, there was between 57% and 92% overlap in the scores of the AD and FTD groups on all 12 measures. Commonly used measures, such as the Wisconsin Card Sorting Test and Raven’s Progressive Matrices, did not successfully discriminate between AD and FTD when the findings of all studies that had used these measures were considered, although the minimum and maximum effect sizes for these tests indicate that there were individual studies that reported large effects for these tests.
When the classifications used by Lezak et al45 and Spreen and Strauss46 were used to group the cognitive tests, there were only four tests that fell into the category of executive function and motor performance (see table 9), although other tests that are frequently considered to be executive tasks have been included in other categories (eg, verbal fluency tasks, Wisconsin Card Sorting Test, Wechsler Adult Intelligence Scale-Similarities).137,138 None of these met the criteria specified for discriminating between AD and FTD.
Finally, of the five measures of general ability and intelligence that were used by the studies included in this meta-analysis, the MMSE was the only measure that revealed a large negative effect (see table 10). This indicates that AD participants were more impaired on this measure than FTD participants. Despite its widespread use, only those studies (n = 5) that had pathological confirmation of a patient’s diagnosis were used in the calculation of this effect size. This was done in order to avoid a situation where the MMSE was being used both in group allocation (independent variable) and as a measure of cognitive performance (dependent variable).
The influence of four methodological variables (age, years since diagnosis, education, diagnostic criteria) on the effect sizes was also examined using Pearson r correlation coefficients and a WLS multiple regression analysis. “Diagnostic criteria” refer to the rating given to the quality of the diagnostic criteria that were used to diagnose AD and FTD (ie, pathological confirmation, published criteria or other criteria). The mean age, years since diagnosis, education and diagnostic criteria scores of those studies that reported these data were first correlated with the weighted mean effect sizes for these studies. Small and non-significant correlations were observed for age (r = −0.01, n = 90, p = 0.94), years since diagnosis (r = −0.10, n = 38, p = 0.57), education (r = 0.11, n = 63, p = 0.41) and diagnostic criteria (r = −0.08, n = 94, p = 0.46), indicating that these variables were not significantly related to the effect sizes. A WLS multiple regression was additionally performed on the data from the 22 studies that reported data for all four variables. The weighted mean effect size for a study was the dependent variable, the four moderator variables were the independent variables, and the inverse variance was used as the weighting variable. The results of this regression analysis are presented in table 11. The final model was non-significant (p = 0.98) and accounted for only 23% of the variance. Therefore, both the correlations and the WLS multiple regression indicate that age, years since diagnosis, education and diagnostic criteria did not significantly contribute to the effect sizes reported in this study. Thus there did not appear to be any systematic variation in the performance of the AD and FTD groups on the different tests of cognitive ability as a consequence of between study differences in these variables, suggesting that it is acceptable to combine the results of studies that differed on these methodological variables.
Overall, the data for this meta-analysis were obtained from 94 studies that examined the cognitive performance of 2936 persons with AD and 1748 persons with FTD. Forty-seven cognitive tests, which yielded 115 different scores, were used by more than one study. While the participants with AD were, on average, older and less educated than the FTD participants, the two groups were comparable in terms of time since diagnosis. The lower age of the FTD group is consistent with the fact that FTD tends to have an earlier onset.14 FTD participants also had higher MMSE scores than the AD participants. However, this difference in MMSE scores is likely to reflect the fact that this measure was originally designed to detect the deficits associated with AD (eg, memory, praxis) rather than FTD.
The current meta-analysis found that a wide variety of tests have been used in research that has examined the cognitive deficits associated with AD and FTD and that there is considerable variation in the ability of these measures to distinguish between AD and FTD. However, in order for a cognitive test to be useful for differential diagnosis, it was argued that it must be able to distinguish between the performance of these two groups (indicated by large effects and small %OL). There should also be a high degree of confidence in the accuracy with which the measure distinguishes between the two groups (measured by the 95% CI) and we need to be assured that the conclusions drawn from the research literature are not systematically biased by the tendency to publish statistically significant findings (Nfs statistic). When these three criteria were applied to all of the measures analysed in this study, there were only 19 out of 115 measures that met these criteria. More specifically, persons with AD performed more poorly than those with FTD on 12 of the 19 measures, all of which assessed orientation, memory or general cognitive ability. These tests were (in order of discriminative ability): the ACE orientation subtest, the MMSE, the Beery Developmental Test of Visual Motor Integration and nine measures of memory (AVLT recognition and delayed recall scores, Rey Complex Figure delayed recall score, Recognition Memory Test (words), WMS Logical Memory per cent retention and delayed recall scores, ACE memory subtest, WMS Visual Reproduction delayed recall score and Selective Reminding Test total recall). In contrast, persons with FTD performed more poorly than those with AD on seven measures of verbal ability (Graded Naming Test, Word–Picture Matching, WAB Spontaneous Speech (fluency), Pyramids and Palm Trees (word score, picture score), other picture naming tasks and WAB Comprehension). 
Importantly, however, there was still between 32% and 48% overlap in the scores of the AD and FTD groups on all 19 of these measures. This level of overlap suggests that, even when the most discriminating cognitive tests are used, the differential diagnosis of AD and FTD remains problematic. Failure to find tests that clearly discriminate between AD and FTD confirms previous research which found that the performance of AD and FTD participants did not differ significantly on a large range of tests.27,44,68,72,94,130
The finding that nine measures of memory were among the most useful for differentiating between AD and FTD is consistent with research and clinical observations that memory is differentially affected in AD and FTD. Numerous previous studies have reported that AD patients experience greater memory problems than those with FTD.25,27,28,30,44,66,68,72,78,84,85,90,91,94,112 In contrast, persons diagnosed with FTD had more difficulty with tests of verbal ability and language than those with AD, as demonstrated by eight large effect sizes in this domain. This supports previous research which indicated that FTD is associated with language impairment.5,7,9 In addition, numerous previous studies have reported that persons with FTD have more difficulty with executive tasks than AD patients.2,3,4,5,9,11,12,27,67 While executive tasks would be expected to discriminate between AD and FTD, given the frontal pathology associated with FTD,9,10,11 this meta-analysis did not identify any measures of executive functioning that adequately distinguished between the two groups. Moreover, other tests that are thought to draw on executive functions for successful completion,137,138,139 such as verbal fluency, Trails and the Stroop tests, failed to meet the current study criteria.
While there was considerable variation in the effect sizes for different studies, between-study variations in the age of participants, years since diagnosis, years of education and the diagnostic criteria that were used were not related to the size of the effects reported by these studies. The finding that years since diagnosis was not related to the group differences is surprising given that the symptoms associated with both AD and FTD vary with the stage of illness, with cognitive performance declining over time.4,11,12,43,140,141,142 However, demographic variables such as age, gender, years since diagnosis and education were not consistently reported. For example, only approximately 40% of the studies reported years since diagnosis. Thus there is the potential for significant but unreported variation in these variables. Also important is the fact that mean effect sizes were calculated for each study in order to complete these analyses. Thus the effect sizes for all of the individual measures that were used by a given study were averaged. If a study yielded both small and large effect sizes for different tests, these were combined to calculate a mean effect size, which was then correlated with the moderator variables. A more useful analysis would involve analysing each measure separately by correlating the effect sizes from all of the studies that used a particular measure with the age, education and years since diagnosis data for those studies. However, this was not possible as the maximum number of studies that used any given measure was 43. When those studies that did not provide data for the moderator variables were excluded, the sample size was reduced to an unacceptably low number (ie, 22). Thus the current moderator analysis was necessarily limited.
Similarly, current medications and whether depressed participants were excluded were rarely reported. Therefore, these variables could not be examined, despite the potential for medications and depression to affect cognitive performance. It is possible that these factors contributed to the effect sizes. It is, therefore, essential that this information is included in future publications of primary research to allow an accurate and detailed synthesis of the research findings.
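The study-level moderator analysis described above, and the reason why averaging within studies limits it, can be sketched in Python as follows. This is an illustration only: the data are hypothetical, the weighting by total sample size is an assumption, and the single-predictor weighted least squares fit stands in for whatever WLS procedure was actually applied.

```python
def study_mean_effect(effects):
    """Average the effect sizes from every measure a single study used."""
    return sum(effects) / len(effects)

def wls_fit(x, y, w):
    """Weighted least squares slope and intercept of y on a single predictor x."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical studies: per-measure effect sizes, mean participant age, total n.
studies = [
    {"effects": [0.2, 1.4], "mean_age": 64.0, "n": 40},
    {"effects": [0.9, 1.1], "mean_age": 68.0, "n": 60},
    {"effects": [0.5, 0.7, 1.8], "mean_age": 72.0, "n": 30},
]
mean_ds = [study_mean_effect(s["effects"]) for s in studies]
ages = [s["mean_age"] for s in studies]
weights = [s["n"] for s in studies]
slope, intercept = wls_fit(ages, mean_ds, weights)
print(f"change in mean d per year of age: {slope:.3f}")
```

In the first hypothetical study, a small effect (0.2) and a large effect (1.4) collapse to a single mean of 0.8, so any relationship between the moderator and performance on a particular measure is hidden once the effects are averaged.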
There are a number of limitations to the current meta-analysis that warrant consideration. Firstly, rating inventories were not included in the current meta-analysis as they are not classified as objective cognitive tests.143,144 However, given the behavioural nature of FTD, a meta-analysis of these behavioural rating scales may yield useful findings. In addition, assessments of social cognition, personality and empathy measures, which have been recommended in the differential diagnosis of dementia,11,13,64,145 did not fall within the scope of the current meta-analysis but may also be worthy of consideration.
Secondly, it is important to note that the current meta-analysis specifically examined AD and FTD. The overlapping cognitive features of these types of dementia are highlighted by the fact that few tests clearly discriminated between these two groups. While greater differences would be expected if the cognitive performance of the AD and FTD groups were compared with healthy controls, such an analysis would not adequately address the question of differential diagnosis.
Thirdly, FTD is described using a variety of terms, which were combined in the current meta-analysis. This may have increased the variability in the test performance of the FTD group. Although it would be desirable to analyse these subgroups separately, the labels and the diagnostic criteria that are applied to these subgroups are not used consistently, thereby precluding a more specific analysis of potential subtypes. Moreover, more research is needed on each of these subgroups if they are to be analysed separately.
While this analysis indicates that many test scores do not clearly discriminate between AD and FTD, there is some evidence that qualitative aspects of performance may better distinguish the two groups. For example, a recent study indicated that the consideration of performance characteristics and specific error types increased the accuracy of the differential diagnosis of AD and FTD.33 However, given that these aspects of performance are often not quantified, discrimination between AD and FTD is largely reliant on subjective clinical observations.
Finally, the effect sizes derived from measures used by single studies were not included in the current meta-analysis. Technically, a meta-analysis requires two or more studies in order to aggregate primary research findings.41 One consequence of this is that there may be interesting and innovative measures that discriminate between groups but are not in widespread use and were not included in the current analysis.
In summary, the findings of this meta-analysis suggest that the neuropsychological tests that best discriminate between AD and FTD are ACE orientation, ACE memory, recognition and delayed recall of the AVLT, delayed recall of the Rey Complex Figure, the words version of the Recognition Memory Test, WMS Logical Memory (per cent retention and delayed recall), WMS Visual Reproduction Test (delayed), total recall of the Selective Reminding Test, Graded Naming Test, Word–Picture Matching, WAB Spontaneous Speech (fluency), Pyramids and Palm Trees (word score, picture score), picture naming tasks, WAB Comprehension, the Beery Developmental Test of Visual Motor Integration and MMSE. However, it is important to note that, for every one of these tests, the overlap between the scores of the two groups was too large to support a confident differential diagnosis. Therefore, these cognitive tests must be used cautiously and in conjunction with other diagnostic information, such as medical history, behavioural observations, imaging and information from relatives, when making a differential diagnosis.
We would like to acknowledge the statistical expertise of Mr B Willson and Dr M Hutchinson in this research. The preliminary findings from this study were presented at the International Neuropsychological Society’s mid-year conference, Dublin, Ireland, July 2005.
Published Online First 19 March 2007
Competing interests: None.