We read Larrabee and colleagues' e-letter response to our systematic review on Performance Validity Testing (PVT). Whilst we welcome debate, and recognize that some clinicians will disagree with our conclusions, we were disappointed that they misrepresented our paper in formulating their response:
1. The authors state: "Throughout the paper, the authors refer to PVTs as 'effort tests', a characterization that is no longer in use in the United States." In reality we used the term "effort test" only twice in our paper: once in the introduction ("(PVTs), also historically called effort tests") and once in the methods, in describing our search terms. By contrast, we use the term PVT on 45 occasions.
2. We are concerned that they then go on to misrepresent the results of our review. We found wide variation in results across different clinical groups and different tests. We noted that failure rates for some groups and some tests exceed 25%. We did not conclude that all failure rates were this high, but rather that failing a PVT is not a rare phenomenon: it is reasonably common across a range of clinical groups.
We presented results to support our conclusion that the PVT literature is problematic with regard to blinding to diagnosis and potential for selection bias.
We also uphold our speculation that failure on forced-choice tests at above-chance cutoffs may alternatively result from attentional deficits related to other symptoms. We were explicit that there are likely to be many reasons a person might fail a PVT, and we invite further research and discussion of the various causes of PVT failure that might help to explain the observed variation in failure rates across a range of diverse clinical conditions.
3. We did not ignore the importance of using multiple PVTs. On the contrary, we explicitly stated that "the manner in which we have described PVT failure rates does not necessarily reflect how they are used in practice by skilled neuropsychologists" and that "Guidance documents recommend that multiple performance validity measures should be used, including both free-standing and embedded indicators ...". However, that was not the subject of the study, which was the performance of individual tests.
4. They allege that we did not mention any of the previously published meta-analyses summarizing data from multiple investigations. However, the Sollman and Berry review they suggest has quite a different focus from our study, and the majority of its included studies were of mixed clinical populations (e.g. 'psychiatric', 'neurological', or 'head injury' without severity specified)(1). It was not relevant to our question.
Larrabee et al. raise three valid points of criticism:
1. That we were inaccurate in our portrayal of the Novitski paper. We have reviewed the Novitski et al. paper and, although the percentage failure rate was accurately transcribed, we agree the study was erroneously included, as the included mTBI patients had already failed the WMT(2). Interestingly, however, 36% of the amnestic MCI sample in this paper (who were not administered the WMT) also failed RBANS Digit Span at the <9 cutoff. Of note, the mTBI group in the Novitski paper was not included in our Figure 2, owing to the rather higher-than-usual cutoff score of 9. This isolated error does not alter our overall conclusions.
2. They challenge our statement that there is little consensus amongst experts on the use of PVTs. However, Kemp et al., in further correspondence with us, stated explicitly that the view of these tests as tests of malingering is old-fashioned and largely no longer accepted, whereas Larrabee and colleagues refer to them as just that. The British Psychological Society refer to them as 'effort' tests throughout their guidance, whereas Larrabee and colleagues criticize such nomenclature. It seems to us, even from the responses to our paper, that there is a lack of consensus.
3. Finally, while Larrabee et al. report that our results are "not representative of the research database", we counter with an assurance that these data were just that: a systematic extraction of all available data from the research base on PVTs.
1. Sollman MJ, Berry DTR. Detection of inadequate effort on neuropsychological testing: a meta-analytic update and extension. Arch Clin Neuropsychol 2011;26(8):774-89.
2. Novitski J, Steele S, Karantzoulis S, Randolph C. The Repeatable Battery for the Assessment of Neuropsychological Status Effort scale. Arch Clin Neuropsychol 2012;27(2):190-5.
We read with interest Kemp and colleagues' response to our recent systematic review on Performance Validity Testing (PVT). In response to the specific criticisms raised:
1- The searches and data extraction were conducted by one investigator. We agree this is a potential limitation, although only if papers were missed or data erroneously transcribed, and if it can be demonstrated that this would have changed the conclusions. Although Kemp and colleagues place great weight on this point, the evidence they put forward to support their contention is limited. Of the four citations in their letter, references 2 and 4 were in fact included (see our supplementary tables and our reference 57)(1,2). Reference 3 was, by coincidence, published simultaneously with our manuscript submission and was not available to us(3).
Reference 1 did not fit the terms of our search strategy and was not included, although it would have been eligible(4). It was an unblinded study of the 'coin in hand test', a brief forced-choice screening test for symptom exaggeration, administered to 45 patients with mixed dementias. It found that 11% scored at or above a two-error cutoff, and the authors proposed a new set of cutoffs for interpretation; it was in keeping with our conclusions. We would be happy to consider any other specific omissions or quality-assessment issues, not discussed here, which the authors consider would have altered the conclusions of the review.
2- The authors criticise our understanding of how PVTs are used in clinical practice, stating that PVTs should not be interpreted on a stand-alone basis but in combination, as part of a wider assessment. We agree, and made that point explicitly in the paper. Nonetheless, an understanding of the accuracy of individual tests remains of key importance to the weight accorded to individual tests within that wider assessment. They state that the way we presented single-test failure rates is not the way the tests should be used. Again, we agree, and pointed this out in the paper. However, they also misrepresent us: we did not score or interpret the tests ourselves, nor did we conflate different forms of testing; we reported all the available data as the authors presented it.
3- The third point they make is that we are not saying anything new. We agree, in as much as we have methodically documented and grouped in one paper data that were already in the public domain. In particular, the authors suggest that 'base rate failure is well understood', but in our experience that is not the case; indeed, the British Psychological Society's own guidelines state: "Further evidence on UK base rates of cognitive impairment and failure on effort tests in a range of clinical presentations and service settings is needed"(2). There are no other papers that synthesise these data in clinical populations to give readers an overview of these base rates. More importantly, the evidence we found showed wide variation in the use and interpretation of PVTs. Kemp and colleagues go on to describe how studies should ideally be done to compare clinical populations. We agree, and discuss this in our closing remarks. The problem is that the evidence to date falls far short of this ideal.
As we made clear in the paper, we agree that PVTs may be useful in the correct context and with an understanding of their limitations. We were not “dismissive” of the developing PVT literature and we believe this should be a literature open to scrutiny by all.
1. Sieck BC, Smith MM, Duff K, Paulsen JS, Beglinger LJ. Symptom validity test performance in the Huntington Disease Clinic. Arch Clin Neuropsychol 2013;28(2):135-43.
2. British Psychological Society. Assessment of Effort in Clinical Testing of Cognitive Functioning for Adults. 2009.
3. Sherman EMS, Slick DJ, Iverson GL. Multidimensional Malingering Criteria for Neuropsychological Assessment: A 20-Year Update of the Malingered Neuropsychological Dysfunction Criteria. Arch Clin Neuropsychol. 2020;00:1–30.
4. Schroeder RW, Peck CP, Buddin WH, Heinrichs RJ, Baade LE. The Coin-in-the-Hand Test and Dementia. Cogn Behav Neurol. 2012 Sep;25(3):139–43.
McWhirter et al. (2020) reviewed the published literature on Performance Validity Tests (PVTs), concluding that high false-positive (Fp) rates, exceeding 25%, were common in clinical (non-forensic) samples. In their discussion, they stated: "The poor quality of the PVT evidence base examined here, with a lack of blinding to diagnosis and potential for selection bias, is in itself a key finding of the review." They also conclude that the use of a forced-choice format with cut scores significantly above chance on two-alternative forced-choice tests (e.g., TOMM) raises questions about the utility of the forced-choice paradigm, essentially characterizing these PVTs as "floor effect" procedures. As such, McWhirter et al. argued that failure at above-chance cutoffs represents "functional attentional deficit in people with symptoms of any sort," rather than invalid test performance due to intent to fail.
Throughout the paper, the authors refer to PVTs as “effort tests”, a characterization that is no longer in use in the United States, in part because PVTs require little effort to perform for persons experiencing significant cognitive impairment (1). Rather, PVTs have been defined as representing invalid performance that is not an accurate representation of actual ability. Continuing to refer to PVTs as “effort tests” allows McWhirter et al. to more easily mischaracterize the tests as sensitive attentional tasks affected by variable “effort” rather than measures of performance validity that are failed due to invalid test performance.
As clinicians and investigators in PVT research, we found errors in their analysis and insufficient discussion of the research on mitigating factors related to Fp errors. First, the test error rates reported by McWhirter et al. are not accurate. For example, the Novitski et al. paper (McWhirter et al. reference 26) was cited as showing a 52% Fp rate on RBANS Digit Span < 9 in mild traumatic brain injury (mTBI). However, Novitski et al. used this mTBI sample, who also failed the WMT, as the criterion group representing non-credible (invalid) performance. Consequently, the 52% failure rate represents sensitivity (to invalid performance) rather than the Fp rate. Importantly, McWhirter et al. do not mention any of the published meta-analyses summarizing data from multiple investigations. For example, Sollman and Berry (2) reported a specificity of .90, corresponding to a 10% Fp rate, based on 47 samples comparing 5 PVTs administered to 1,787 participants.
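The distinction at issue, between sensitivity measured in a known-invalid criterion group and the false-positive rate measured in a credible clinical group, can be made concrete with a minimal sketch in Python. All scores below are invented for illustration; none are taken from Novitski et al. or any other study:

```python
# Minimal illustration (hypothetical data): the same cutoff yields
# "sensitivity" or a "false-positive rate" depending on which group is scored.

def failure_rate(scores, cutoff):
    """Proportion of examinees scoring below the cutoff, i.e. 'failing' the PVT."""
    return sum(s < cutoff for s in scores) / len(scores)

CUTOFF = 9  # fail if score < 9 (mirrors the Digit Span cutoff discussed above)

# Invented scores for two groups of ten examinees each.
known_invalid = [5, 6, 6, 7, 7, 8, 8, 8, 9, 10]           # criterion group (e.g., failed an independent PVT)
credible_clinical = [8, 9, 9, 9, 10, 10, 11, 11, 12, 13]  # patients with no incentive to underperform

sensitivity = failure_rate(known_invalid, CUTOFF)    # failure rate in the invalid group
fp_rate = failure_rate(credible_clinical, CUTOFF)    # failure rate in the credible group
specificity = 1 - fp_rate

print(f"sensitivity = {sensitivity:.0%}")  # 80%
print(f"Fp rate     = {fp_rate:.0%}")      # 10%
print(f"specificity = {specificity:.0%}")  # 90%
```

Reporting the first number as if it were the second is precisely the transcription error described above.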
Larrabee (3), studying moderate and severe TBI groups, reported Fp rates ranging from .065 to .138 for 4 PVTs and 1 SVT in this TBI sample. Moreover, Larrabee found no differences in mean performance on these 5 validity measures between a group of primarily mTBI cases performing significantly < chance on a two-alternative forced-choice test, the PDRT, and a similar group failing the PDRT at an above-chance cutoff plus failing one additional PVT. The same was true for mean performance on sensitive measures of word-finding, processing speed, and verbal and visual learning and memory. These data support the equivalence of definite invalid performance (significantly < chance) and probable invalid performance (defined by ≥ 2 PVT failures), contradicting the description by McWhirter et al. of PVT failure at above-chance levels as representing a functional attentional problem. In other words, multiple PVT failure suggests intentional underperformance, provided there is no evidence of pronounced neurologic, psychiatric or developmental factors that could account for such failure, such as Alzheimer-type Dementia (AD), schizophrenia, or Intellectual Deficit. It is the combined improbability of multiple PVT and SVT results, in the context of an external incentive, without any viable alternative explanation, that establishes the intent (to fail) of the examinee (1).
McWhirter et al. ignored the importance of using multiple PVTs to improve diagnostic accuracy. Data from Loring et al. (see McWhirter, reference 8) showed a dramatic reduction in Fp rates when PVTs are used in combination. For example, a Reliable Digit Span (RDS) score of ≤ 6 had an Fp rate of 13% in AD, and a Rey AVLT Recognition score of ≤ 9 had an Fp rate of 70% in AD; yet requiring both an RDS of ≤ 6 AND a Rey AVLT Recognition score of ≤ 9 lowered the Fp rate dramatically, to 5% for AD and 1% for amnestic MCI. Additionally, Loring et al. provided Fp rates (on RDS and AVLT Recognition) by level of performance on Rey AVLT delayed free recall and Trail Making B. These data showed levels of memory and processing speed sufficient to yield low Fp rates, consistent with preserved native ability to perform validly on RDS and AVLT Recognition. These results support the widely held practice of employing multiple PVTs in the individual case in order to control Fp error and enhance detection of invalid test performance (also see 4). As the number of PVTs failed increases, the likelihood of Fp identification decreases. Importantly, Bianchini et al. (5) showed that the rate of failure on 5 PVTs correlated with the degree of external incentive, demonstrating a dose effect corroborating a causal relationship between PVT failure and potential compensation.
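The combination logic can be sketched in a few lines. The per-test and observed joint Fp figures below are those quoted above from Loring et al.; the independence calculation is our own illustrative assumption, not a method used in that paper:

```python
# Back-of-envelope sketch of the multiple-PVT logic. Per-test and observed
# joint Fp rates are those quoted above from Loring et al. (AD sample);
# the independence calculation is an illustrative assumption only.

fp_rds = 0.13   # Fp rate, Reliable Digit Span <= 6, in AD
fp_avlt = 0.70  # Fp rate, Rey AVLT Recognition <= 9, in AD

# If the two tests erred independently in credible patients, requiring
# failure on BOTH would give a joint Fp rate equal to the product:
fp_joint_if_independent = fp_rds * fp_avlt
print(f"joint Fp under independence: {fp_joint_if_independent:.1%}")  # 9.1%

# The joint rate actually observed in AD was lower still (failures on the
# two measures are not independent in credible patients):
fp_joint_observed = 0.05
print(f"joint Fp observed (AD):      {fp_joint_observed:.0%}")        # 5%

# Either way, the joint rate falls far below the worst single-test rate
# (70%), which is the point about combining validity indicators.
```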
Curiously, McWhirter et al. contend there is little consensus amongst experts as to how PVTs are used in the very paragraph in which they reference the American Academy of Clinical Neuropsychology Consensus Conference Statement on the neuropsychological assessment of effort, response bias and malingering (see McWhirter reference 58). This document represents an evidence-based assessment of performance and symptom validity that is currently under revision for publication. The revision supports the conclusions of the original consensus conference statement, with further information regarding the use of multiple PVTs and SVTs. These papers show there is substantial support for validated PVTs and SVTs, with low per-test Fp errors of 10% or less, and enhanced diagnostic accuracy gained through use of multiple validity measures.
In closing, the Fp rates reported by McWhirter et al. are not representative of the research database that characterizes modern PVT and SVT investigations. We agree with similar observations by our United Kingdom colleagues (Kemp et al.).
Glenn J. Larrabee, Ph.D. (USA)
Kyle B. Boone, Ph.D. (USA)
Kevin J. Bianchini, Ph.D. (USA)
Martin L. Rohling, Ph.D. (USA)
Elisabeth M. S. Sherman, Ph.D. (Canada)
1. Larrabee GJ. Performance validity and symptom validity in neuropsychological assessment. J Int Neuropsychol Soc 2012; 18:625-30.
2. Sollman MJ, Berry DTR. Detection of inadequate effort on neuropsychological testing: a meta-analytic update and extension. Arch Clin Neuropsychol 2011; 26: 774-89.
3. Larrabee GJ. Detection of malingering using atypical performance patterns on standard neuropsychological tests. Clin Neuropsychol 2003; 17: 410-25.
4. Larrabee GJ. False-positive rates associated with the use of multiple performance and symptom validity tests. Arch Clin Neuropsychol 2014; 29: 364-73.
5. Bianchini KJ, Curtis KL, Greve KW. Compensation and malingering in traumatic brain injury: A dose response relationship? Clin Neuropsychol 2006; 20: 831-847.
Response to McWhirter et al. (2020):
In their article, 'Performance validity test failure in clinical populations - a systematic review', McWhirter and colleagues (2020) present the 'base rates' of performance validity test (PVT) failure (PVTs are commonly referred to as effort tests) and offer an analysis of PVT performance from their perspective as neurologists and neuropsychiatrists.
As a group of senior practicing clinical neuropsychologists, we are pleased that they have drawn attention to an important issue, but we have significant concerns about the methodology used and about several of the conclusions drawn within the review. We present this response from the perspective of U.K. neuropsychology practice, and as practitioners involved in research and in formulating clinical guidance on the use of PVTs. In preparing this response, we were aware of the parallel concerns of our U.S. counterparts (Larrabee et al.), but we have submitted separate responses due to the word limit.
The systematic review methodology used by McWhirter et al. has resulted in a limited number of papers being included, and there is no indication of the quality of the studies included. All of the literature search and analytic procedures appear to have been undertaken by one person alone, hence there was no apparent control for human error, bias, omission or inaccurate data extraction. Also, it is unclear to us to what extent McWhirter and colleagues had the knowledge to determine what data constituted PVT failure, since no neuropsychologists appear to have been involved in their paper.
Whilst we welcome their scrutiny of PVT performance across a range of clinical settings, and their drawing attention to the important matter of base rates of failure in the absence of any obvious incentive to underperform at neuropsychological examination, this point is well understood in the existing literature and is not in itself a novel finding. Most neuropsychologists will be familiar with such failures in their clinical practice, and these findings arise in a number of publications, including ones which McWhirter et al. omitted to review (1,2).
McWhirter et al.'s key conclusion is that, in the case of PVTs, 'failure rates are no higher in functional disorders than in other clinical conditions'. They then infer from this conclusion that it 'raises important questions about the degree of objectivity afforded to neuropsychological tests in clinical practice and research', but they do not expand on this generalisation. In reaching their key conclusion, McWhirter et al. fall into the trap of 'comparing apples with oranges' and of not making reliable and valid comparisons. If we take one of the best-documented functional conditions, Psychogenic Non-Epileptic Seizures (PNES), a proper comparison would be to take a group of well-documented PNES patients, who only had psychogenic seizures with no discernible lesion pathology, and compare them to a group of patients who had well-documented organic seizures, with lesion pathology clearly defined. As well as matching on the usual demographic variables such as age, sex and educational background, the two groups would be carefully matched for duration, frequency and severity of seizures, and for functional disability. It would then be meaningful to compare the performance of the two groups on PVTs and to draw conclusions as to whether rates are higher, lower or the same in the functional group compared with the organic group.
Whilst we have concerns about the lack of rigour in the search methodology, which resulted in an incomplete literature review pooling data from studies of uncertain quality that may not be comparable, of more concern is that McWhirter and colleagues may lack the knowledge and expertise to interpret these data in a clinically meaningful way. The authors are dismissive of what is still a developing PVT literature that has achieved a good deal in the last 15-20 years and has produced an excellent consensus on the requirement to validate neuropsychological test performance with objective tests and symptom-based questionnaires. McWhirter et al.'s interpretation of the findings does not provide adequate context on either the latest U.S. or the U.K. effort test / PVT interpretation guidelines, and does not reflect neuropsychological expertise or clinical neuropsychological practice. The authors do not cite the latest U.S. guidelines (Sherman et al., 2020)(3), and the U.K. guidelines (British Psychological Society: Professional Practice Board)(4) are not mentioned.
A further key difficulty with the paper is that the authors report the failure rate on individual effort tests of different sensitivities without consideration of the various methodological and statistical techniques that clinical neuropsychologists use to interpret such findings. In clinical practice, a single test score is of little significance, and the authors appear to misunderstand this fundamental point of clinical neuropsychology practice. An effort test profile is built from a combination of PVTs of different sensitivities, spanning different cognitive domains and administered throughout the examination; it is subjected to statistical discrepancy analysis, often binomial probability analysis, and is placed in the context of positive and negative predictive power and of the patient's wider clinical presentation, which may include pain, fatigue, depression and anxiety and their effects on concentration. The paper by McWhirter et al. shows no discernible understanding of this statistical and clinical context. Current guidelines clearly identify the need to interpret the results of a failure and to consider possible explanations; it has never been a simple case of regarding a pass/fail on a single effort test as diagnostic in its own right.
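To make the binomial point concrete, here is a minimal sketch of how chance-level and below-chance scores are evaluated on a two-alternative forced-choice test. The 50-item format and the example scores are illustrative assumptions, not taken from any particular test manual:

```python
# Illustrative sketch of binomial probability analysis for a hypothetical
# 50-item two-alternative forced-choice PVT. Under pure guessing, the number
# of correct answers follows Binomial(n=50, p=0.5), with chance level 25/50.

from scipy.stats import binom

N_ITEMS = 50   # hypothetical item count
P_GUESS = 0.5  # probability correct per item when guessing at random

def p_score_at_or_below(k):
    """One-tailed probability of scoring k or fewer correct by chance alone."""
    return binom.cdf(k, N_ITEMS, P_GUESS)

for score in (18, 25, 44):
    print(f"P(score <= {score} under guessing) = {p_score_at_or_below(score):.4f}")

# Approximate output:
#   P(score <= 18 under guessing) = 0.0325  -> significantly BELOW chance:
#       the examinee must know answers in order to avoid them this reliably.
#   P(score <= 25 under guessing) = 0.5561  -> indistinguishable from guessing.
#   P(score <= 44 under guessing) = 1.0000  -> an above-chance cutoff (e.g.,
#       fail if < 45 correct) flags scores well within the reach of guessing,
#       so failing it cannot, by itself, establish intent.
```

This is why below-chance performance is treated as qualitatively different evidence from failure at an above-chance cutoff, a distinction returned to below.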
The authors also seem to misunderstand key concepts, including 'profile analysis', a technique to prevent the misclassification of PVT failure as low effort in the presence of bona fide cognitive problems; this methodology is applicable to tests other than the Word Memory Test, including the TOMM. Failure to understand this, and to exclude below-cutoff performance on effort tests when a 'severe impairment profile' is obtained, will further distort the McWhirter et al. findings, which derive from a methodology that appears to fall short of the PRISMA standard: it produced a partial review of the literature, with no mention of quality criteria, no second rater, and no method to resolve inter-rater discrepancies, as would be expected of a well-conducted systematic review. In their Discussion section, the authors also appear to have confounded forced-choice testing, chance-level performance and intentionality.
In their review, McWhirter et al. unfortunately group PVTs together and do not appear to distinguish between embedded measures and PVTs specifically designed to detect poor cognitive effort. It is performance on the latter tests which needs to be accorded greater significance, as those are the tests that form the basis of conclusions reached by neuropsychologists in clinical practice when coming to a diagnosis of questionable effort.
In summary, we welcome the contribution of McWhirter et al. to an important debate. However, their depiction of neuropsychology as using PVTs alone, without clinical context and without methods of analysis to diagnose functional cognitive disorder or 'malingering', presents a 'straw man' argument, because this does not align with what clinical neuropsychologists think or do. A clearer and more extensive review of the literature would have identified the role of PVTs and the complexity of their interpretation, and would have allowed readers a more balanced understanding of their role in clinical practice.
Professor Steven Kemp (UK)
Professor Narinder Kapur (UK)
Professor Gus Baker (UK)
Professor Martin Bunnage (UK)
Professor Liam Dorris (UK)
Dr Perry Moore (UK)
Mr Daniel Friedland (UK)
1. Schroeder R, Peck C, Buddin W, et al. (2012). The Coin-in-the-Hand test and dementia: more evidence for a screening test for neurocognitive symptom exaggeration. Cogn Behav Neurol; 25: 139-143.
2. Sieck B, Smith M, Duff K, et al. (2013). Symptom validity test performance in the Huntington Disease clinic. Arch Clin Neuropsychol; 28: 135-143.
3. Sherman EMS, Slick DJ, Iverson GL (2020). Multidimensional Malingering Criteria for Neuropsychological Assessment: A 20-Year Update of the Malingered Neuropsychological Dysfunction Criteria. Arch Clin Neuropsychol; 00: 1-30.
4. British Psychological Society: Professional Practice Board (2009). Assessment of Effort in Clinical Testing of Cognitive Functioning for Adults.