
How reliable is repeated testing for hemispatial neglect? Implications for clinical follow-up and treatment trials
  1. Björn Machner (1, 2)
  2. Yee-Haur Mah (1)
  3. Nikos Gorgoraptis (1)
  4. Masud Husain (1)
  1. National Hospital for Neurology and Neurosurgery, UCL Institute of Cognitive Neuroscience and Institute of Neurology, Queen Square, London, UK
  2. Department of Neurology, University of Lübeck, Lübeck, Germany
  Correspondence to Dr Björn Machner, Department of Neurology, University of Lübeck, Ratzeburger Allee 160, D-23538 Lübeck, Germany; bjoern.machner{at}



Patients with hemispatial neglect following right hemisphere brain damage fail to spontaneously orient towards or respond to contralesional stimuli.1 Diagnosing the syndrome and assessing it longitudinally is not always straightforward, mainly for two reasons: the heterogeneity of the syndrome and inter-individual differences in the time course of recovery.

The neglect syndrome affects different cognitive components in different patients, and one patient may show neglect on certain tasks but not on others.1,2 Because no single test detects neglect in all patients, a battery of several paper-and-pencil tests is usually required.1,3 However, little is known about how well these tests perform in longitudinal assessment. This is of high clinical importance, as repeat assessments are needed to monitor changes in neglect severity due to spontaneous remission or a specific treatment. If a test per se is not 'stable', variation in its results over repeated sessions may simply reflect low test–retest reliability rather than a real change in the underlying disorder.

We therefore investigated the test–retest reliability of three of the most commonly used paper-and-pencil tests for hemispatial neglect over several daily test repetitions in chronic neglect patients.



Fifteen patients (mean age 57.4±14.2 (SD) years) with left hemispatial neglect following right hemisphere stroke (median time since stroke 7.6 months) participated in this study (see online supplementary table 1 and online supplementary figure 1 for details).

Hemispatial neglect was diagnosed if patients showed at least two more omissions on the left than on the right in standard cancellation tasks (Bells and Mesulam).3
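The inclusion rule is a simple per-sheet comparison of omission counts. The following is a minimal Python sketch, assuming omissions have already been counted separately for the left and right halves of each cancellation sheet; the function name and the example counts are hypothetical. Whether the criterion had to be met on one or on both tasks is not specified here, so the sketch checks each sheet on its own.

```python
def meets_neglect_criterion(left_omissions: int, right_omissions: int, margin: int = 2) -> bool:
    """Hypothetical helper: True if a single cancellation sheet shows at least
    `margin` more omissions on the left than on the right (margin=2 per the text)."""
    if left_omissions < 0 or right_omissions < 0:
        raise ValueError("omission counts must be non-negative")
    return left_omissions - right_omissions >= margin


# Hypothetical omission counts (left, right) for one patient on each sheet.
bells = (9, 2)
mesulam = (14, 3)
print(meets_neglect_criterion(*bells), meets_neglect_criterion(*mesulam))  # → True True
```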

Patients gave written informed consent before participating in this study, which was approved by the National Research Ethics Service.

Testing procedure

Each patient was tested daily on five consecutive days, always at the same time of the day, on the following three paper-and-pencil tests.

Line bisection

Patients were asked to bisect three horizontal black lines, each 17 cm long and presented on a separate white sheet of A4 paper. The deviation of each mark from the true centre of the line (in mm) was measured, and a mean was calculated for each session.
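The per-session line bisection score is the mean signed deviation over the three lines. A minimal Python sketch, assuming each mark is recorded as its distance in mm from the left end of the line, so that positive values mean a rightward deviation; the function name and example marks are hypothetical.

```python
from statistics import mean

LINE_LENGTH_MM = 170.0  # 17 cm lines, as in the task description


def session_bisection_deviation(marks_mm: list[float], line_length_mm: float = LINE_LENGTH_MM) -> float:
    """Mean deviation (mm) of the patient's marks from the true centre.

    Assumption: each mark is measured in mm from the left end of its line,
    so positive values mean a rightward deviation.
    """
    if not marks_mm:
        raise ValueError("need at least one bisection mark")
    centre = line_length_mm / 2.0
    return mean(mark - centre for mark in marks_mm)


# Hypothetical session: marks at 100, 95 and 105 mm from the left end.
print(session_bisection_deviation([100.0, 95.0, 105.0]))  # → 15.0
```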

Bells test

Patients were asked to circle bells (n=35) among 280 distractors on an A3 sheet of paper. We measured the total number of cancellations and calculated a spatial lateralisation index: the number of left-sided cancellations subtracted from the number of right-sided cancellations, divided by the total number of cancellations.
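The lateralisation index described above can be written as a one-line function. This Python sketch assumes every cancelled target is classed as either left or right, so the total is simply left + right; the function name and counts are hypothetical. The index runs from −1 to +1, with 0 meaning symmetric performance and positive values meaning fewer cancellations on the left.

```python
def lateralisation_index(left_cancelled: int, right_cancelled: int) -> float:
    """(right - left) / total cancellations.

    Assumption: every cancelled target is classed as left or right, so the
    total is left + right. 0 = symmetric; positive values = a rightward bias,
    i.e. fewer cancellations on the left.
    """
    total = left_cancelled + right_cancelled
    if total == 0:
        raise ValueError("index is undefined when nothing was cancelled")
    return (right_cancelled - left_cancelled) / total


# Hypothetical Bells sheet: 5 bells circled on the left, 15 on the right.
print(lateralisation_index(5, 15))  # → 0.5
```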

Mesulam and Weintraub's symbol cancellation test

Patients were asked to circle targets (n=60, >300 distractors) on an A3 sized sheet of paper. Again, the total number of cancellations and the lateralisation index were analysed.


To assess test–retest reliability, traditional correlation coefficients such as Pearson's r are considered inappropriate because they ignore the degree of absolute agreement between repeated tests.4 Using SPSS V.15.0, we therefore calculated the intraclass correlation coefficient (ICC), a measure that accounts for both relative and absolute reliability.5 The ICC yields a reliability index between 0 and 1, with values <0.7 indicating poor and >0.7 good reliability.4
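The difference between Pearson's r and an absolute-agreement ICC is easiest to see on data with a systematic retest shift. The text does not state which ICC form was selected in SPSS; given the table's reference to absolute agreement, this Python sketch assumes the two-way random-effects, absolute-agreement, single-measure form (Shrout and Fleiss's ICC(2,1)), computed from the usual ANOVA mean squares. Function names and data are hypothetical.

```python
from statistics import mean


def icc_2_1(scores: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores[i][j] is patient i's score in session j (n patients x k sessions).
    """
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete n x k table with n, k >= 2")
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between sessions
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols                 # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def pearson_r(x: list[float], y: list[float]) -> float:
    """Plain Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical two-session data: every patient scores exactly 2 points higher
# on retest. The ranking is preserved perfectly; absolute agreement is not.
test = [1.0, 2.0, 3.0, 4.0]
retest = [3.0, 4.0, 5.0, 6.0]
print(pearson_r(test, retest))                                  # → 1.0
print(round(icc_2_1([list(p) for p in zip(test, retest)]), 3))  # → 0.455
```

Pearson's r rewards the preserved ranking, whereas the ICC penalises the systematic 2-point shift between sessions, which is the property the text relies on.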

The ICC was high for ‘total number of cancellations’ (0.84 and 0.83 for Bells and Mesulam, respectively) and ‘lateralisation index’ (0.83 and 0.82), indicating high test–retest reliability for the cancellation tasks (table 1, online supplementary figure 2).

Table 1

Mean scores, test–retest reliability and absolute agreement for the different tests in 15 neglect patients over five sessions

In contrast, the ICC was far lower for the line bisection task (0.47). Nor did it increase when the analysis was based on dichotomous outcomes (normal vs pathological 'neglect-like' deviation, with a cut-off point of +5.5 mm)3: ICC=0.41 (95% CI 0.19 to 0.69).

As shown in table 1, we also calculated the standard error of measurement (SEM) and the 95% CI for test–retest agreement (calculated as twice the SEM). These 'score bands' give an absolute range of test scores within which a subject's true score on retesting will most likely fall.

SEM was calculated as $\mathrm{SEM} = s\sqrt{1 - \mathrm{ICC}}$, where s is the SD for the test, derived from patients' performances on all sessions.
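The SEM formula and the score band can be computed directly from a patients × sessions score table and an ICC. A minimal Python sketch, assuming s is the sample SD of all scores pooled over patients and sessions (one reading of "derived from patients' performances on all sessions"), and using the band convention of twice the SEM; the function name and scores are hypothetical, and the ICC of 0.84 is used only as an example input.

```python
from math import sqrt
from statistics import stdev


def sem_and_band(scores: list[list[float]], icc: float) -> tuple[float, float]:
    """Return (SEM, half-width of the ~95% score band = 2 * SEM).

    Assumption: s is the sample SD of all scores pooled over patients and
    sessions.
    """
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie in [0, 1] for this formula")
    s = stdev(x for row in scores for x in row)
    sem = s * sqrt(1.0 - icc)
    return sem, 2.0 * sem


# Hypothetical totals: 3 patients x 2 sessions, with an example ICC of 0.84.
sem, band = sem_and_band([[20.0, 22.0], [30.0, 28.0], [25.0, 25.0]], icc=0.84)
print(f"SEM = {sem:.2f}, band = ±{band:.2f}")  # → SEM = 1.48, band = ±2.95
```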


In this study, we investigated the test–retest reliability of three major neglect tests. Unlike previous studies, which used only one test repetition after a brief interval,4 we chose a more clinically realistic scenario with repeated assessments on consecutive days. We examined only chronic neglect patients, so variability in test performance is more likely to reflect properties of the measurement than fluctuations of the underlying disorder.

We found that both cancellation tasks (Bells and Mesulam) were very stable on retesting. In contrast, neglect patients' performance on line bisection fluctuated considerably over sessions, demonstrating low test–retest reliability.

Our study further provides 'score bands' in absolute test units, which are clinically more useful than reliability coefficients. In the Bells test, for instance, a difference of more than five cancellations between test and retest is most likely due to a change in neglect severity rather than test variability. In contrast, a reduction of about 15 mm in line bisection deviation does not necessarily indicate an improvement of neglect; it may simply reflect test variability.
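The interpretation rule above amounts to comparing an observed test–retest difference with the score band. A minimal Python sketch, following the convention used in this letter (band = twice the SEM) and treating a difference strictly larger than the band as a likely real change; the function name and scores are hypothetical.

```python
def change_exceeds_band(test_score: float, retest_score: float, sem: float) -> bool:
    """True if |retest - test| is larger than the 2 * SEM score band, i.e. the
    change is more likely to reflect a real change in neglect severity than
    test variability (using the CI = 2 * SEM convention)."""
    if sem < 0:
        raise ValueError("SEM cannot be negative")
    return abs(retest_score - test_score) > 2.0 * sem


# Hypothetical Bells totals with a band of five cancellations (SEM = 2.5).
print(change_exceeds_band(24, 31, sem=2.5))  # → True  (7 > 5)
print(change_exceeds_band(24, 27, sem=2.5))  # → False (3 <= 5)
```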

Notably, test–retest reliability of the line bisection task did not increase when the analysis was based on a cruder dichotomous score instead of absolute millimetres, which rules out differences in measurement granularity as an explanation. Instead, besides learning and fatigue effects,4 one reason for its low test–retest reliability may be its lower sensitivity and specificity for hemispatial neglect compared with cancellation tasks.2,3 Line bisection can also be influenced by other conditions sometimes associated with spatial neglect (eg, visual field defects). This may also explain the double dissociation seen in individual patients who perform normally on one test and pathologically on the other.2

Regardless of the cause, a neglect test like the line bisection task with low test–retest reliability is certainly not a good tool for the longitudinal assessment of a patient's neglect severity in treatment trials or during rehabilitation. Established cancellation tasks, in contrast, can be recommended as very reliable measures for the longitudinal assessment of neglect.


Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.



  • Funding This work was supported by The Wellcome Trust. BM was supported by a fellowship of the European Neurological Society (ENS).

  • Competing interests None.

  • Patient consent Obtained.

  • Ethics approval Ethics approval was provided by the National Research Ethics Service (NRES).

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement The authors are open to discussion regarding data sharing.