Inter-rater agreement of observable and elicitable neurological signs
Abstract
This paper reports on a study that aimed to assess the inter-rater agreement of observable neurological signs in the upper and lower limbs (eg inspection, gait, cerebellar tests and coordination) and elicitable signs (eg tone, strength, reflexes and sensation). Thirty patients were examined by two neurology doctors, at least one of whom was a consultant. The doctors’ findings were recorded on a standardised pro forma. Inter-rater agreement was assessed using the kappa (κ) statistic, which is chance corrected. There was significantly better agreement between the two doctors for observable than for elicitable signs (mean ± standard deviation [SD] κ, 0.70 ± 0.17 vs 0.41 ± 0.22, p = 0.002). Almost perfect agreement was seen for cerebellar signs and inspection (a combination of speed of movement, muscle bulk, wasting and tremor); substantial agreement for strength, gait and coordination; moderate agreement for tone and reflexes; and only fair agreement for sensation. The inter-rater agreement is therefore better for observable neurological signs than for elicitable signs, which may be explained by the additional skill and cooperation required to elicit rather than just observe clinical signs. These findings have implications for clinical practice, particularly in telemedicine, and highlight the need for standardisation of the neurological examination.
Introduction
A neurological consultation comprises a verifiable history, a reliable examination, appropriate investigations and their subsequent interpretation. When a specialist opinion is sought using telemedicine, the remote clinician relies on another doctor's neurological examination. Some neurological signs have to be elicited by the examining physician, eg tone, strength and sensory deficits, but some valuable signs can be seen and heard by the remote and examining physician, eg walking, speed of finger movements and maintaining the outstretched arm in a particular posture.
The inter-rater reliability of the National Institutes of Health Stroke Scale (NIHSS, Table 1), which grades motor function in five categories – no drift (0), drift before 10 seconds (1), falls before 10 seconds (2), no effort against gravity (3) and no movement (4) – and of the traditional neurological examination has been investigated before (Table 2). However, none of these studies analysed their data according to whether the clinical signs could be observed from the end of the bed or had to be elicited.
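The NIHSS motor grading described above can be sketched as a simple lookup. This is purely an illustrative helper, not part of the scale's official materials; the function name and observation labels are assumptions for this example.

```python
# Illustrative sketch of the NIHSS motor-item grading described above.
# The observation labels are assumptions for this example, not official
# NIHSS wording.
NIHSS_MOTOR_SCORES = {
    "no drift": 0,
    "drift before 10 seconds": 1,
    "falls before 10 seconds": 2,
    "no effort against gravity": 3,
    "no movement": 4,
}

def nihss_motor_score(observation: str) -> int:
    """Return the NIHSS motor score for an observed limb response."""
    return NIHSS_MOTOR_SCORES[observation]
```

Each step of the scale corresponds to a worsening observable response, which is one reason such items lend themselves to remote assessment.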
Telemedicine has been used to provide an out-of-hours stroke thrombolysis service to hospitals in south-east Wales since April 2012. We therefore investigated the inter-rater agreement of some elicitable and observable neurological signs in the upper and lower limbs to inform an assessment of their utility in the clinical examination performed using telemedicine.
Methods
Thirty patients (mean ± standard deviation [SD] age 55 ± 15 years) recruited over a 4-week period in a routine neurology outpatient clinic gave written consent to be examined by a consultant and, in the same clinic session, one other neurology doctor (foundation year 2, core medical trainee, specialty registrar or consultant). The second examiner, blinded to the findings of the first, repeated the examination of the upper and lower limbs. Examiners were asked to record their findings immediately on a standardised pro forma (Table 3) by selecting from binary options (eg present/absent for clonus) and categorical options (eg absent, depressed, normal or brisk for reflexes and Medical Research Council grades 0–5 for strength). Clinicians did not undertake any special training or instruction in clinical examination as part of this study and were asked to examine patients in accordance with their usual clinical practice, with appropriate equipment provided.
Inter-rater agreement was assessed using the κ statistic, which makes no assumptions about which doctor is correct – only whether they agree. The κ benchmarks used in this paper were those of Landis and Koch: <0 represents poor agreement, 0–0.20 slight agreement, 0.21–0.40 fair agreement, 0.41–0.60 moderate agreement, 0.61–0.80 substantial agreement and 0.81–1.00 almost perfect agreement.15 A significant difference in agreement was taken to be present if there was no overlap between the 95% confidence intervals for the κ values. Mean κ values for grouped data were compared using the t-test. The analysis was performed using Microsoft Excel 2007 spreadsheet software.
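The chance-corrected agreement statistic and the Landis and Koch benchmarks can be sketched as follows. This is a minimal illustration of Cohen's κ for two raters, not the authors' Excel analysis; the function names are assumptions.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    # Observed proportion of agreement.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if expected == 1:  # both raters used a single identical category
        return 1.0
    return (observed - expected) / (1 - expected)

def landis_koch(kappa):
    """Landis and Koch benchmark label for a kappa value."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"
```

Note that κ penalises agreement that would be expected by chance alone: two raters who agree 75% of the time on a common finding may still have only moderate κ.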
The study was part of a medical student placement and was approved by the North Wales Research Ethics (Central & East) Proportionate Review Sub-Committee (11-WA-0311) and the Cardiff and Vale Research and Development Department (11/CMC/5212).
Results
The results are summarised in Fig 1 and Table 4. The inter-rater reliability for observable signs was better than for elicitable signs (mean ± SD κ value 0.70 ± 0.17 vs 0.41 ± 0.22, p = 0.002). We considered whether the difference between observable and elicitable signs was a consequence of the variable number of available options – for example, reflexes could be normal, brisk, reduced or absent but speed of movement could only be normal or slow. We therefore recalculated the inter-rater agreement for all data using a binary grouping – for example, reflexes could be abnormal (brisk, reduced or absent) or normal, and strength could be abnormal (any grade ≤4) or normal (grade 5). The difference in the inter-rater agreement between observable and elicitable signs was still significant (mean ± SD κ value 0.76 ± 0.09 vs 0.46 ± 0.21, p = 0.014).
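The binary regrouping described above can be sketched as a simple collapsing of categories before κ is recomputed. The function names and category labels here are assumptions for illustration; they mirror the groupings stated in the text.

```python
# Illustrative sketch of the binary regrouping described above; the
# function names and category labels are assumptions for this example.

def regroup_reflexes(finding: str) -> str:
    """Collapse reflex findings to normal/abnormal."""
    # "absent", "depressed" and "brisk" all count as abnormal.
    return "normal" if finding == "normal" else "abnormal"

def regroup_strength(mrc_grade: int) -> str:
    """Collapse MRC strength grades 0-5 to normal/abnormal."""
    # Only full strength (grade 5) is normal; any grade <= 4 is abnormal.
    return "normal" if mrc_grade == 5 else "abnormal"

# Applying the regrouping to one rater's findings before recomputing kappa:
reflex_findings = ["normal", "brisk", "depressed", "normal", "absent"]
binary_findings = [regroup_reflexes(f) for f in reflex_findings]
```

Collapsing to two categories removes any advantage the observable signs might have gained simply from offering raters fewer options to disagree over.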
Discussion
Signs that have to be elicited involve skill on the part of the examiner, the cooperation of the patient and then interpretation – for example, to test tone, the patient must be relaxed and comfortable and the examining doctor must have an understanding of the actions required to elicit the clinical features of spasticity and rigidity. Informal observation of the techniques used by different doctors in this study suggested marked variations in technique and interpretation, which may explain the poor inter-rater agreement. By comparison, it is more straightforward to observe patients at rest or when performing actions such as tapping movements of the finger and thumb to assess speed of movement or walking, which may explain the better agreement seen for these observable signs. Miller and Johnston16 found foot tapping to be more reliable for detecting upper motor neurone weakness (κ = 0.73; sensitivity 86%, specificity 84%) than the Babinski (plantar reflex) test (κ = 0.30; sensitivity 35%, specificity 77%).
The previous literature (see Tables 1 and 2) shows wide variation in agreement for elicitable signs, with κ values ranging from 0.29 to 1.00 (mean 0.65) for strength and from 0.15 to 1.00 (mean 0.46) for sensation. This variation in the reliability of the peripheral neurological examination, together with the results of this study, highlights that relying on another doctor's assessment may affect diagnosis and management.
One of the concerns of clinicians providing opinions about patients they are not able to examine in person is that their clinical examination is impoverished by the lack of direct patient contact. However, this study suggests that those signs that require elicitation have poorer inter-rater reliability than ‘end-of-the-bed’ signs, which can be observed by both the attending physician and the remote physician using telemedicine equipment. The importance of being a good noticer17 is as relevant today as it ever was, and rather than compromising clinical skills, the technology of telemedicine may demand of clinicians a review of the parts of the clinical examination that are most reliable.
Conclusion
Observable neurological signs have significantly better inter-rater agreement than elicitable signs. These findings have implications for clinical practice, including telemedicine.
Acknowledgements
We would like to thank clinical colleagues in the department of neurology for their help and support. This work was first presented at the All Wales Stroke Meeting (AWSM) video conference.
© 2014 Royal College of Physicians
References
- Demaerschalk BM, Vegunta S, Vargas BB, et al.
- Gonzalez MA, Hanna N, Rodrigo ME, et al.
- Meyer BC, Lyden PD, Al-Khoury L, et al.
- Handschu R, Littmann R, Reulbach U, et al.
- Meyer BC, Hemmen TM, Jackson CM, Lyden PD.
- Shafqat S, Kvedar JC, Guanci MM, et al.
- Brott T, Adams HP Jr, Olinger CP, et al.
- Hand P, Haisma JA, Kwan J, et al.
- Lindley RI, Warlow CP, Wardlaw JM, et al.
- Miller T, Johnston SC.
- Asher R.