Inter-rater agreement of observable and elicitable neurological signs
Abstract
This paper reports on a study that aimed to assess the inter-rater agreement of observable neurological signs in the upper and lower limbs (eg inspection, gait, cerebellar tests and coordination) and elicitable signs (eg tone, strength, reflexes and sensation). Thirty patients were examined by two neurology doctors, at least one of whom was a consultant. The doctors’ findings were recorded on a standardised pro forma. Inter-rater agreement was assessed using the kappa (κ) statistic, which is chance corrected. There was significantly better agreement between the two doctors for observable than for elicitable signs (mean ± standard deviation [SD] κ, 0.70 ± 0.17 vs 0.41 ± 0.22, p = 0.002). Almost perfect agreement was seen for cerebellar signs and inspection (a combination of speed of movement, muscle bulk, wasting and tremor); substantial agreement for strength, gait and coordination; moderate agreement for tone and reflexes; and only fair agreement for sensation. The inter-rater agreement is therefore better for observable neurological signs than for elicitable signs, which may be explained by the additional skill and cooperation required to elicit rather than just observe clinical signs. These findings have implications for clinical practice, particularly in telemedicine, and highlight the need for standardisation of the neurological examination.
Introduction
A neurological consultation comprises a verifiable history, a reliable examination, appropriate investigations and their subsequent interpretation. When a specialist opinion is sought using telemedicine, the remote clinician relies on another doctor's neurological examination. Some neurological signs have to be elicited by the examining physician, eg tone, strength and sensory deficits, but some valuable signs can be seen and heard by the remote and examining physician, eg walking, speed of finger movements and maintaining the outstretched arm in a particular posture.
The inter-rater reliability of the National Institutes of Health Stroke Scale (NIHSS, Table 1), which grades motor function in five categories – no drift (0), drift before 10 seconds (1), falls before 10 seconds (2), no effort against gravity (3) and no movement (4) – and of the traditional neurological examination has been investigated before (Table 2). However, none of these studies analysed their data according to whether the clinical signs could be observed from the end of the bed or had to be elicited.
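The NIHSS motor grading described above can be sketched as a simple lookup. This is purely an illustrative helper, not part of the scale's official materials; the function name and observation labels are assumptions for this example.

```python
# Illustrative sketch of the NIHSS motor-item grading described above.
# The observation labels are assumptions for this example, not official
# NIHSS wording.
NIHSS_MOTOR_SCORES = {
    "no drift": 0,
    "drift before 10 seconds": 1,
    "falls before 10 seconds": 2,
    "no effort against gravity": 3,
    "no movement": 4,
}

def nihss_motor_score(observation: str) -> int:
    """Return the NIHSS motor score for an observed limb response."""
    return NIHSS_MOTOR_SCORES[observation]
```

Each step of the scale corresponds to a worsening observable response, which is one reason such items lend themselves to remote assessment.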
Telemedicine has been used to provide an out-of-hours stroke thrombolysis service to hospitals in south-east Wales since April 2012. We therefore investigated the inter-rater agreement of some elicitable and observable neurological signs in the upper and lower limbs to inform an assessment of their utility in the clinical examination performed using telemedicine.
Methods
Thirty patients (mean ± standard deviation [SD] age 55 ± 15 years) recruited over a 4-week period in a routine neurology outpatient clinic gave written consent to be examined by a consultant and, in the same clinic session, one other neurology doctor (foundation year 2, core medical trainee, specialty registrar or consultant). The second examiner, blinded to the findings of the first, repeated the examination of the upper and lower limbs. Examiners were asked to record their findings immediately on a standardised pro forma (Table 3) by selecting from binary options (eg present/absent for clonus) and categorical options (eg absent, depressed, normal or brisk for reflexes and Medical Research Council grades 0–5 for strength). Clinicians did not undertake any special training or instruction in clinical examination as part of this study and were asked to examine patients in accordance with their usual clinical practice, with appropriate equipment provided.
Inter-rater agreement was assessed using the κ statistic, which makes no assumptions about which doctor is correct – only whether they agree. The κ benchmarks used in this paper were those of Landis and Koch: <0 represents poor agreement, 0–0.20 slight agreement, 0.21–0.40 fair agreement, 0.41–0.60 moderate agreement, 0.61–0.80 substantial agreement and 0.81–1.00 almost perfect agreement.15 A significant difference in agreement was taken to be present if there was no overlap between the 95% confidence intervals for the κ values. Mean κ values for grouped data were compared using the t-test. The analysis was performed using Microsoft Excel 2007 spreadsheet software.
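The chance-corrected agreement statistic and the Landis and Koch benchmarks can be sketched as follows. This is a minimal illustration of Cohen's κ for two raters, not the authors' Excel analysis; the function names are assumptions.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    # Observed proportion of agreement.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if expected == 1:  # both raters used a single identical category
        return 1.0
    return (observed - expected) / (1 - expected)

def landis_koch(kappa):
    """Landis and Koch benchmark label for a kappa value."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"
```

Note that κ penalises agreement that would be expected by chance alone: two raters who agree 75% of the time on a common finding may still have only moderate κ.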
The study was part of a medical student placement and was approved by the North Wales Research Ethics (Central & East) Proportionate Review Sub-Committee (11-WA-0311) and the Cardiff and Vale Research and Development Department (11/CMC/5212).
Results
The results are summarised in Fig 1 and Table 4. The inter-rater reliability for observable signs was better than for elicitable signs (mean ± SD κ value 0.70 ± 0.17 vs 0.41 ± 0.22, p = 0.002). We considered whether the difference between observable and elicitable signs was a consequence of the variable number of available options – for example, reflexes could be normal, brisk, reduced or absent but speed of movement could only be normal or slow. We therefore recalculated the inter-rater agreement for all data using a binary grouping – for example, reflexes could be abnormal (brisk, reduced or absent) or normal, and strength could be abnormal (any grade ≤4) or normal (grade 5). The difference in the inter-rater agreement between observable and elicitable signs was still significant (mean ± SD κ value 0.76 ± 0.09 vs 0.46 ± 0.21, p = 0.014).
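The binary regrouping described above can be sketched as a simple collapsing of categories before κ is recomputed. The function names and category labels here are assumptions for illustration; they mirror the groupings stated in the text.

```python
# Illustrative sketch of the binary regrouping described above; the
# function names and category labels are assumptions for this example.

def regroup_reflexes(finding: str) -> str:
    """Collapse reflex findings to normal/abnormal."""
    # "absent", "depressed" and "brisk" all count as abnormal.
    return "normal" if finding == "normal" else "abnormal"

def regroup_strength(mrc_grade: int) -> str:
    """Collapse MRC strength grades 0-5 to normal/abnormal."""
    # Only full strength (grade 5) is normal; any grade <= 4 is abnormal.
    return "normal" if mrc_grade == 5 else "abnormal"

# Applying the regrouping to one rater's findings before recomputing kappa:
reflex_findings = ["normal", "brisk", "depressed", "normal", "absent"]
binary_findings = [regroup_reflexes(f) for f in reflex_findings]
```

Collapsing to two categories removes any advantage the observable signs might have gained simply from offering raters fewer options to disagree over.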
Discussion
Signs that have to be elicited involve skill on the part of the examiner, the cooperation of the patient and then interpretation – for example, to test tone, the patient must be relaxed and comfortable and the examining doctor must have an understanding of the actions required to elicit the clinical features of spasticity and rigidity. Informal observation of the techniques used by different doctors in this study suggested marked variations in technique and interpretation, which may explain the poor inter-rater agreement. By comparison, it is more straightforward to observe patients at rest or when performing actions such as tapping movements of the finger and thumb to assess speed of movement or walking, which may explain the better agreement seen for these observable signs. Miller and Johnston16 found foot tapping to be more reliable for detecting upper motor neurone weakness (κ = 0.73; sensitivity 86%, specificity 84%) than the Babinski (plantar reflex) test (κ = 0.30; sensitivity 35%, specificity 77%).
The previous literature (see Tables 1 and 2) shows wide variation in agreement for elicitable signs, with κ values ranging from 0.29 to 1.00 (mean 0.65) for strength and from 0.15 to 1.00 (mean 0.46) for sensation. This variation in the reliability of the peripheral neurological examination, together with the results of this study, highlights that relying on another doctor's assessment may affect diagnosis and management.
One of the concerns of clinicians providing opinions about patients they are not able to examine in person is that their clinical examination is impoverished by the lack of direct patient contact. However, this study suggests that those signs that require elicitation have poorer inter-rater reliability than ‘end-of-the-bed’ signs, which can be observed by both the attending physician and the remote physician using telemedicine equipment. The importance of being a good noticer17 is as relevant today as it ever was, and rather than compromising clinical skills, the technology of telemedicine may demand of clinicians a review of the parts of the clinical examination that are most reliable.
Conclusion
Observable neurological signs have significantly better inter-rater agreement than elicitable signs. These findings have implications for clinical practice, including telemedicine.
Acknowledgements
We would like to thank clinical colleagues in the department of neurology for their help and support. This work was first presented at the All Wales Stroke Meeting (AWSM) video conference.
© 2014 Royal College of Physicians
References
- Demaerschalk BM, Vegunta S, Vargas BB, et al.
- Gonzalez MA, Hanna N, Rodrigo ME, et al.
- Meyer BC, Lyden PD, Al-Khoury L, et al.
- Handschu R, Littmann R, Reulbach U, et al.
- Meyer BC, Hemmen TM, Jackson CM, Lyden PD.
- Shafqat S, Kvedar JC, Guanci MM, et al.
- Brott T, Adams HP Jr, Olinger CP, et al.
- Hand P, Haisma JA, Kwan J, et al.
- Lindley RI, Warlow CP, Wardlaw JM, et al.
- Miller T, Johnston SC.
- Asher R.