Reproducibility and diagnostic accuracy of substantia nigra sonography for the diagnosis of Parkinson's disease
  1. Simone van de Loo1,
  2. Uwe Walter2,
  3. Stefanie Behnke3,
  4. Johann Hagenah4,
  5. Matthias Lorenz1,
  6. Matthias Sitzer1,
  7. Rüdiger Hilker1,
  8. Daniela Berg5
  1. 1Department of Neurology, University Hospital, Goethe-University, Frankfurt am Main, Germany
  2. 2Department of Neurology, University of Rostock, Rostock, Germany
  3. 3Department of Neurology, University of the Saarland, Homburg/Saar, Germany
  4. 4Department of Neurology, University of Lübeck, Lübeck, Germany
  5. 5Hertie Institute for Clinical Brain Research and Institute for Medical Genetics, University of Tübingen, Tübingen, Germany
  1. Correspondence to Dr Ruediger Hilker, Department of Neurology, Goethe University, Schleusenweg 2-16 60528, Frankfurt am Main, Germany; hilker{at}


Objective Transcranial sonography (TCS) shows characteristic hyperechogenicity of the substantia nigra (SN) in patients with Parkinson's disease (PD). Although this feature is well established, sufficient observer reliability and diagnostic accuracy are prerequisites for the further advancement of this method.

Methods The authors investigated both aspects in a cross-sectional study with four blinded TCS raters in 22 PD patients and 10 healthy controls.

Results As expected, the authors found significant bilateral SN hyperechogenicity in PD patients. Quantitative computerised SN planimetry had a substantial intra- (intraclass correlation coefficient (ICC) 0.97 and 0.93 respectively for both hemispheres) and inter-rater reliability (ICC 0.84 and 0.89), while visual semiquantitative echogenicity grading of the SN revealed a moderate intrarater (weighted kappa 0.80 ipsilateral and 0.74 contralateral) and slight (0.33) to fair (0.51) inter-rater reliability only. Diagnostic accuracy measured as the area under the curve of receiver-operating characteristics plots was highest in TCS of the SN opposite the clinically most affected body side (planimetry 0.821, echogenicity grading 0.792) with a hyperechogenic area of 0.24 cm2 as the optimum cut-off value for the differentiation between PD and controls (sensitivity 79%, specificity 81%).

Conclusions The data demonstrate that the observer variability of SN planimetry is low in the hands of experienced investigators. This approach also offers adequate diagnostic accuracy. The authors conclude that reliable SN TCS data on PD can be achieved in clinical routine and multicentre trials when standardised analysis protocols and certain quality criteria of brain parenchyma sonography are met.

  • Parkinson's disease
  • sonography
  • substantia nigra
  • image analysis
  • ultrasound

Transcranial B-mode sonography (TCS) of the substantia nigra (SN) is quick, safe, non-invasive and easily applicable in patients with Parkinson's disease (PD).1 2 TCS detects the reflection of ultrasound waves, which depends among other factors on their propagation velocity in brain tissue and on tissue impedance. The characteristic finding in PD patients is a hyperechogenic SN with an enlarged area of bright ultrasound reflections in the midbrain,2 which presumably results from an increased tissue iron content.3–6 The quantification of the SN signal is usually carried out using planimetric measurements of the echogenic area or semiquantitative visual grading.1 7

Although a large body of evidence for the characteristic SN hyperechogenicity in PD is available, the widespread use of brain parenchyma sonography in routine diagnostics is still limited by the frequent belief that ultrasound findings are highly subjective, examiner-dependent and therefore of uncertain diagnostic value. In fact, several factors can impair the inter- and intraobserver reliability of TCS, the most important being differing experience in the evaluation of SN ultrasound images, technical factors such as image optimisation by means of the ultrasonic device settings, and biological conditions such as the permeability of the preauricular acoustic bone window.1 8 Aware of these debated factors, we performed this examiner-blinded cross-sectional study on the reproducibility and diagnostic accuracy of SN TCS with the aim of providing supportive arguments for the routine application of this method in the diagnostic work-up of PD.

Materials and methods

Study subjects and design

After obtaining written informed consent and permission of the local ethical committee, a total of 32 individuals were included in the study: 22 patients with the diagnosis of PD according to the UK Brain Bank criteria (14 males, eight females, mean age 65.4±10.5 years, disease duration 8.0±5.5 years, UPDRS motor score 18.4±8.0; 12 akinetic-rigid, nine equivalence, one tremor-dominant PD type, distribution of HY stages: I: 5, II: 10; III: 7; levodopa equivalence dose: 454±168 mg) and 10 healthy age-matched controls (three males, seven females, mean age 63.5±8.2 years). PD patients were consecutively recruited from the Movement Disorder outpatient clinic of the Department of Neurology of Goethe University Hospital Frankfurt/Main (Germany). Healthy controls were patient spouses or suitable employees of the clinic. All controls had a normal neurological status and were screened for the absence of parkinsonian motor signs by an experienced movement disorders specialist (RH). All individuals were screened for a sufficient preauricular bone window before study inclusion.

Four neurologists with longstanding experience in the application of SN TCS in the university hospital setting served as study raters (Rostock: UW, Homburg/Saar: SB, Lübeck: JH, Tübingen: DB). The study took place on two consecutive days in the Department of Neurology of Goethe University Frankfurt/Main (Germany). Each day, 16 participants were investigated twice by each rater, resulting in a total of 128 scans per day and 256 scans over the entire study period. The study participants and raters were consecutively assigned to four separate shaded examination rooms in random order. Periodic changes of subjects and raters between the examination rooms were controlled and synchronised. For the best possible rater blinding, the study subjects always entered the examination room before the investigators and awaited the TCS scan lying in the supine position on the examination couch, completely hidden from the later-arriving sonographer by a large, non-transparent drapery. The investigators could not see any part of the study participants' bodies which might have shown parkinsonian signs (eg, face, trunk, limbs). Examiners and study participants were not allowed to speak with each other. During TCS, the examiners were assisted by one support person who documented the data and ensured that the appointed examination settings were maintained. Each scan had to be completed within a maximum of 10 min. The raters did not have access to the results from any other TCS session. The minimum time span between the first and the second investigation of one study participant by the same rater was about 1 h.

Transcranial brain parenchyma sonography (TCS)

SN TCS was performed using four machines from the Sonoline Antares system (Siemens AG, Munich, Germany) identical in construction and preset with the same standard parameters (penetration depth 16 cm, dynamic range 55 dB, medium contour amplification). Image brightness and time-gain compensation were individually adapted as needed. In each study participant, TCS was performed through the preauricular acoustic bone window according to a standard approach1 using a 2.5 MHz transducer to visualise the midbrain in B-mode as the characteristic butterfly-shaped structure in the axial scanning planes (figure 1). First, the quality of the bone window was rated as moderate (grade 1) or good (grade 2). Afterwards, the SN on the same side as the TCS probe was identified, and the patchy or band-shaped bright echogenic area was outlined after freezing and magnification of the image (SN planimetry in cm2). Moreover, SN echogenicity was graded semiquantitatively by the raters, considering the size and the contrast of the echogenic area related to the surrounding midbrain tissue as follows: grade I=normal, grade II=moderate and grade III=markedly hyperechogenic. The SN findings were allocated to the body side with the most prominent clinical motor symptoms as ipsilateral (same side) and contralateral (opposite side). Since there is no affected side in healthy controls, we defined the left hemisphere as the ipsilateral side in healthy subjects a priori. To avoid any misunderstanding, it is emphasised that the terms ipsi- and contralateral in this paper always refer to the body side with predominant PD motor symptoms and do not apply to the hemispheres in relation to the TCS probe. After tilting the ultrasound probe about 10–20° upwards, the transverse diameter of the third ventricle was additionally determined, since it is a very accessible comparative parameter for reliability measurements.

Figure 1

Transaxial midbrain B-mode sonography (Sonoline Antares, Siemens AG, Germany) of the substantia nigra (SN) measured twice by four raters in the same Parkinson's disease patient with good transtemporal bone window quality (first scan: upper row; second scan: lower row). In each case, the SN was encircled for computerised measurement (SN planimetry). The individual planimetry values were (first scan/second scan): rater 1: 0.26/0.32; rater 2: 0.26/0.29; rater 3: 0.28/0.28; rater 4: 0.28/0.27.

Statistical analysis

All statistical analyses were performed with SAS (Version 9.1, SAS Institute, Cary, North Carolina). To account for a potential bias of the data by bone window quality, most statistical analyses were repeated after eliminating measurements through an examiner-rated poor bone window. At first, standard descriptive statistics were applied, and the normal distribution of continuous parameters was assessed with QQ plots. For univariate between-group comparisons, we used the Fisher exact test for binary variables, χ2 homogeneity test for nominal or ordinal variables with more than two values, and the Student t test for continuous variables. In multivariate analyses, models with generalised estimating equations (GEE) accounted for the interdependence of measurements. Subjects were set as identifier for repeated measurements with an independent correlation structure. The preconditions of the multivariate models were assessed with standard model diagnostics.

The inter- and intraobserver agreement of TCS parameters was calculated with weighted κ values (Cicchetti-Allison type weights) for ordinal variables and intraclass correlation coefficients (ICC) for continuous variables. The ICC is a measure of the proportion of variance that is attributable to the objects of measurement and has emerged as a widely accepted reliability index.9 Due to missing values, the ICCs were calculated manually from the mean square estimates of generalised linear models accounting for the repeated measurement correlation structure. For the measurement of intraobserver reliability, we used ICC(3;k) according to Shrout and Fleiss (identical to Cronbach α) to model repeated measurements as fixed effects. For the measurement of inter-observer reliability, we used ICC(2;k) to model the raters as random effects. According to a proposal by Shrout, the following ranges of reliability values were classified: 0.00–0.10, virtually none; 0.11–0.40, slight; 0.41–0.60, fair; 0.61–0.80, moderate; 0.81–1.00, substantial.10
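As an illustration of the reliability indices described above, the following Python sketch computes ICC(2;k) and ICC(3;k) from two-way ANOVA mean squares as given by Shrout and Fleiss, a weighted κ with linear (Cicchetti-Allison type) weights, and the reliability category according to the ranges listed above. It assumes a complete subjects-by-raters table without missing values, which is simpler than the generalised linear models used in this study. All function names are hypothetical, and the code is not the SAS code used for the analyses reported here.

```python
from collections import Counter


def icc_average(ratings):
    """Return (ICC(2;k), ICC(3;k)) for a complete subjects x raters table.

    Assumption: no missing values (unlike the study data).
    """
    n = len(ratings)        # number of subjects (rows)
    k = len(ratings[0])     # number of raters or repeated scans (columns)
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)                 # between-subjects mean square
    jms = ss_cols / (k - 1)                 # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))    # residual mean square

    icc_2k = (bms - ems) / (bms + (jms - ems) / n)  # raters as random effects
    icc_3k = (bms - ems) / bms                      # raters as fixed effects
    return icc_2k, icc_3k


def weighted_kappa(first, second, n_grades=3):
    """Weighted kappa with linear weights for grades 1..n_grades."""
    def weight(i, j):
        return 1 - abs(i - j) / (n_grades - 1)

    n = len(first)
    observed = sum(weight(a, b) for a, b in zip(first, second)) / n
    p_first, p_second = Counter(first), Counter(second)
    expected = sum(
        (p_first[i] / n) * (p_second[j] / n) * weight(i, j)
        for i in range(1, n_grades + 1)
        for j in range(1, n_grades + 1)
    )
    return (observed - expected) / (1 - expected)


def shrout_category(value):
    """Map a reliability value to the categories listed in the text."""
    for upper, label in [(0.10, "virtually none"), (0.40, "slight"),
                         (0.60, "fair"), (0.80, "moderate")]:
        if value <= upper:
            return label
    return "substantial"


# Example table from Shrout and Fleiss (6 subjects, 4 judges), not study data.
table = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
         [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
icc_2k, icc_3k = icc_average(table)
print(round(icc_2k, 2), round(icc_3k, 2), shrout_category(icc_3k))
# prints: 0.62 0.91 substantial
```

In this example, ICC(2;k) is lower than ICC(3;k) because systematic differences between raters count as error when raters are modelled as random effects.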

We calculated the sensitivity and specificity of TCS findings for the diagnosis of PD including testing for independence with the Fisher exact test. The discriminatory power of SN TCS accounting for both sensitivity and specificity was expressed as the area under the receiver-operating characteristic (ROC) curve (ROC-AUC). The Youden index (J=sensitivity+specificity−1) provided a single numerical value with a maximum of 1.00 as a global measure of test accuracy.11 12
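The Youden index can be sketched in a few lines of Python. The function names and the 2x2 counts below are hypothetical and serve only to illustrate the formula; the first call uses the sensitivity and specificity reported in this paper for the 0.24 cm2 cut-off.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity and specificity from the cells of a 2x2 table."""
    return tp / (tp + fn), tn / (tn + fp)


def youden_index(sensitivity, specificity):
    """Youden index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1


# Values reported for the 0.24 cm2 cut-off of contralateral SN planimetry.
print(round(youden_index(0.79, 0.81), 2))  # prints 0.6

# Hypothetical counts, for illustration only (not study data).
sens, spec = sensitivity_specificity(tp=8, fn=2, tn=9, fp=1)
print(sens, spec, round(youden_index(sens, spec), 2))
```

A test without discriminatory power (sensitivity + specificity = 1) yields J=0, and a perfect test yields J=1.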


Results

Bone window quality

In 142 out of 256 measurements (56%), the examiners rated a good bone window quality (grade 2). A poor but sufficient bone window quality at least on one side (median grading 1 over all raters) was detected in 8/22 PD patients (36%) and in 2/10 controls (20%). A poor bone window quality was significantly more frequent in study participants older than 68 years (p=0.004, unpaired t test), and gender had no significant influence (41% in males vs 33% in females, p=0.7, Fisher exact test).

Between-group comparisons: SN echogenicity and planimetry

A markedly increased echogenicity (median grading III over all raters) was detected ipsi- and contralateral to the clinically most affected body side in 16/22 PD patients (73%), which was significantly more frequent than the rate of median SN echogenicity grade III over all raters in 2/10 control subjects (20%) (p=0.02 ipsilateral and p=0.009 contralateral, χ2 test). In a multivariate model adjusted for age, gender and rater, the presence of SN echogenicity grade III versus I was related to a significantly higher probability for PD with an OR of 6.30 (95% CI 2.67 to 14.84; p<0.0001). In contrast, echogenicity grade II versus I was not significant (OR 1.76, 95% CI 0.49 to 6.27, p=0.4). SN planimetry revealed significantly higher values in PD patients compared with controls with 0.26±0.05 cm2 versus 0.19±0.08 cm2 ipsilateral (p=0.01, unpaired t test) and 0.27±0.05 cm2 versus 0.19±0.06 cm2 contralateral (p=0.0002, unpaired t test).

Reproducibility of SN findings

Considering all measurements, the intraobserver reliability of SN planimetry ranged, depending on the rater, from moderate (ICC 0.75) to substantial (ICC 0.95), which was slightly below the reliability of the diameter assessments of the third ventricle (table 1).

Table 1

Intraobserver reliability of substantia nigra (SN) planimetry, SN echogenicity grading and diameter measurement of the third ventricle presented separately for each rater and averaged for all investigators

The ICC averaged over all raters amounted to 0.97 (ipsilateral) and 0.93 (contralateral). Similar ICC values were found in the subgroup of measurements through a good bone window. The weighted κ values of echogenicity grading ranged, depending on the investigator, from fair (0.54) to substantial (0.83) in all measurements, with a moderate agreement over all raters of 0.69 (ipsilateral) and 0.67 (contralateral). In measurements through a good bone window, the weighted κ values were between 0.53 and 0.94, with a moderate agreement over all raters of 0.80 (ipsilateral) and 0.74 (contralateral). The interobserver ICC values of SN planimetry were substantial and reached 0.84 ipsi- and 0.89 contralateral (0.86 and 0.87 for good bone windows respectively). The interobserver weighted κ values of SN echogenicity grading ranged from slight (0.33) to fair (0.51) in all measurements and from 0.35 to 0.49 in scans with a good bone window (table 2).

Table 2

Interobserver reliability of substantia nigra echogenicity grading (weighted κ values)

Diagnostic accuracy of SN findings

When comparing the diagnostic accuracy of SN TCS analysis methods, the highest ROC-AUC values were found in the contralateral SN (planimetry 0.821, echogenicity grading 0.792) with the most favourable sensitivity to specificity relation for the categories hyperechogenic planimetry (>0.25 cm2) (J=0.52) and echogenicity grade III (J=0.54) (table 3). The subanalysis of good bone window measurements showed slightly higher J values and ROC-AUC in the ipsilateral SN compared with the analysis of the entire dataset but no remarkable changes contralaterally. The ROC analysis of planimetry showed a hyperechogenic contralateral SN area of 0.24 cm2 as the optimum cut-off value for the differentiation between PD and controls (sensitivity 79%, specificity 81%, J=0.60; figure 2).
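How an ROC-AUC and an optimum cut-off of this kind can be derived is sketched below in Python. The sketch computes the AUC as the proportion of patient-control pairs in which the patient has the larger SN area (equivalent to the Mann-Whitney statistic) and selects the cut-off that maximises the Youden index. The function names, the rule that a value at or above the cut-off counts as positive, and the SN areas are assumptions made for illustration; they are not the study data or the SAS procedure used here.

```python
def roc_auc(patients, controls):
    """AUC as the probability that a patient value exceeds a control value
    (ties count one half); equivalent to the Mann-Whitney U statistic."""
    wins = sum(
        1.0 if p > c else 0.5 if p == c else 0.0
        for p in patients for c in controls
    )
    return wins / (len(patients) * len(controls))


def best_cutoff(patients, controls):
    """Return (cut-off, J) maximising the Youden index.

    Assumption: a value at or above the cut-off counts as a positive test.
    """
    best = (None, -1.0)
    for cutoff in sorted(set(patients) | set(controls)):
        sens = sum(p >= cutoff for p in patients) / len(patients)
        spec = sum(c < cutoff for c in controls) / len(controls)
        j = sens + spec - 1
        if j > best[1]:
            best = (cutoff, j)
    return best


# Hypothetical SN areas in cm2, for illustration only (not study data).
pd_areas = [0.30, 0.27, 0.25, 0.22]
control_areas = [0.18, 0.20, 0.26, 0.15]
print(roc_auc(pd_areas, control_areas))      # prints 0.875
print(best_cutoff(pd_areas, control_areas))  # prints (0.22, 0.75)
```

Only observed values are tried as candidate cut-offs, which is sufficient because sensitivity and specificity change only at those values.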

Table 3

Diagnostic accuracy of substantia nigra planimetry and echogenicity grading for prediction of the disease status (Parkinson's disease or control).

Figure 2

Receiver-operating characteristic (ROC) plot of contralateral substantia nigra (SN) planimetry for the prediction of Parkinson's disease. The area under the curve (ROC-AUC) is 0.821. Dotted lines show the tangent of the diagonal with the ROC curve and the perpendiculars from this point (black circle) to both axes indicating 81% specificity (1–specificity=19% on the x-axis) and 79% sensitivity (on the y-axis). The black circle represents the optimum cut-off value of contralateral SN planimetry (0.24 cm2) for the distinction between Parkinson's disease and controls (Youden Index J=0.60).


Discussion

This study reports data on the inter- and intrarater reproducibility and diagnostic accuracy of different SN TCS analysis approaches applied by four blinded examiners in PD patients and controls. Unlike previous studies, which reported simple correlation coefficients as a measure of interobserver variability in SN TCS,8 13–16 we calculated established reliability indices (weighted κ, ICC), since they account for chance agreement between raters.17

As expected, these data confirmed a significant bilateral enlargement of the hyperechogenic SN in PD patients versus controls, which is in line with several previous studies.1 The substantial intra- and interobserver reliability of quantitative SN planimetry validates the reproducibility of this quantitative method in the diagnostic work-up of PD patients. Two previous papers reported comparably high intra- and interobserver reliability for SN planimetry in healthy controls, with an ICC of 0.98 (reference 18) and a Cohen kappa of 0.83 (reference 19). In contrast, we found a remarkably lower rater congruency for the semiquantitative SN echogenicity grading, which achieved fair (interobserver) to moderate (intraobserver) weighted κ values at best. This is in contrast to a recent study reporting Spearman rank correlation coefficients of r=0.82 (interobserver) and of 0.96 (intraobserver) for a five-grade SN echogenicity rating scale.8 The apparent discrepancy between these and our own results might be explained by the use of different statistical methods and of SN grading scales with three versus five steps. These data indicate that the enlargement of hyperechogenic signals in the SN compared with the echogenicity of the surrounding brain tissue is hard to measure reliably in a simple three-step semiquantitative approach. Rather, they suggest that the visual echogenicity grading, which considers the contrast of the SN to the surrounding midbrain and the size of the hyperechogenic area, is much more influenced by the subjective impression of the investigator than the encircling of the SN area, which yields interval-scaled quantitative data. Therefore, we suggest preferring SN planimetry with a carefully defined cut-off value for the diagnosis of PD.

A crucial point of a diagnostic test is its accuracy in the prediction of a disease. Since the investigator may be strongly biased by personal knowledge or presumption of the diagnosis due to apparent clinical symptoms of the examined person, we chose a sonographer-blinded study design. We were able to show that a marked enlargement of the SN area in planimetry as well as the semiquantitative grading III yielded ROC-AUC values which fall in the category of a moderately accurate diagnostic tool.20 ROC-AUC values serve as a global statistical measure of the overall test performance.21 Moreover, the ROC analysis allowed us to calculate the optimal SN planimetry cut-off value for the distinction between PD and controls in our study cohort. We found that 0.24 cm2 for the SN contralateral to the most affected body side meets these criteria, with both sensitivity and specificity close to 80% (J=0.60), which is in line with recent findings on the sensitivity and specificity of SN TCS for the diagnosis of PD.22 23 However, it is important to note that the 0.24 cm2 cut-off was calculated in a relatively small study cohort and that previous studies reported slightly different values for various ultrasound systems.
Most studies published so far were performed on the Sonoline Elegra system (Siemens, Erlangen, Germany) with three classification categories of the SN echogenic area, namely <0.20 cm2 normal, 0.20–0.25 cm2 moderately and >0.25 cm2 markedly hyperechogenic.1 19 24 25 Another study using this system defined the 80th percentile of control values (0.20 cm2) as cut-off and found 0.83 as positive and 0.78 as negative predictive value for the diagnosis of PD.26 Glaser et al showed an almost 1.2-fold larger echogenic SN area measured with the Sonoline Antares compared with the Elegra system in the same individuals.27 A slightly higher optimum cut-off value of 0.27 cm2 was published by Hagenah and colleagues using the SONOS 5500 machine (Philips Medical Systems, Best, The Netherlands).28 These to a certain degree divergent data emphasise that each laboratory has to establish its own standard values for SN echogenicity, depending among other factors on the ultrasound system used.7

Hyperechogenicity of the SN can be found in about 10% of asymptomatic individuals up to the age of 79 years,3 19 which has a possible impact on the diagnostic accuracy and the specificity of the method in particular. The frequency of this SN feature fits well with the known 10% prevalence of incidental Lewy bodies in the SN of asymptomatic persons aged over 50 years.29 Several studies demonstrated that a hyperechogenic SN in individuals without PD is related to lower presynaptic dopaminergic tracer uptake in PET and SPECT,3 19 30 to motor slowness and extrapyramidal symptoms in older age13 and to an increased occurrence of non-motor PD symptoms, such as olfactory dysfunction or depression.16 30 Therefore, SN hyperechogenicity has been suggested to represent a vulnerability factor of the nigrostriatal system for neuronal degeneration and a marker of individual predisposition for the development of PD,19 which is currently being investigated in longitudinal studies.

We did not calculate the predictive value of SN TCS for PD, since this parameter is highly dependent on the estimated total prevalence of the disease in the study population. Though PD is a common neurological disorder, it has a relatively low and age-dependent prevalence in the total population of about 1–2% in elderly people over the age of 60 years.31 In this case, the predictive value of a given diagnostic test is remarkably lower due to the high a priori probability of false positive test results.32 To mitigate this general problem for screening tests, it is helpful to restrict the target population to a smaller group of at-risk individuals for PD, for example to those with a positive family history and with premotor PD symptoms, such as olfactory dysfunction, dysautonomia or mood and sleep disorders.33
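The dependence of the predictive value on prevalence follows from Bayes' theorem and can be sketched in Python as below. The function name is hypothetical; the sensitivity and specificity are the values reported in this paper for the 0.24 cm2 cut-off, and the three prevalences are the 1–2% range quoted above and the proportion of patients in this study cohort (22/32). The output is an arithmetic illustration and not a result of this study.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Bayes' theorem: P(disease | positive test)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Sensitivity and specificity at the 0.24 cm2 cut-off reported in this paper.
for prevalence in (0.01, 0.02, 22 / 32):
    ppv = positive_predictive_value(0.79, 0.81, prevalence)
    print(f"prevalence {prevalence:.3f}: PPV {ppv:.2f}")
```

Under these assumptions, the positive predictive value is roughly 4% and 8% at prevalences of 1% and 2%, compared with about 90% at the proportion of patients in this cohort, which is why the parameter was not derived from the study sample.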

It is important to note that our study design included some prerequisites which constitute ideal examination conditions that are not always met in real practice and which, therefore, limit the generalisation of our results. First, only highly experienced sonographers took part in the study in order to avoid a loss of reliability due to disparate training levels of the investigators. A recent study by Skoloudik and colleagues clearly showed a much poorer correlation of SN TCS results in untrained sonographers.8 Thus, the reliability rates found by us can only be expected in trained investigators. However, the fact that the person who performs or interprets an imaging investigation needs to be trained is self-evident and holds true for any other imaging technique. Although explicit recommendations have not yet been formally defined, we would like to suggest that the education level for SN ultrasound investigators should be geared to the training standards for vascular ultrasound in stroke neurologists and neuroradiologists. It also has to be kept in mind that we exclusively used a well-established ultrasound system for depiction of the SN (Sonoline Antares, Siemens AG, Germany).34 However, the technical development of various ultrasound systems differs and is not necessarily adapted to the requirements of optimum brain parenchyma sonography. Also, these data clearly show that the observer reliability, more than the diagnostic accuracy, of SN TCS depends on the quality of the preauricular acoustic bone window, with more favourable data in measurements through very permeable windows that allow adequate insonation of the midbrain. Finally, the number of patients and especially controls is small. This should especially be considered when interpreting the results concerning prediction of the disease.
All these considerations underline that each examiner has to interpret the results of brain parenchyma sonography carefully in the context of the individual scanning conditions.

In conclusion, these data show that the observer variability of SN TCS is low in the hands of experienced investigators. The method offers sufficient observer reliability when the interval-scaled planimetry of the echogenic SN area is used. Adequate diagnostic accuracy of SN TCS with almost 80% sensitivity and specificity for PD diagnosis is ensured by SN planimetry with an adequate cut-off value. We conclude that reliable SN TCS data on PD can also be achieved in multicentre trials with a larger number of investigators when standardised analysis protocols and a strict quality control are implemented.


Acknowledgments

We thank Siemens AG (Erlangen, Germany) for providing four ultrasound devices of the Sonoline Antares system.



  • Competing interests SB was a grantee of the Michael J Fox Foundation. JH is funded by the Bachmann-Strauss Dystonia Parkinson Foundation and received honoraria as an invited speaker from GlaxoSmithKline. RH has received speaker honoraria from Medtronic, Orion, GlaxoSmithKline, TEVA, Cephalon, Solvay, Desitin and Boehringer Ingelheim as well as travel funding from Medtronic and Cephalon. He serves or has served on a scientific advisory board for Cephalon. He has received research funding from the Deutsche Parkinson Vereinigung (dPV) and the University of Frankfurt/Main. DB is principal investigator of multicentre trials funded by Novartis, Johnson and Johnson, Boehringer, Teva and Eisai. She is a member of advisory boards for Novartis, GlaxoSmithKline, Teva, Lundbeck and UCB, and has received speaker honoraria from UCB, GlaxoSmithKline, Teva, Lundbeck, Novartis and Merck Serono. Her work was funded by research grants from the Michael J Fox Foundation, BmBF, German Parkinson Association and Tübingen University.

  • Ethics approval Ethics approval was provided by the Ethical committee of the medical faculty of Goethe University, Frankfurt am Main, Germany.

  • Provenance and peer review Not commissioned; externally peer reviewed
