OBJECTIVES Head injury is a common event. Most patients sustain a mild head injury (MHI), and management depends on the risk of an intracranial haemorrhage (ICH). The value of a plain skull radiograph as a screening tool for ICH is controversial. The aim of this meta-analysis was to estimate and explain differences in reported sensitivity and specificity of the finding of a skull fracture for the diagnosis of ICH, in order to assess the value of the plain skull radiograph in the investigation of patients with MHI, and to estimate the prevalence of ICH in these patients.
METHOD After a systematic literature search, 20 studies were selected that reported data on the prevalence of ICH after MHI and/or data on the diagnostic value of skull fracture for the diagnosis of ICH. The mean prevalence of ICH weighted for the sample size was determined. The sensitivity and specificity of different studies were combined using a summary receiver operator characteristic curve. Correlation analysis was used to determine factors that could explain the reported differences between studies.
RESULTS The weighted mean prevalence of ICH after MHI is 0.083. The potential for verification bias and the percentage of patients who had suffered loss of consciousness or post-traumatic amnesia were the most significant factors explaining interstudy differences in sensitivity and specificity. Based on studies wherein at least 50% of patients had a CT study of the brain, the estimated sensitivity of a radiographic finding of skull fracture for the diagnosis of ICH is 0.38 with a corresponding specificity of 0.95.
CONCLUSION The plain skull radiograph is of little value in the initial assessment of MHI patients.
- head injury
- skull fracture
- radiological diagnosis
Head injury is one of the most common injuries and can be considered a silent epidemic. In the Western world it is one of the leading causes of disability, especially in the young population. The Head Injury Task Force of the National Institute of Neurologic Disorders has estimated that there are 2 000 000 cases of head injury in the United States annually. In The Netherlands the estimated incidence of head injury is 0.14–0.64% (Twijnstra 1998, personal communication), slightly less than the reported incidence in the United States. Most patients (80% to 90%) sustain a mild head injury (MHI) and do not need admission to hospital or complex health care. If these patients attend the emergency department of a hospital, almost all are sent home. This, however, does not mean that MHI is a totally benign condition. An outcome study of patients who had a head injury suggested that patients with a low risk of dying, that is, patients with MHI, are at the greatest risk of inadequate diagnosis and treatment.1 Considering the many people affected, little research has been done on the assessment and treatment of this category of patients. This is also reflected by the fact that management protocols for MHI are still under debate, which has led to considerable differences in strategies. In the past few years protocols have been published,2-4 which might be seen as belonging to two different schools: the North American and the European. In North America, the routine use of CT for the radiological assessment of patients with MHI is currently under debate, whereas in Europe the use of a plain skull radiograph is disputed. The primary management goal in MHI is to identify those patients who are at risk of developing complications, specifically an intracranial haemorrhage (ICH) requiring surgery. Clinical assessment alone is inadequate for the detection of ICH,5 and radiological procedures are therefore used as additional screening tools.
That patients with a skull fracture have an increased risk of intracranial haematoma is well known,6-8 but does this have practical significance? A skull fracture by itself has few clinical consequences, except in cases of a depressed skull fracture. The potential clinical usefulness of radiological assessment for skull fracture depends on the ability to distinguish between patients with MHI with and without ICH.
To judge the usefulness of the diagnosis skull fracture, it is important to evaluate the sensitivity and specificity of this finding as a test for the presence or absence of ICH, and to determine the prevalence of ICH in patients with MHI. Unfortunately, sensitivity and specificity estimates reported in the literature show large variation. This may be because published studies differ in design (prospective and retrospective approaches), patient selection (admitted patients or patients seen at the emergency department), and inclusion criteria (based on Glasgow coma scale (GCS), loss of consciousness (LOC) or post-traumatic amnesia (PTA)). Although ICH is mostly diagnosed by CT, in older studies it was diagnosed on the basis of clinical, operative, or postmortem findings. Comparison of the data is further complicated by possible differences in threshold for a positive test result (fracture or ICH) or by differences in technical instrumentation. In some studies considerable abnormalities may be required to be present before the test is declared positive, whereas others may require only a hint of abnormality. In the first case, sensitivity will be low and specificity high; in the second case sensitivity will be high and specificity low. The implication is that there is a trade off of sensitivity against specificity between the studies, which needs to be taken into account in any method for combining results.
Given this diversity, it is presently not possible to draw a conclusion about the value of radiography in detecting skull fracture in the management of patients with MHI. For this reason we carried out a meta-analysis of published data. Correlation analysis was used to identify the most important sources of variation in prevalence and diagnostic accuracy between studies, followed by the summary receiver operator characteristic (ROC) curve technique described by Moses et al,9 to assess the effect of these potential sources of variation and to summarise reported sensitivity and specificity estimates from the reviewed studies. Our aim was to assess the value of the diagnosis skull fracture for the diagnosis of ICH. We therefore tried to account for (part of) the differences in reported sensitivity and specificity of skull fracture for the diagnosis of ICH between studies. To be able to estimate the predictive value of the diagnosis skull fracture, the prevalence of ICH in patients with MHI was also estimated.
Material and methods
LITERATURE SEARCH STRATEGY AND DATA COLLECTION
A systematic search for relevant original publications was conducted in Medline, Embase, and Current Contents from 1966 to 1998, using the following search keys: skull fracture, skull injury, skull radiography, skull trauma, skull films, (brain or head), (trauma or injury or injuries) and computed tomography (all subheadings). The articles were primarily selected on the basis of the title and the abstract. Additional references were obtained from the bibliographies of the original articles. The full text of about 200 relevant articles was retrieved. Two sets of articles were selected, one set to estimate the diagnostic value of a finding of a skull fracture, and a second set to assess the prevalence of ICH in patients with MHI.
For the first set of papers the test under study is the plain skull radiograph for the determination of the presence of an ICH. In those studies where no plain skull radiography was performed, CT data were used. Papers were included if they contained data on the diagnostic value of a finding of a skull fracture by plain radiograph or CT in patients who had MHI.
The second set of papers was selected for the assessment of the prevalence of ICH in MHI. For these studies the standard of reference for diagnosis was the existence of ICH on CT. Only a few studies fulfilled this strict criterion; therefore we lowered the norm, and at least 50% of the patients needed to have undergone CT.
For the purpose of this study, MHI was defined as trauma to the head, with the patient having a Glasgow coma scale (GCS)10 score of 13 to 15 on initial presentation. In the selected studies the diagnosis of ICH was made by CT. If no CT was performed, an uneventful recovery was considered a sign of the absence of an ICH. In some older studies angiography was used to diagnose ICH, and neurosurgical findings were used by some as well. An arbitrary minimum of 50 patients was required, because studies with fewer patients yield statistically unreliable estimates of sensitivity, specificity, and prevalence. Studies with only paediatric or geriatric patients were not included. If the data permitted, multitrauma patients and referrals were excluded. A standard form was used to extract relevant data from the original articles on study and patient characteristics, and various test results (table 1).
Prevalence of ICH
Prevalence was defined as the percentage of patients in the study with a diagnosis of ICH. Both mean prevalence weighted for sample size and unweighted mean prevalence were calculated. The weighted mean was defined as11:
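$$\overline{\mathrm{prev}} = \frac{\sum_i \omega_i\,\mathrm{prev}_i}{\sum_i \omega_i} \qquad (1)$$
with
$$\omega_i = \frac{1}{\mathrm{prev}_i\,(1-\mathrm{prev}_i)/N_i} \qquad (2)$$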
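Equations 1 and 2 describe an inverse-variance weighted mean, in which each study is weighted by the reciprocal of the binomial variance of its prevalence estimate. The following is a minimal sketch of that calculation in Python. The function name and the example values are hypothetical and are not taken from table 2.

```python
def weighted_mean_prevalence(prevalences, sample_sizes):
    """Inverse-variance weighted mean of study prevalences (equations 1 and 2).

    Each prevalence must lie strictly between 0 and 1; a value of 0 or 1
    would give a zero variance and an undefined weight.
    """
    weights = []
    for prev, n in zip(prevalences, sample_sizes):
        variance = prev * (1.0 - prev) / n  # binomial variance of a proportion
        weights.append(1.0 / variance)      # equation 2
    weighted_sum = sum(w * p for w, p in zip(weights, prevalences))
    return weighted_sum / sum(weights)      # equation 1


# Illustrative values only (hypothetical studies, not data from this review)
example_prev = [0.05, 0.10, 0.15]
example_n = [200, 100, 400]
example_mean = weighted_mean_prevalence(example_prev, example_n)
print(round(example_mean, 4))
```

Note that this weighting favours large studies and studies with prevalences far from 0.5, which is one reason the weighted and unweighted means can differ.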
Calculation of the true and false positive rate
For evaluation of the diagnostic value of a skull fracture only the diagnosis of ICH was used, and not the report of a surgical intervention. This choice was made firstly because the indication for intervention differed between institutions and clear criteria were rarely given, and secondly because some investigators considered the placement of intracranial pressure monitor devices as an intervention, whereas others excluded this procedure. In those studies where no plain skull radiography was performed, CT data on skull fractures were used.
From the collected data the number of true and false positive observations and the number of true and false negative observations were derived. True positive (TP) is defined as the finding of both a skull fracture and an ICH, false positive (FP) as a skull fracture without an ICH, true negative (TN) as the absence of both a skull fracture and an ICH, and false negative (FN) as the absence of a skull fracture in the presence of an ICH. Using these data the true positive rate (TPR=TP/(TP+FN)) and the false positive rate (FPR=FP/(FP+TN)) were calculated. The TPR equals the sensitivity and the FPR equals (1-specificity).
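As an illustration of these definitions, the sketch below derives TPR, FPR, sensitivity, and specificity from the four cells of a 2×2 table. The counts are hypothetical and serve only to show the arithmetic.

```python
def rates_from_counts(tp, fp, tn, fn):
    """Return (TPR, FPR) from the cells of a 2x2 diagnostic table."""
    tpr = tp / (tp + fn)  # sensitivity: fractures among patients with ICH
    fpr = fp / (fp + tn)  # 1 - specificity: fractures among patients without ICH
    return tpr, fpr


# Hypothetical study: 50 patients with ICH, 950 without
tpr, fpr = rates_from_counts(tp=30, fp=50, tn=900, fn=20)
sensitivity = tpr
specificity = 1.0 - fpr
print(round(sensitivity, 3), round(specificity, 3))
```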
Correlation study to identify confounding differences between studies
The TPR and FPR are not independent. Rather, there is a trade off between the two, as is reflected in the ROC curve. Without an exact match of the study population and analysis characteristics, simply averaging these rates can be very misleading and does not provide a representative summary.12 To determine the effect of interstudy differences as mentioned in table 1, a correlation analysis was performed with parameter D, which is defined in the next subsection, and which is a measure for how well the test discriminates between the population with and without ICH. The Spearman correlation test was used for this analysis.
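The Spearman coefficient is the Pearson correlation computed on ranks, with tied values given their average rank. A minimal pure-Python sketch is shown below. The study characteristic (percentage of patients with LOC/PTA) and the D values in the example are hypothetical, and the sketch does not compute the p value that a statistics package would report.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-study values: % of patients with LOC/PTA, and D
pct_loc_pta = [10, 25, 40, 60, 80, 100]
d_values = [4.6, 4.1, 4.3, 2.9, 2.5, 2.2]
print(round(spearman_rho(pct_loc_pta, d_values), 3))
```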
Summary receiver operator characteristic curve
For the analysis of TPR and FPR data, as found in the different studies, we used a summary ROC (sROC) curve as described by Moses et al.9 The analytical method is based on the principle that the sROC curve is conveniently represented as a roughly straight line when logit TPR is plotted against logit FPR. For statistical reasons, logit TPR − logit FPR (D) is modelled as a linear function of logit TPR + logit FPR (S).
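$$S = \operatorname{logit}(\mathrm{TPR}) + \operatorname{logit}(\mathrm{FPR}) \qquad (3)$$
$$D = \operatorname{logit}(\mathrm{TPR}) - \operatorname{logit}(\mathrm{FPR}) \qquad (4)$$
with the logit defined as:
$$\operatorname{logit}(p) = \ln\left(\frac{p}{1-p}\right) \qquad (5)$$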
S is related to how often the test is positive and D is a direct measure of how well a test discriminates between the population with an ICH and without an ICH, since:
$$D = \ln(\text{odds ratio}) \qquad (6)$$
The odds ratio is a measure of association used in epidemiological studies. In diagnostic studies, the odds ratio is the odds of a positive test result in diseased patients divided by the odds of a positive test result in non-diseased patients. The higher the odds ratio, the better the test discriminates between patients with and without the disease of interest.13
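The relation between S, D, and the odds ratio can be checked numerically. The sketch below takes the logit of a proportion p to be ln(p/(1−p)), computes S and D for one study, and confirms that exp(D) equals the diagnostic odds ratio (TP×TN)/(FP×FN). The counts and function names are hypothetical.

```python
import math


def logit(p):
    """Natural log of the odds p / (1 - p), for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def s_and_d(tpr, fpr):
    """Return (S, D) as the sum and difference of logit TPR and logit FPR."""
    s = logit(tpr) + logit(fpr)
    d = logit(tpr) - logit(fpr)
    return s, d


# Hypothetical 2x2 counts for a single study
tp, fp, tn, fn = 30, 50, 900, 20
s, d = s_and_d(tpr=tp / (tp + fn), fpr=fp / (fp + tn))
odds_ratio = (tp * tn) / (fp * fn)

# D is the natural log of the odds ratio, so both printed values agree
print(round(math.exp(d), 3), round(odds_ratio, 3))
```

A study in which any cell is zero gives an undefined logit. How such studies are handled (for example by a continuity correction) is not stated in the text, so the sketch leaves that case out.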
To estimate the relation between S and D a linear model is fitted to the data:
$$D = \alpha S + C \qquad (7)$$
C is a measure of the ability of the test to discriminate between diseased and non-diseased persons, and α is a measure of the extent to which D depends on the threshold for a positive test result. The higher the constant C, the better the discriminatory ability of the test. Using the fitted α and C, the relation between TPR and FPR can be transformed back into an sROC curve.
Equation 7 can be extended with further factors (F) in order to evaluate the influence of study and population characteristics on D:
$$D = \alpha S + C + \beta F \qquad (8)$$
The goodness of fit was expressed by the square of the correlation coefficient (R²) between the observed value of D and the predicted value of D. If R² is 1, there is a perfect fit; if R² is 0 there is no linear relation between the observed and the predicted value of D. The data analysis was performed using commercially available software (Microsoft Excel 5.0a and SPSS 6.1.1 for the PowerPC Macintosh).
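The fitting and back-transformation steps can be sketched as follows, assuming the linear model D = αS + C described in the text and an unweighted least squares fit (the text does not state whether the fit was weighted, so this is an assumption). Solving the model for logit TPR gives logit TPR = (C + (1 + α) logit FPR)/(1 − α), which is how the fitted line is turned back into an sROC curve. The function names and the example data are hypothetical.

```python
import math


def logit(p):
    """Natural log of the odds p / (1 - p), for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def fit_sroc(tprs, fprs):
    """Unweighted least squares fit of D = alpha * S + C.

    Returns (alpha, C, R squared). For a straight-line fit with an
    intercept, 1 - SS_res / SS_tot equals the squared correlation
    between observed and predicted D.
    """
    s = [logit(t) + logit(f) for t, f in zip(tprs, fprs)]
    d = [logit(t) - logit(f) for t, f in zip(tprs, fprs)]
    n = len(s)
    mean_s, mean_d = sum(s) / n, sum(d) / n
    sxx = sum((x - mean_s) ** 2 for x in s)
    sxy = sum((x - mean_s) * (y - mean_d) for x, y in zip(s, d))
    alpha = sxy / sxx
    c = mean_d - alpha * mean_s
    ss_res = sum((y - (alpha * x + c)) ** 2 for x, y in zip(s, d))
    ss_tot = sum((y - mean_d) ** 2 for y in d)
    return alpha, c, 1.0 - ss_res / ss_tot


def sroc_tpr(fpr, alpha, c):
    """TPR on the fitted sROC curve at a given FPR (requires alpha != 1)."""
    z = (c + (1.0 + alpha) * logit(fpr)) / (1.0 - alpha)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical (TPR, FPR) pairs from five studies
example_tprs = [0.35, 0.42, 0.55, 0.60, 0.70]
example_fprs = [0.02, 0.04, 0.03, 0.06, 0.08]
alpha, c, r_squared = fit_sroc(example_tprs, example_fprs)
print(round(alpha, 3), round(c, 3), round(r_squared, 3))
print(round(sroc_tpr(0.05, alpha, c), 3))  # TPR read off the curve at FPR 0.05
```

A factor F as in equation 8 would turn this into a multiple regression, which is more conveniently done with a statistics package than by hand.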
DESCRIPTION OF STUDIES
Twenty studies were identified that could be used to study the prevalence of ICH or the diagnostic value of the radiological detection of skull fracture for the diagnosis of ICH in adult patients with MHI. Thirteen of these studies contained data on the prevalence of ICH based on CT examinations.5 14-25 Table 2 summarises the data from this group of studies. In 13 of the 20 studies, TPR and FPR of the finding of skull fracture in predicting ICH could in principle be calculated.17 19-22 25-32 Although two studies included patients with moderate and severe head injury, these studies were nevertheless included because most patients had MHI (over 90%).29 30 It seems unlikely that the small proportion of patients with moderate and severe head injury in these studies could have a major impact on the conclusions of the meta-analysis. In five studies no plain skull radiography was done and in these studies CT data on skull fractures were used to assess the relation with ICH. Nine studies were retrospective; the others were prospective. Table 3 summarises the characteristics of these studies. Note that there is overlap between the two groups of tables 2 and 3: six studies were used for both analyses.
PREVALENCE OF ICH AND CORRELATIONS
The mean prevalence of ICH after MHI was 0.1 (95% confidence interval (95% CI) 0.02–0.18, range 0.03–0.18) and the weighted mean prevalence was 0.083 (95% CI 0.03–0.13, table 2).
The sensitivity (TPR) of the finding of skull fracture in predicting ICH ranged from 0.13 to 0.75 and the specificity (1−FPR) from 0.91 to 0.995. The mean D of all studies was 3.35, and the mean sensitivity was 0.50, corresponding to a specificity of 0.97 on the sROC (figure). Studies with a high TPR tended to have a higher FPR, but the fit of the sROC curve to the observed pairs of sensitivity and specificity values was poor (R²=0.08). Therefore, the differences in discriminatory ability between studies cannot be explained by differences in diagnostic thresholds for positive test results. Consequently, an alternative explanation was needed for the variation in sensitivity and specificity. Spearman rank correlation analysis showed that the percentage of patients with LOC/PTA and the percentage of patients who had undergone CT were significantly correlated with D (table 4). A model based on equation 8, which included (besides C and S) a factor representing the percentage of patients with LOC/PTA, fitted the data better (R²=0.73). Addition of a factor representing the percentage of patients who underwent CT resulted in an even better fit (R²=0.81). This confirms that differences in patient selection and the percentage of patients in whom the diagnosis was verified by CT were important sources of variation between studies.
Sensitivity and specificity are not invariant to the population under study, and often they will depend on patient characteristics—for example, patient selection. In clinical studies this is often a reflection of clinical practice. For example, a study with patients admitted for MHI is likely to have a more severely injured population than a study with only emergency department patients. We considered patient selection as an important source of variation between the estimates of sensitivity and specificity, and the percentage of patients with LOC/PTA was the most significant selection criterion.
Selection of patients who underwent CT depending on the plain skull radiography results, or on patient characteristics, will result in verification bias, also called work up bias. We used the percentage of patients who underwent CT as a measure of verification bias. The lower the percentage of patients in whom the diagnosis of ICH was verified by CT, the higher the potential for verification bias. Although not explicitly mentioned in the studies, it is very likely that the decision to perform a CT was based on the patient assessment and/or the skull radiography findings.
The percentage of patients with LOC/PTA was strongly correlated with the potential for verification bias. Two groups of studies were formed. The first group contained the studies in which fewer than 50% of patients had LOC/PTA and fewer than 50% of patients had CT (group 1). This group had thus a high potential for verification bias. The second group contained studies for which both percentages were higher than 50% (group 2). There is only one study that did not fit in either of the two groups.27 The sROC curve fitted to the data for group 1 using equation 7, showed that in this population a relatively high TPR was reached at a low FPR (figure). The sROC curve of data for group 2 was lower than the sROC curve of group 1. The mean D in group 1 and group 2 was 4.3 and 2.4 (p=0.016), respectively. Summary values for sensitivity and specificity are not directly available from the analysis, but estimates can be read off the sROC curve. The mean sensitivity was 0.59 and 0.38, with corresponding specificities of 0.98 and 0.95, for groups 1 and 2, respectively.
In this meta-analysis we investigated the value of radiological assessment for skull fracture in the diagnosis of ICH in patients with MHI and analysed the prevalence of ICH in this category of patients.
Despite the high incidence of MHI, relatively few well designed prospective studies on the management of MHI have been published. All studies found by our literature search were biased to a lesser or greater extent. Firstly, the percentage of patients with a history of LOC/PTA varied considerably, resulting in patient selection bias, and secondly, the percentage of patients in whom the diagnosis of ICH was verified by CT was highly variable, resulting in a potential for verification bias in most studies. In earlier studies only a small percentage of patients underwent cranial CT, and even nowadays patients with a GCS score of 15 but no history of LOC/PTA seldom undergo CT. In older studies cerebral angiography, and operative and postmortem findings were used to establish the diagnosis of ICH.
The mean prevalence of ICH in patients with MHI was 0.10, the range 0.03–0.18, with a weighted mean of 0.083. The percentage of patients with LOC/PTA was relatively high in the studies that were used to derive this prevalence. In the studies with a low percentage of patients with LOC/PTA, fewer patients underwent CT (higher potential for verification bias). A high prevalence of ICH has also been found in studies including only patients with a GCS score of 15 and LOC/PTA.18 24
A strong potential for verification bias—that is, few CT scans—leads to an overestimation of the sensitivity. Patients with a negative skull radiograph will not undergo CT, so patients with false negative results will have a higher chance of remaining undetected.33 This bias could offer an explanation for a mean sensitivity of 0.59 for group 1 (less than 50% CT), compared with a sensitivity of 0.38 for group 2 (over 50% CT). The unverified negative test results (no skull fracture) were assumed to have no ICH, and this will result in an overestimation of the specificity.33 The data corroborate this: the specificity of 0.98 for group 1 (higher potential for verification bias) is higher than the specificity of 0.95 for group 2 (lower potential for verification bias). Patient selection bias and verification bias were strongly associated in the studies investigated. This made it possible to distinguish one group of studies with both a low percentage of patients with LOC/PTA and few undergoing CT, and a second group with a high percentage of patients with LOC/PTA and relatively many undergoing CT. Because verification bias affects the sensitivity,33 the sensitivity of the radiological finding “skull fracture” for the diagnosis ICH is most reliably obtained from studies with a low verification bias (group 2). In that group the mean sensitivity was 0.38 with a corresponding specificity of 0.95.
It should be kept in mind that an sROC curve differs from a traditional ROC curve. The ROC curve describes the relation between sensitivity and specificity in a single population, with a changing threshold. The sROC curve results from fitting a smooth line to data points representing pairs of sensitivity and specificity values from different studies and thus different populations. Therefore, the area under the curve, as a measure of overall diagnostic accuracy, cannot be determined for the sROC curve, whereas it can for the traditional ROC curve.9
By combining the results for sensitivity, specificity, and prevalence, it is possible to calculate the positive predictive value (PPV=TP/(TP+FP)) and negative predictive value (NPV=TN/(TN+FN)) of the radiological detection of skull fracture for the diagnosis of ICH. With a sensitivity of 0.38 and a specificity of 0.95, as found for group 2, and a prevalence of 0.083, the PPV is 0.41 and the NPV is 0.94. This means that if there is a skull fracture, the probability of an ICH is about 4.9 times higher than before testing. A negative plain skull radiograph increases the probability of no ICH only from 92% to 94%. What these figures mean in clinical practice is illustrated in table 5 for a fictitious group of 1000 patients. The most important conclusion of this review is that a positive skull radiograph does not predict an ICH with certainty, although the risk is definitely increased. More importantly, at a sensitivity of 0.38, a plain skull radiograph does not provide much extra information and cannot be used for ruling out the diagnosis of ICH.
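The predictive values quoted above follow directly from the sensitivity, specificity, and prevalence. The sketch below reproduces the arithmetic using the group 2 estimates and the weighted mean prevalence reported in this review; only the function name is an assumption.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    tp = sensitivity * prevalence                  # fracture and ICH
    fn = (1.0 - sensitivity) * prevalence          # no fracture, ICH
    tn = specificity * (1.0 - prevalence)          # no fracture, no ICH
    fp = (1.0 - specificity) * (1.0 - prevalence)  # fracture, no ICH
    return tp / (tp + fp), tn / (tn + fn)


prevalence = 0.083
ppv, npv = predictive_values(sensitivity=0.38, specificity=0.95,
                             prevalence=prevalence)
print(round(ppv, 2), round(npv, 2))  # 0.41 0.94
print(round(ppv / prevalence, 1))    # 4.9, the increase over the pre-test probability
print(round(1.0 - prevalence, 2))    # 0.92, probability of no ICH before testing
```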
The findings of this review contradict data from the literature. Of the 735 patients who had an ICH in the 13 studies, only 322 (44%) had a skull fracture. Therefore, the claim that 80% of patients with ICH have a skull fracture8 is not valid. Moreover, at a prevalence of 0.083, the probability of ICH in patients with MHI and a skull fracture is about five times higher than in patients without a skull fracture. This is in contradiction to the 41-fold increased risk mentioned by Mendelow et al.34 There was a strong potential for verification bias in that study, because ICH was verified in only a few patients. This may explain the high sensitivity of 0.75 and the high relative risk of ICH in patients with a skull fracture in that study. Furthermore, in the studies included in this meta-analysis the prevalence of ICH in patients with MHI presenting at an emergency department was in the order of 0.03 to 0.10, rather than the reported value of 0.003.34
A few points need to be discussed. The first point concerns the use of both plain skull radiography and CT (in five studies) to detect skull fracture. The plain skull radiograph is considered to be more sensitive for the detection of calvarial skull fracture than CT, whereas CT is more sensitive for the detection of skull base fractures. In the light of other differences between the studies, we considered that this possibly not fully equivalent sensitivity was acceptable. A second point concerns the use of the diagnosis of ICH as the gold standard, instead of intervention or clinical course. The existence of ICH is of clinical importance as an indicator of the severity of the trauma and as a guideline for rehabilitation.35 36 It may very well be that the many ICHs that went undetected until recently are in part responsible for the high incidence of post-concussional syndrome in patients with MHI.37
In all studies, the radiologist's report of the skull radiograph was used, whereas in daily practice the emergency physician or resident assesses the radiographs, and management is based on these initial findings. Thillainayagam et al showed that up to 10% of skull fractures are missed by less experienced physicians,38 who usually see most patients with MHI in many institutions. This will decrease the sensitivity of the skull radiograph even further.
The estimated mean prevalence of ICH after MHI was 0.083. The two most significant factors explaining the interstudy difference in reported sensitivity and specificity of the existence of a skull fracture for the diagnosis ICH are the percentage of patients with LOC/PTA and the potential for verification bias. We conclude that the plain skull radiograph has no place in the assessment of MHI in adult patients. The question is not whether the detection of a skull fracture ever assists in the detection of ICH, but whether this is effective. Our analysis shows that the plain skull radiograph was ineffective as a screening tool for patients with MHI: only slightly more than one third of ICH were detected in this way. The low sensitivity implies that if a skull fracture is not seen on plain skull radiography, the diagnosis of ICH still cannot be ruled out. If patient selection increases the likelihood of ICH, CT becomes the modality of first choice.
Data from the literature also suggest that some patients with MHI and a GCS score of 15 do not require any imaging. Two studies described a subpopulation of MHI patients with a GCS score of 15 and no LOC/PTA or any other neurological symptoms.30 32 None of these patients had an ICH, and no interventions were needed. Patients with a GCS score of 15 and LOC/PTA, and patients with a GCS score of 13 and 14, require either observation, CT, or both.