Clinically important change in quality of life in epilepsy
- ¹Departments of Clinical Neurological Sciences and Epidemiology and Biostatistics, University of Western Ontario, London, Ontario, Canada
- ²Department of Psychology, University of Western Ontario
- ³London Health Sciences Centre, London, Ontario
- Correspondence to: Dr S Wiebe, London Health Sciences Centre, University Campus, 339 Windermere Rd, London, Ontario, Canada N6A 5A5;
- Received 26 November 2001
- Revised 8 March 2002
- Accepted 3 April 2002
Background: Health related quality of life (HRQOL) is increasingly recognised as an important outcome in epilepsy. However, interpretation of HRQOL data is difficult because there is no agreement on what constitutes a clinically important change in the scores of the various instruments.
Objectives: To determine the minimum clinically important change, and small, medium, and large changes, in broadly used epilepsy specific and generic HRQOL instruments.
Methods: Patients with difficult to control focal epilepsy (n = 136) completed the QOLIE-89, QOLIE-31, SF-36, and HUI-III questionnaires twice, six months apart. Patient centred estimates of minimum important change, and of small, medium, and large change, were assessed on self administered 15 point global rating scales. Using regression analysis, the change in each HRQOL instrument that corresponded to the various categories of change determined by patients was obtained. The results were validated in a subgroup of patients tested at baseline and at nine months.
Results: The minimum important change was 10.1 for QOLIE-89, 11.8 for QOLIE-31, 4.6 for SF-36 MCS, 3.0 for SF-36 physical composite score, and 0.15 for HUI-III. All instruments differentiated between no change and minimum important change with precision, and QOLIE-89 and QOLIE-31 also distinguished accurately between minimum important change and medium or large change. Baseline HRQOL scores and the type of treatment (surgical or medical) had no impact on any of the estimates, and the results were replicated in the validation sample.
Conclusions: These estimates of minimum important change, and small, medium, and large changes, in four HRQOL instruments in patients with epilepsy are robust and can distinguish accurately among different levels of change. The estimates allow for categorisation of patients into various levels of change in HRQOL, and will be of use in assessing the effect of interventions in individual patients.
- HRQOL, health related quality of life
- HUI, health utilities index
- MCS, mental composite score of SF-36
- MIC, minimum clinically important change
- PCS, physical composite score of SF-36
- QOLIE, quality of life in epilepsy inventory
- SF-36, medical outcomes study short form questionnaire
Health related quality of life (HRQOL) is recognised as an important outcome in epilepsy treatment, and various instruments have been developed to assess it in epilepsy. Typically, studies exploring the impact of epilepsy treatments on HRQOL compare the mean score of instruments among various treatment groups and assess whether the differences are statistically significant. However, it is difficult to interpret the importance of mean changes in HRQOL, regardless of their statistical significance. This is because aggregate data (group means) convey no information about the number of individuals in a group who experience clinically important change. For example, when the mean change for the group is not statistically significant or when it is lower than a prespecified minimum threshold, clinicians may erroneously conclude that the treatment has no important effects. As shown by Guyatt et al, small mean changes can conceal clinically important treatment effects in a substantial number of patients.1 Conversely, large mean changes can be accounted for by a small number of individuals experiencing large changes, while the majority of the group remains unchanged.1 Clinical interpretation of HRQOL requires a notion of what constitutes clinically important, small, medium, and large changes in instrument scores in individual patients.
We quantified the amount of change in commonly used epilepsy specific and generic HRQOL instruments that patients consider as important—that is, the minimum clinically important change (MIC)—and we also obtained estimates of small, medium, and large changes in these instruments.
We prospectively assessed 136 consecutive adults with medically refractory focal epilepsy with or without secondary generalisation who were investigated for epilepsy surgery. Patients aged 16 years or older were eligible if they could complete self administered HRQOL questionnaires. They were excluded if they had non-epileptic seizures, learning disability, progressive central nervous system disorders, or medical conditions precluding epilepsy surgery. We aimed to enrol a broad clinical spectrum of patients representative of adults with medically refractory focal epilepsy. Patients gave informed consent and the institutional ethics review board approved the study.
Health related quality of life instruments
We conceptualised HRQOL as the patients’ own experience of health, and assessed it through the patients’ perception of change in their own health status.2 We quantified change in four HRQOL instruments. Two are epilepsy specific—the quality of life in epilepsy inventory-89 (QOLIE-89)3 and the QOLIE-31,4 a shorter instrument derived from QOLIE-89; and two are generic—the health utilities index mark III (HUI-III),5 and the medical outcomes study short form (SF-36).6 The latter is contained entirely within the QOLIE-89 as its generic core. The HUI-III generates utility scores that can be used to obtain quality adjusted life years, the common metric for cost-effectiveness analysis. All instruments have satisfactory internal consistency and content, construct and convergent validity, and responsiveness in epilepsy.7–10
Instruments were self administered twice, six months apart, following developers’ guidelines. All participants received the QOLIE-89, from which QOLIE-31 and SF-36 are derived, while a subgroup of 80 patients received the HUI-III. Patients answered all questionnaires at the same time and in the same order. All questionnaires were immediately reviewed for completeness and patients were contacted for missing or ambiguous responses.
Patient centred assessment of change in HRQOL
To ascertain meaningful change in HRQOL11 we used patient centred global ratings of change, as described by Jaeschke et al.12 Because HRQOL in epilepsy is multidimensional, to obtain an overall impression of change we asked patients to specify the direction and amount of change in five areas (questions): overall HRQOL, general health, social activities and work, seizures, and drug side effects. Patients first rated the five areas as worse, about the same, or better compared with six months earlier, and then scored the amount of change using a 15 point scale ranging from −7 (a very great deal worse) through 0 (no change) to +7 (a very great deal better) (see appendix). Participants completed all five global rating questions at the same time and in the same order, concurrently with the six month HRQOL questionnaires. All responses were reviewed for completeness and validity, and patients were contacted if clarification was necessary.
The mean score of the five questions served as the summary global rating of change for each patient. In general, global rating scores between 0 and ±1 are considered as benchmarks for no change, 2 to 3 or −2 to −3 as small change, 4 to 5 or −4 to −5 as medium change, and 6 to 7 or −6 to −7 as large change. These categories, initially specified by Jaeschke et al on the basis of experience and clinical intuition,12 have subsequently been validated in studies of diverse populations13–15 and using different techniques.16
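The summary rating and the benchmark categories can be sketched in Python as follows. This is a minimal illustration, not the study's scoring procedure: the function names are hypothetical, and because the mean of five integer ratings can fall between the bands (for example 1.4 or 3.6), the sketch assumes that any value above 1, 3, or 5 moves into the next category, which matches the improved (> 1) cut-off used when reporting the results.

```python
from statistics import mean

def summary_global_rating(ratings):
    """Mean of the five global rating questions, each scored from -7 to +7."""
    if len(ratings) != 5:
        raise ValueError("expected five global rating scores")
    if any(not -7 <= r <= 7 for r in ratings):
        raise ValueError("each rating must lie between -7 and +7")
    return mean(ratings)

def change_category(score):
    """Map a summary rating onto the no change / small / medium / large benchmarks.

    Assumption: values between the integer bands go to the higher band.
    """
    magnitude = abs(score)
    if magnitude <= 1:
        return "no change"
    if magnitude <= 3:
        size = "small"
    elif magnitude <= 5:
        size = "medium"
    else:
        size = "large"
    direction = "better" if score > 0 else "worse"
    return f"{size} ({direction})"
```

For instance, a patient answering 3, 2, 4, 1, and 0 has a summary rating of 2, which falls in the small-improvement band.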
Investigators have equated a small change (a global rating score of 2 to 3) with the MIC. For patients with medically refractory epilepsy, the MIC in global ratings is 3. This value arises from a previous study in which patients with medically refractory epilepsy ascertained the minimum amount of worthwhile change in HRQOL, using a seven point scale ranging from 1 = no change at all to 7 = a very great deal of change. On average, patients considered a change of 3 as the MIC.17 Other investigators have arrived at similar values for MIC in other conditions.12,13,16,18
Linear regression analyses were used to assess the relation between the patients’ summary global rating of change and change (six month minus baseline total score) in QOLIE-89, QOLIE-31, HUI-III, and the physical (PCS) and mental (MCS) composite scores of SF-36. The fitted regression line was anchored at zero (that is, no intercept). We examined the assumption that zero change in summary global ratings corresponded to no change in HRQOL instruments by assessing the magnitude of the intercepts. The intercepts were small, ranging from 0.05 to 1.0, and therefore supported this assumption. R² was used to assess the strength of the relation between the global rating and change in HRQOL. Estimates of change in HRQOL, and the corresponding 95% confidence intervals (CI), were calculated from the fitted regression line using the midpoint of each of the four benchmarks of change for the summary global rating: 0.5, no change; 2.5, small change; 4.5, medium change; and 6.5, large change. An estimated change in HRQOL for a MIC of 3 was also calculated from the fitted regression line.
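A minimal sketch of the zero-anchored regression described above, written in Python under stated assumptions: the function names are hypothetical, the 95% CI uses a normal approximation instead of the t distribution (a small difference with 136 patients), and R² is computed in the uncentred form usually reported for models without an intercept. The software and exact R² definition used in the study are not stated, so results may differ slightly from table 3.

```python
import math
from statistics import NormalDist, mean

def fit_through_origin(ratings, changes):
    """Least-squares fit of change = slope * rating with no intercept."""
    sxx = sum(x * x for x in ratings)
    sxy = sum(x * y for x, y in zip(ratings, changes))
    slope = sxy / sxx
    residual_ss = sum((y - slope * x) ** 2 for x, y in zip(ratings, changes))
    # Only the slope is estimated, leaving n - 1 residual degrees of freedom.
    s2 = residual_ss / (len(ratings) - 1)
    se_slope = math.sqrt(s2 / sxx)
    # Uncentred R^2 (relative to zero), a common choice without an intercept.
    r_squared = 1 - residual_ss / sum(y * y for y in changes)
    return slope, se_slope, r_squared

def estimated_change(slope, se_slope, rating, level=0.95):
    """Fitted HRQOL change at a global rating, with a normal-approximation CI."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    estimate = slope * rating
    half_width = z * se_slope * abs(rating)
    return estimate, estimate - half_width, estimate + half_width

def intercept_if_fitted(ratings, changes):
    """Intercept of an ordinary fit, used only to check the zero-anchor assumption."""
    mx, my = mean(ratings), mean(changes)
    sxx = sum((x - mx) ** 2 for x in ratings)
    sxy = sum((x - mx) * (y - my) for x, y in zip(ratings, changes))
    return my - (sxy / sxx) * mx

# Midpoints of the global rating benchmarks, plus the MIC of 3.
BENCHMARKS = {"no change": 0.5, "small": 2.5, "medium": 4.5, "large": 6.5, "MIC": 3.0}
```

Calling `estimated_change(slope, se_slope, BENCHMARKS["MIC"])` gives the change in an instrument's score that corresponds to the MIC, and the other dictionary entries give the no change, small, medium, and large estimates in the same way.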
Several investigators have noted that baseline HRQOL scores can substantially influence the magnitude of change in HRQOL.18–20 We assessed the extent of this influence, as recommended by Bland and Altman,21 by calculating the Pearson product–moment correlation coefficients between HRQOL change scores and the mean of baseline and six month scores. In addition, we used Pearson correlation analyses to determine whether the level of baseline HRQOL influenced global ratings—that is, whether patients with poorer baseline HRQOL systematically appraised change differently from those with better HRQOL. Finally, we validated our results by performing the same analyses in a subgroup of 80 patients who answered the same set of HRQOL questionnaires and global ratings at nine months.
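The Bland and Altman check can be sketched in Python as below. Correlating change with baseline alone is biased because the baseline score appears in both variables, so the change score is correlated with the mean of the two measurements instead. This is a minimal sketch with hypothetical function names; squaring the coefficient gives the proportion of variance explained, the form in which such correlations are reported in the results. The same `pearson_r` helper would serve for the correlation between baseline HRQOL and global ratings.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def change_vs_level(baseline, follow_up):
    """Correlate change scores with the mean of baseline and follow up scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    levels = [(b + f) / 2 for b, f in zip(baseline, follow_up)]
    return pearson_r(changes, levels)
```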
Of 136 patients, 70 underwent surgery and 66 continued to receive medical treatment for their epilepsy. The patients’ clinical characteristics were representative of patients with difficult to control focal epilepsy (table 1). The response rate for all instruments and global ratings ranged from 94% to 100%. There were no ceiling or floor effects in any of the HRQOL instruments. Mean scores at baseline, follow up, and change were similar to those described in previous reports (table 2). The summary global ratings showed that 85 patients (63%) rated themselves improved (> 1), 25 (18%) as unchanged (0 to ±1), and 26 (19%) as worse (< −1); eight patients (6%) endorsed maximum change. The mean (SD) of the summary global rating score for the group was 1.9 (2.8).
The R² values assessing the strength of the relation between summary global rating and change in HRQOL score are shown in table 3. The summary global rating was a good predictor of change for the epilepsy specific QOLIE-89 (fig 1) and QOLIE-31, a modest predictor of change for the generic HUI-III, and a poor predictor of change for both composite scores of the generic SF-36 (table 3). There were no significant correlations between the baseline HRQOL scores and change scores in QOLIE-89 (R² = 0.01), PCS (R² = 0.0001), MCS (R² = 0.002), and HUI-III (R² = 0.004). Although there was a statistically significant correlation between baseline and change scores in the QOLIE-31, the R² (0.06) was close to zero and is probably not clinically important. We found no significant correlations between baseline HRQOL and global ratings: baseline HRQOL explained only 1–8% (average 2%) of the variance in global ratings.
Table 3 shows the amount of change in HRQOL instruments that corresponds to no change, small, medium, or large change, and MIC in summary global rating scores. Inspection of the 95% confidence intervals in table 3 shows that QOLIE-89 (fig 2) and QOLIE-31 can distinguish precisely between no change, small, medium, or large change, as demonstrated by 95% confidence intervals with minimal or no overlap. HUI-III distinguishes correctly between no change, small change, and medium change, but broad 95% confidence intervals around the estimate for large change limit the usefulness of the latter category. The MCS and PCS components of SF-36 discriminate accurately only between no change and small change, estimates for large change being too imprecise to be meaningful.
The MIC was 10.1 for QOLIE-89, 11.8 for QOLIE-31, 4.6 for SF-36 MCS, 3.0 for SF-36 PCS, and 0.15 for HUI-III. All instruments were able to differentiate between no change and MIC with statistical significance, and QOLIE-89 and QOLIE-31 also distinguished between MIC and medium or large change. As expected, the MIC is very close to small change, and their 95% confidence intervals overlap in all instruments.
We examined whether the type of treatment (surgical or medical) influenced the results and found no differences in any of the estimates between the two treatment groups.
A separate analysis using the same methods and instruments in a subgroup of 80 medically or surgically treated patients tested at nine months generated results similar to those of the first analysis. The MICs were 12.0 (95% CI, 8.6 to 15.5) for QOLIE-89, 11.1 (7.7 to 14.5) for QOLIE-31, 4.4 (1.5 to 7.4) for SF-36 MCS, 4.1 (2.4 to 5.9) for SF-36 PCS, and 0.13 (0.07 to 0.19) for HUI-III.
Assessing clinically important change in individual patients is increasingly recognised as a prerequisite for judging the impact of interventions on HRQOL.22 This allows clinicians to obtain clinically useful measures such as absolute differences between treatments and the number needed to treat for one additional patient to benefit.1 The importance of exploring change in individual patients rather than in the group mean can be illustrated by further scrutiny of our data. Although the group mean change in all instruments was relatively small (table 2) and fell below the estimated MIC for all instruments (table 3), 53 patients (40%) experienced an improvement equivalent to the MIC or larger (global rating ≥ 3); of equal importance, HRQOL declined (global rating < −1) in 26 patients (20%).
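The contrast between a group mean and individual responses, and the step from there to a number needed to treat, can be sketched in Python. The change scores below are invented for illustration only and are not study data; the threshold of 10.1 is the QOLIE-89 MIC, and the function names are hypothetical. The NNT is computed as the reciprocal of the absolute difference in the proportion of patients reaching the MIC, and is conventionally rounded up to a whole patient when reported.

```python
def mean_change(changes):
    """Group mean change, the aggregate usually compared between treatments."""
    return sum(changes) / len(changes)

def proportion_reaching(changes, mic):
    """Share of individual patients whose change equals or exceeds the MIC."""
    return sum(1 for c in changes if c >= mic) / len(changes)

def number_needed_to_treat(p_treated, p_control):
    """NNT = 1 / absolute difference in the proportion reaching the MIC."""
    difference = p_treated - p_control
    if difference <= 0:
        raise ValueError("no absolute benefit, so the NNT is undefined")
    return 1 / difference

# Hypothetical QOLIE-89 change scores for ten patients.
changes = [25, 20, 15, 12, 0, -2, -5, -8, 3, 0]
group_mean = mean_change(changes)               # below the MIC of 10.1
responders = proportion_reaching(changes, 10.1)  # yet four in ten reach it
```

Here the group mean of 6.0 points falls short of the MIC, while four of the ten hypothetical patients improve by at least the MIC.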
Lydick and Epstein have reviewed several methods to ascertain the MIC in individual patients.11 We chose the global ratings method because it is patient centred, clinically based, easy to use, has been validated in several conditions, and generates similar MICs to those obtained by other approaches.16,18,23,24
Because the global ratings explained a relatively small proportion of the variance in both composite scores of the SF-36 (R² values of 0.12 and 0.15), our estimates of clinically meaningful change for this instrument are less accurate than for the other three instruments. Research in epilepsy9 and in conditions such as carpal tunnel syndrome,25 chronic sinusitis,26 and angina,27 shows that SF-36 is less responsive to change than disease specific instruments. Some commentators have concluded that SF-36 is less apt than disease specific instruments for assessing clinically important change.25 Nevertheless, the MIC (3 to 4.6) and the effect size (0.3 to 0.4) for SF-36 in this study are similar to previous estimates in other conditions and using different methods,28,29 and the 95% confidence intervals around the MIC were narrow. Therefore, we believe that SF-36 can adequately distinguish between no change and MIC in epilepsy. The results are more accurate for HUI-III (R² = 0.30) and the estimates can distinguish among no change, MIC, or small, medium, and large change. We know of no reports assessing clinically meaningful change for the HUI-III in other conditions. Finally, the estimates for QOLIE-89 and QOLIE-31 are quite accurate. The R² value is well within the range reported for other instruments in this type of analysis18,30,31 and the 95% confidence intervals are sufficiently narrow to distinguish accurately between categories of change.
An important question is whether baseline HRQOL scores influence the estimates of MIC. We found no correlation between baseline scores and global ratings. Therefore, our estimates of clinically meaningful change seem applicable to a broad range of baseline HRQOL values, including negative and positive scores (fig 1). However, because only 26 patients (19%) rated themselves as worse, the results may be less accurate for judging worsening than improvement.
The MICs generated in this analysis should be viewed in the context of other measures of change in HRQOL. First, the effect sizes corresponding to the MIC were 0.58 (medium) for QOLIE-89, 0.72 (medium to large) for QOLIE-31, 0.38 (small to medium) for MCS, 0.3 (small to medium) for PCS, and 0.5 (medium) for HUI-III. Thus at least medium effect sizes are needed to detect an MIC. Second, some commentators suggest that the standard error of measurement (SEM) is a reasonable approximation to the MIC in some instruments.24 Using the SEM as a surrogate for the MIC in this epilepsy population is not supported, because the MIC was considerably larger than the reported SEM for QOLIE-89 (6.6), QOLIE-31 (5.5), and HUI-III (0.11).10 Finally, it is important to consider how certain one can be that the relatively small changes corresponding to the MIC in various instruments represent real change as opposed to chance or measurement error. The upper 95% confidence limits of the MIC for QOLIE-89, QOLIE-31, and HUI-III approach the reported threshold values for 90% certainty that real change has occurred, and 95% certainty is achieved with medium change.10 We know of no such threshold values for SF-36 in epilepsy.
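The three yardsticks in the preceding paragraph can be sketched in Python under explicit assumptions. The effect size here divides change by the baseline standard deviation, a common definition that the text does not confirm; the SEM uses the standard formula SD × √(1 − reliability); and the certainty threshold follows the reliable change approach of Jacobson and Truax, which may not be the method behind the cited threshold values. The baseline SD of 17.4 in the usage note is a hypothetical value chosen to reproduce the 0.58 effect size, not a figure reported here, and all function names are hypothetical.

```python
import math
from statistics import NormalDist

def effect_size(change, baseline_sd):
    """Standardised effect size: change divided by the baseline SD (assumed)."""
    return change / baseline_sd

def sem_from_reliability(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

def reliable_change_threshold(sem, certainty=0.90):
    """Smallest change exceeding measurement error at a two-sided certainty.

    The difference between two measurements carries error sqrt(2) * SEM,
    as in the Jacobson and Truax reliable change index.
    """
    z = NormalDist().inv_cdf(0.5 + certainty / 2)
    return z * math.sqrt(2) * sem
```

With the hypothetical SD, `effect_size(10.1, 17.4)` rounds to 0.58, and `reliable_change_threshold(6.6)` gives roughly 15.4 points for QOLIE-89, well above its SEM of 6.6, which illustrates why the SEM alone understates the change needed to be confident that real change has occurred.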
It remains to be determined whether patients with different types and severity of epilepsy differ systematically in their specification of MIC, or of small, medium, and large changes in HRQOL. For example, we have shown that patients with milder forms of epilepsy usually stipulate a higher MIC than those with intractable epilepsy.17 Because the MICs reported here are based on the minimum change that patients with difficult to control epilepsy consider worthwhile,17 they should be used with caution in those with milder forms of epilepsy. On the other hand, our estimates of small, medium, and large change denote empirically derived benchmarks that may not be directly dependent on any particular patient population, and may be applicable to a wide variety of patients with epilepsy. Finally, it is of interest that the type of treatment (surgery or anticonvulsants) had no impact on any of the estimates. This is in agreement with a previous analysis showing that the severity of epilepsy is a stronger determinant of MIC than the type of treatment.17
In summary, we ascertained the minimum clinically important change, and small, medium, and large changes in four HRQOL instruments for individual patients with difficult to control epilepsy. The validity of these estimates is supported by our analysis demonstrating applicability across a wide range of baseline HRQOL scores and different treatment modes, narrow 95% confidence intervals, and replication in a validation sample. The measures are most accurate for the epilepsy specific instruments QOLIE-89 and QOLIE-31, moderately accurate for the generic HUI-III, and least accurate for the generic tool SF-36. These estimates can assist clinicians and researchers in determining the magnitude of the effect of interventions on individual patients’ quality of life.
Example of a global rating question
We would like you to think about how epilepsy and its treatment are affecting your everyday life now as compared to 6 months ago when you entered the study.
Overall, in relation to your epilepsy and its treatment, would you say that your quality of life now is:
Worse
About the same
Better
Patients who stated that they were worse were asked to rate how much worse on the following scale:
−7 A very great deal worse
−6 A great deal worse
−5 A good deal worse
−4 Moderately worse
−3 Somewhat worse
−2 A little worse
−1 Almost the same, hardly worse at all
Patients who stated that they were better were asked to rate how much better on the following scale:
+1 Almost the same, hardly better at all
+2 A little better
+3 Somewhat better
+4 Moderately better
+5 A good deal better
+6 A great deal better
+7 A very great deal better
Those who indicated that they were about the same were given a score of zero (0 = no change).
Supported in part by a research grant from the London Health Sciences Centre. Lorraine Foster-Janzen assisted with data collection and instrument scoring.