In response to the recently published paper by Liu et al,1 we wish to initiate further discussion regarding the proposed area under the curve (AUC) summary measure and comment on some inherent difficulties with this method.
In the paper, Liu et al note that when constructing the summary AUC, the “essential components are the sets of serial impairment or disability data, preferably with frequent sampling points for each patient...” This is in fact a crucial point: depending on the schedule of observations, the observed plot of the expanded disability status scale (EDSS) over time can look quite different from the true curve. For example, consider figure D from the paper, reproduced here as figure A. This plot of EDSS over time represents a continuum, which in reality is unlikely to be fully observed, because it is not practical to see patients on a daily or weekly basis throughout a study. Instead, patients are scheduled to return for assessment at certain times during the study. If, hypothetically, this patient was seen every 3 months by the study investigator, the observed plot of EDSS over time would appear as figure B. However, if this patient was seen every 6 months during the study instead, the observed plot of EDSS over time would appear as figure C. These two “observed” EDSS curves look different, even though they represent the same underlying curve, and the resulting AUC for each plot would therefore also differ in magnitude. If patients could be measured daily to create a smooth, accurate curve, this would not be an issue. In practice, the resulting EDSS curve over time is spiky and uneven and so the AUC measurement is greatly affected by the visit schedule.
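The dependence on visit schedule can be sketched numerically. The example below is illustrative only and is not taken from the paper: the underlying EDSS course (`true_edss`, a baseline of 2.0 with a single relapse to 4.0 during months 7 to 11), the 24 month duration, and the use of the trapezoidal rule are all assumptions made for the purpose of the illustration.

```python
def true_edss(month):
    """Hypothetical underlying EDSS: baseline 2.0, relapse to 4.0 in months 7-11."""
    return 4.0 if 7 <= month <= 11 else 2.0


def trapezoid_auc(times, scores):
    """Area under the observed curve by the trapezoidal rule (EDSS x months)."""
    return sum(
        (t1 - t0) * (s0 + s1) / 2.0
        for t0, t1, s0, s1 in zip(times, times[1:], scores, scores[1:])
    )


def observed_auc(interval_months, duration_months=24):
    """AUC computed only from visits scheduled every `interval_months`."""
    visits = list(range(0, duration_months + 1, interval_months))
    return trapezoid_auc(visits, [true_edss(m) for m in visits])


print(observed_auc(1))  # monthly visits: 58.0
print(observed_auc(3))  # 3-monthly visits: 54.0
print(observed_auc(6))  # 6-monthly visits: 48.0 (the relapse is missed entirely)
```

Under these assumptions the same underlying course yields three different AUC values, and the 6 monthly schedule cannot distinguish this patient from one who never relapsed.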
It is also necessary to clarify what the proposed AUC summary measure is actually measuring and how it can be interpreted. The interpretation can vary depending on such factors as how baseline values were handled in the calculation of the AUC, whether unscheduled visits were included, and which summary statistics are reported. For example, if scores are “normalised to baseline” as described in the article, patients with completely different baseline EDSS scores can have the same AUC, yet the degree of disability will be greater for the patient with the higher baseline EDSS. From a clinical perspective, the question should be raised, “Do we want to consider the disability of these patients to be the same by using the AUC summary measure?” Likewise, as the article points out, “Caution is necessary in short trials of 2 or 3 years, as fixed neurological deficits are accumulating very slowly, and an increased AUC at the end of a trial may simply represent transient disability which has either resolved or has yet to resolve.” This implies that the AUC summary measure may not be a good indication of irreversible clinical deterioration. The AUC measure may reflect exacerbations rather than sustained disability.
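The baseline point can be made concrete with a small sketch. It assumes that “normalised to baseline” means subtracting each patient's baseline score before integrating, which is our reading and may differ in detail from the calculation in the article; the two patients and their scores are hypothetical.

```python
def baseline_normalised_auc(times, scores):
    """Trapezoidal AUC of the change from baseline (assumed normalisation)."""
    baseline = scores[0]
    changes = [s - baseline for s in scores]
    return sum(
        (t1 - t0) * (c0 + c1) / 2.0
        for t0, t1, c0, c1 in zip(times, times[1:], changes, changes[1:])
    )


visits = [0, 6, 12, 18, 24]                 # months
patient_low = [2.0, 3.0, 3.0, 2.5, 2.5]     # hypothetical, baseline EDSS 2.0
patient_high = [6.0, 7.0, 7.0, 6.5, 6.5]    # hypothetical, baseline EDSS 6.0

print(baseline_normalised_auc(visits, patient_low))   # 16.5
print(baseline_normalised_auc(visits, patient_high))  # 16.5
```

Both hypothetical patients receive the same summary value, although one remains ambulatory without aid throughout and the other requires assistance to walk at every visit.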
The concept of AUC has been used extensively in other fields with great success. Most commonly, it has been used when measuring either peaked data (outcome variable starts from a baseline, rises to a peak, and then returns to baseline) or growth data (outcome variable steadily increases or decreases with time and does not start to return to its initial value over the period of the study).2 Even then, however, AUC is not used in isolation. For example, when used in pharmacokinetic modelling of blood concentration data, the maximum concentration and the time to maximum concentration are also reported. This is because the AUC alone cannot summarise the shape of the curve. We think that irreversible disability progression in multiple sclerosis must continue to be measured by time to event and intrapatient changes in disability. Improvements in assessment of disability are more likely to come from outcome measures such as the multiple sclerosis functional composite3 that overcome issues with the EDSS such as non-linearity.
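A minimal sketch of why the AUC is reported alongside the maximum and the time to maximum: the two concentration profiles below are hypothetical, and the helper `summarise` is introduced purely for illustration.

```python
def summarise(times, values):
    """Return (AUC by trapezoidal rule, maximum value, time of maximum)."""
    auc = sum(
        (t1 - t0) * (v0 + v1) / 2.0
        for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:])
    )
    peak = max(values)
    time_of_peak = times[values.index(peak)]
    return auc, peak, time_of_peak


hours = [0, 1, 2, 3, 4]
sharp_peak = [0.0, 4.0, 0.0, 0.0, 0.0]   # hypothetical: high, brief exposure
broad_peak = [0.0, 1.0, 2.0, 1.0, 0.0]   # hypothetical: lower, longer exposure

print(summarise(hours, sharp_peak))  # (4.0, 4.0, 1)
print(summarise(hours, broad_peak))  # (4.0, 2.0, 2)
```

The two profiles have identical AUC but differ twofold in peak height and in the timing of the peak, so the AUC on its own does not identify the shape of either curve.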
Liu et al reply:
We welcome the opportunity to discuss the role of the AUC (area under the plotted curve of disability against time) as a summary measure statistic in treatment trials of multiple sclerosis, although many of the points raised by Simonian and her colleagues simply reiterate those we made in our paper.1-1
The first comment considers the impact of the number of points on the shape of the disability curve. We agree that the sampling frequency will alter the shape of the curve and this is precisely why the AUC method is preferable to the conventional approach, which emphasises single or two point assessments. By taking account of data from all the assessment points, the bias highlighted in figures B and C by Simonian et al would have a greater chance of averaging out. Obviously, the greater the sampling frequency, the better the approximation to the disability actually experienced (figure A). For pragmatic reasons, in practice, the number of assessments is limited. Trials with a scheduled visit frequency of only 6 months1-2 1-3 will necessarily be less accurate in following actual in trial disability than those with higher rates of assessment1-4-1-6 whatever the clinical rating scale tool used. Our approach takes account of this fact.
On the important question of clinical interpretation of the AUC summary measure, we reiterate our argument that the AUC provides an index of in trial morbidity, or as we called it, “total disability experience” (summed transient and irreversible disability). This is clinically meaningful and relevant in short studies involving relapsing-remitting multiple sclerosis, in which many disability changes that affect patients’ daily lives remit before the end of a trial.
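The contrast between a two point assessment and the AUC can be sketched as follows. The patient and scores are hypothetical, and integrating the change from baseline by the trapezoidal rule is an assumption made for this illustration rather than a restatement of the exact calculation in our paper.

```python
def entry_to_completion_change(scores):
    """Two point outcome: final score minus score at study entry."""
    return scores[-1] - scores[0]


def auc_above_baseline(times, scores):
    """Trapezoidal AUC of the change from the entry score (EDSS x months)."""
    changes = [s - scores[0] for s in scores]
    return sum(
        (t1 - t0) * (c0 + c1) / 2.0
        for t0, t1, c0, c1 in zip(times, times[1:], changes, changes[1:])
    )


visits = [0, 6, 12, 18, 24]            # months
scores = [2.0, 2.0, 4.0, 4.0, 2.0]     # hypothetical relapse that fully remits

print(entry_to_completion_change(scores))   # 0.0
print(auc_above_baseline(visits, scores))   # 24.0
```

In this hypothetical case the entry to completion comparison records no change, whereas the AUC registers the disability experienced during the trial.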
The problem of summating estimated disability changes at different levels of clinical rating scales, as raised by Simonian and colleagues, is a different point altogether and hardly limited to an analysis by the AUC statistic. If data on changes at different levels of a rating scale are required, then stratification analysis according to baseline disability1-7 can be carried out using any outcome measure including “confirmed progression”, overall EDSS change, or AUC.
Simonian and colleagues consider that AUC is appropriate for “growth data”, which in terms of multiple sclerosis would apply to patients deteriorating steadily with chronic progressive disease. In fact, as stated by Matthews et al,1-8 the AUC statistic is particularly relevant for summarising peak data such as the increase and decrease of disability associated with a relapse. However, it is also relevant and appropriate for the analysis of the complex mixture of peaked, multipeaked, and growth data, which characterises the disability course of relapsing-remitting multiple sclerosis. If the functional form of the disability progression were known, we would of course be able to use parametric modelling (for example, exponential decay or linear decay). Our analysis is non-parametric and the only assumptions we make are that the events are stochastic and the measurements are serially correlated.
We did not imply that the AUC statistic “should be used in isolation” any more than a mean or median EDSS should be the only summary measure. The AUC clearly does not provide information about the direction of disability change and we pointed out that additional analyses would be necessary to determine time trends. Whereas we agree that, theoretically, irreversible disability progression in multiple sclerosis is best assessed by time to events, in practice outcomes such as EDSS of 6.0 are not truly irreversible and are subject to substantial measurement error. Event history analysis is more appropriate for truly irreversible events such as death or loss of virginity (!) rather than those we typically use in relapsing-remitting multiple sclerosis. Commonly employed disability end points, such as EDSS change from study entry to completion, and the so-called confirmed progression at 3 or 6 months, have major flaws as already discussed in our paper.
The relative merits of clinical rating scales such as the EDSS, Scripps neurologic rating scale, or the proposed multiple sclerosis functional composite should not be confused with the statistical methods used to analyse serial scores acquired from these scales. Whether the functional composite turns out to be better or worse than established scales remains to be seen, but this does not in any way impinge on the desirability of applying AUC analysis to serial data derived from any clinical rating scale (including the functional composite). We think that because this summary measure statistic (AUC) takes into account both transient and permanent disability, as well as the magnitude and duration of disability changes, and because it is simple to apply, sensitive and variance stabilising through its incorporation of all serial time points, it is both appropriate and clinically meaningful for outcome analysis of treatment trials of relapsing-remitting multiple sclerosis.