The choice of the most appropriate primary and secondary outcome measures is often the most complex issue in the design of a randomised controlled clinical trial. It has implications for the cost of the trial, the sample size, the burden that the trial will place on patients and clinicians taking part, and the likelihood that the result of the trial will influence clinical practice. As illustrated by Freeman et al 1 in the previous issue of this Journal (February pp 150–6), whichever outcome is chosen it is important that it has been properly validated in a representative sample of patients with the disease under study. They assessed whether the short form 36 (SF-36), the most often used generic measure of health status, has the properties necessary to detect clinically significant change in the type of patients included in treatment trials in multiple sclerosis. They found that it was clinically appropriate, had acceptable convergent validity and discriminant construct validity, and had good internal reliability. However, they report important floor and ceiling effects in certain domains of the measure, and a lack of responsiveness to purported clinical improvements after hospital admission for rehabilitation in a group of relatively severely disabled patients. This last finding may have been due to floor effects in severely disabled patients, and may not be applicable to more typical trial patients, but it would nevertheless be unwise to rely solely on the SF-36 to measure the effectiveness of disease modifying treatment in multiple sclerosis.
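As a rough illustration of two of the properties mentioned above, the following Python sketch computes the proportion of patients scoring at the floor and ceiling of a scale, and a standardised response mean (mean change divided by the standard deviation of change) as one common index of responsiveness. The function names and the example scores are hypothetical, and this is not necessarily the analysis that Freeman et al performed.

```python
from statistics import mean, stdev


def floor_ceiling_proportions(scores, lowest, highest):
    """Return the fractions of patients at the lowest and highest possible score.

    `lowest` and `highest` are the scale limits (for example 0 and 100 for a
    domain scored 0-100); large fractions at either end suggest a floor or
    ceiling effect.
    """
    n = len(scores)
    at_floor = sum(1 for s in scores if s == lowest) / n
    at_ceiling = sum(1 for s in scores if s == highest) / n
    return at_floor, at_ceiling


def standardized_response_mean(before, after):
    """Mean of paired change scores divided by their sample standard deviation.

    Values near zero suggest the measure barely registers the change,
    whatever the reason (no real change, or an unresponsive measure).
    """
    changes = [a - b for b, a in zip(before, after)]
    return mean(changes) / stdev(changes)


# Hypothetical domain scores for eight patients, before and after an intervention.
before = [0, 0, 0, 5, 10, 0, 15, 0]
after = [0, 5, 0, 5, 15, 0, 15, 5]

floor_frac, ceiling_frac = floor_ceiling_proportions(before, lowest=0, highest=100)
srm = standardized_response_mean(before, after)
print(f"at floor: {floor_frac:.2f}, at ceiling: {ceiling_frac:.2f}, SRM: {srm:.2f}")
```

When most patients already score at the floor before treatment, as in these made-up data, a small standardised response mean cannot by itself distinguish an ineffective intervention from a measure that has no room to register change.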
What should be done when a clinical outcome measure seems to lack responsiveness to useful clinical improvement in the disease of interest? Firstly, it is important to resist the temptation to revert to a surrogate outcome measure that reflects progression of the pathology of the disease rather than the clinical burden. Surrogate outcomes can be very sensitive measures of the biological effects of treatments, but very poor indices of the clinical effect. For example, Campath-1 almost completely halts the formation of new lesions on MRI brain imaging in multiple sclerosis, but it has no significant effect on clinical progression of the disease.2 Routine use of antiarrhythmic drugs after myocardial infarction substantially reduces the frequency of ventricular ectopics on 24 hour ECG monitoring, but dramatically increases mortality,3 and sodium fluoride produces significant increases in bone density in women with osteoporosis, but it dramatically increases the risk of major fractures.4 In each of these examples, the surrogate outcome had been very highly correlated with the relevant clinical outcome in observational studies. However, it does not follow that simply because a surrogate outcome happens to be predictive of a clinical outcome in cross sectional or cohort studies, it will necessarily respond in the same way to treatment. The only way to validate a surrogate outcome measure is to show that its response to treatment is predictive of the effect of treatment on important clinical outcomes.
Secondly, it is important not to assume that a more detailed or complex clinical measure will necessarily be more responsive than a simple measure. There is no intuitive statistical relation between sensitivity to change and the complexity of a clinical measurement. Indeed, more complex measures may be less sensitive because of increased interobserver and intraobserver error. Simple handicap scales or even a few simple questions can be highly discriminating measures of clinical outcome.5 Simple measures of outcome also have the advantage that, unlike the change in the mean value of a complicated neurological impairment score, they have obvious meaning for the patient and clinician.
Finally, if no treatment effect can be detected despite using a clinically appropriate outcome measure that has been shown to be valid and reproducible, and to be free of major floor and ceiling effects, the most likely explanation is that the treatment does not actually work. In other words, it is the treatment, and not the responsiveness of the outcome measure, that is inadequate.