Measuring change in disability after inpatient rehabilitation: comparison of the responsiveness of the Barthel Index and the Functional Independence Measure
J J M F van der Putten, J C Hobart, J A Freeman, A J Thompson

Institute of Neurology, Queen Square, London WC1N 3BG, UK

Correspondence to: Professor AJ Thompson, Institute of Neurology, Queen Square, London WC1N 3BG, UK. Telephone 0044 171 837 3611 ext 4152; fax 0044 171 813 6505; email athompson{at}


BACKGROUND The importance of evaluating disability outcome measures is well recognised. The Functional Independence Measure (FIM) was developed to be a more comprehensive and “sensitive” measure of disability than the Barthel Index (BI). Although the FIM is widely used and has been shown to be reliable and valid, there is limited information about its responsiveness, particularly in comparison with the BI. This study compares the appropriateness and responsiveness of these two disability measures in patients with multiple sclerosis and stroke.

METHODS Patients with multiple sclerosis (n=201) and poststroke (n=82) patients undergoing inpatient neurorehabilitation were studied. Admission and discharge scores were generated for the BI and the three scales of the FIM (total, motor, and cognitive). Appropriateness of the measures to the study samples was determined by examining score distributions, floor and ceiling effects. Responsiveness was determined using an effect size calculation.

RESULTS The BI, FIM total, and FIM motor scales showed good variability and had small floor and ceiling effects in the study samples. The FIM cognitive scale showed a notable ceiling effect in patients with multiple sclerosis. Comparable effect sizes were found for the BI and two FIM scales (total and motor) in both patients with multiple sclerosis and stroke patients.

CONCLUSION All measures were appropriate to the study sample. The FIM cognitive scale, however, has limited usefulness as an outcome measure in progressive multiple sclerosis. The BI, FIM total, and FIM motor scales show similar responsiveness, suggesting that neither the FIM total nor the FIM motor scale has an advantage over the BI in evaluating change.

  • Barthel Index
  • Functional Independence Measure
  • responsiveness
  • rehabilitation

Measuring the effectiveness of clinical interventions by using standardised measurement instruments is now widely accepted as being central to good clinical practice.1 As the number of potential healthcare interventions has increased disproportionately to the healthcare budget, pressure has been put on services to show that they provide high quality care that is cost effective.

Measuring the outcome of healthcare interventions is a central component of determining therapeutic effectiveness and, therefore, the provision of evidence-based health care. However, information generated by outcome studies is only meaningful if the measures used are clinically useful and scientifically sound.2 Consequently, it must be shown that instruments measure the outcome under study in a way that is reliable and valid. In addition, instruments used for evaluative studies must also be shown to be able to detect clinically significant change in the outcome measured. This property is known as responsiveness.3

As rehabilitation is a labour intensive and costly intervention, evaluating its therapeutic effectiveness is particularly important. Although some studies have shown that rehabilitation is beneficial,4-9 there is no consensus as to which outcomes should be measured. Rehabilitation aims to improve various aspects of a patient’s life—for example, disability, handicap, and quality of life—and ideally these should all be included in the outcome assessment.10 Despite a move towards quality of life and patient based outcome measures, observer rated generic measures of disability are still widely used.11 The skills involved in self care and mobility are assumed to be basic to higher levels of functioning,12 thus improvements in disability are likely to have considerable impact on a person’s level of handicap and health related quality of life.

The Barthel Index (BI) and the Functional Independence Measure (FIM) are probably the most widely used generic disability measures. The BI was developed in 1955 as a simple index of independence useful in scoring disability.13 However, it was regarded as being too crude, too simple, and not responsive enough to evaluate disability outcomes in rehabilitation. Consequently, the FIM was developed between 1984 and 1987. The specific aims of the developers of the FIM were to produce an instrument that provided comprehensive and “sensitive” disability measurement.14 The FIM contains more items than the BI, includes cognitive items, and has more response categories.

Although both instruments have evidence of reliability and validity,15-20 there is only limited information about their responsiveness. Wade et al have suggested that the responsiveness of the BI is adequate for clinical purposes but may be limited in the context of research.15 Conceptually, it is argued that the FIM is more responsive than the BI.12 17 21 However, empirical data supporting this claim are limited, and those which exist use suboptimal methodology.18 Standard techniques for the assessment of responsiveness, such as the application of effect sizes, have not been used previously.

The aim of this study is to compare the appropriateness and responsiveness of the BI and the FIM in patients with multiple sclerosis and stroke patients receiving inpatient rehabilitation.



Patients with multiple sclerosis and stroke patients who were admitted to the neurorehabilitation unit (NRU) of the National Hospital for Neurology and Neurosurgery between 1994 and 1997 were studied. These diagnostic groups were studied because they have different clinical disease courses (acute v chronic). The NRU is an 18 bed unit specialising in intensive, individually tailored, goal oriented rehabilitation of patients with neurological disorders.22 Patients are selected for admission to the NRU if they have the physical potential to actively participate in an intensive rehabilitation programme; the cognitive ability to carry over learned skills into functional tasks; and require input from at least two disciplines other than medical and nursing staff. Patients were excluded from this study if their duration of stay was less than 7 days.


The BI is a 10 item instrument measuring disability in terms of a person’s level of functional independence in personal activities of daily living.16 It is rated from observation and has two items on a two point scale, six items on a three point scale, and two items on a four point scale. Item scores are summed to generate a total score (0=minimum independence; 20=maximum independence). The BI is user friendly and multiple studies support its reliability and validity.15 16 23

The FIM is an 18 item instrument measuring a person’s level of disability in terms of burden of care.14 It was developed specifically to measure functional outcomes of rehabilitation.20 The developers recommend that the FIM is rated from patient observation by the consensus opinion of a multidisciplinary team. Each item is rated from 1 (requiring total assistance) to 7 (completely independent). Three independent FIM scores can be generated by summing item scores: a total score (FIM total: 18 items), a motor score (FIM motor: 13 items), and a cognitive score (FIM cognitive: 5 items). Multiple studies support the reliability and validity of FIM scales.18-21 24 25
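
The three FIM scores described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the instrument's official scoring procedure: the function name `fim_scores` is hypothetical, and the ordering of items (13 motor items first, then 5 cognitive items) is an assumption made for the example. With each item rated from 1 to 7, the possible ranges follow directly: 18 to 126 for FIM total, 13 to 91 for FIM motor, and 5 to 35 for FIM cognitive.

```python
from typing import Dict, Sequence

# Item counts as stated in the text; the item ORDER is an assumption.
N_MOTOR_ITEMS = 13
N_COGNITIVE_ITEMS = 5
N_ITEMS = N_MOTOR_ITEMS + N_COGNITIVE_ITEMS  # 18


def fim_scores(item_ratings: Sequence[int]) -> Dict[str, int]:
    """Sum 18 item ratings (each 1 to 7) into total, motor, and cognitive scores.

    Assumes the first 13 ratings are motor items and the last 5 are
    cognitive items (hypothetical ordering for this sketch).
    """
    if len(item_ratings) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item ratings, got {len(item_ratings)}")
    if any(not 1 <= rating <= 7 for rating in item_ratings):
        raise ValueError("each item must be rated from 1 to 7")

    motor = sum(item_ratings[:N_MOTOR_ITEMS])
    cognitive = sum(item_ratings[N_MOTOR_ITEMS:])
    return {"total": motor + cognitive, "motor": motor, "cognitive": cognitive}
```

Because the scale minimum is not zero, floor effects for the FIM scales refer to scores of 18, 13, and 5 respectively rather than to a score of 0.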

The expanded disability status scale (EDSS) is a multiple sclerosis specific, neurologist rated, index grading disease severity from 0 (normal neurological examination) to 10 (death due to multiple sclerosis) in 20 steps.26 Rating is based on the medical history and the neurological examination. Although the EDSS has been heavily criticised,27-29 it remains the most widely used measure for multiple sclerosis due to the absence of well evaluated superior alternatives. Evidence supports the reliability and validity of the EDSS.30


Patients referred for neurorehabilitation were assessed by a senior multidisciplinary team consisting of a neurologist, clinical nurse specialist, occupational therapist, and physiotherapist. Patients considered likely to benefit from inpatient neurorehabilitation had an admission date booked. On admission to the NRU, patient characteristics were recorded, along with disease severity (EDSS) in the multiple sclerosis group. For all patients, disability measures (BI and FIM) were rated within 96 hours of admission to, and within 48 hours of discharge from, the NRU by consensus opinion of a treating multidisciplinary team.



Appropriateness describes whether the range of disabilities in a study sample is similar to the range of disabilities covered by an instrument. In this study appropriateness was assessed by examining score ranges, means, SDs, and floor and ceiling effects for the BI and three FIM scales. Mean scores indicate the central tendency of the group; ideally these should lie near the midpoint of the scale range. Sample range and SD indicate the extent to which an instrument demonstrates variability in the study sample. The greater the variability detected, the better an instrument discriminates between subjects. Floor and ceiling effects, calculated as the percentage of the sample scoring the minimum and maximum possible scores respectively, indicate the extent to which scores cluster at the bottom and top of the scale range. Floor and ceiling effects represent a limited ability to discriminate between subjects. When an instrument measures a restricted range of health status, floor and ceiling effects indicate that the range of disability measured by the scale is less than the range of disability occurring in the study sample. Floor and ceiling effects exceeding 20% are considered to be significant.31
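
The appropriateness statistics described above are simple to compute. The following Python sketch is illustrative only: the function name `appropriateness_summary` is hypothetical, the sample SD (n - 1 denominator) is an assumption because the text does not say which SD was used, and the data in the usage note are invented.

```python
from statistics import mean, stdev
from typing import Dict, Sequence

# Threshold stated in the text: effects exceeding 20% are considered significant.
SIGNIFICANT_EFFECT_PCT = 20.0


def appropriateness_summary(
    scores: Sequence[float], scale_min: float, scale_max: float
) -> Dict[str, object]:
    """Summarise how well a scale's range matches a sample's scores."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores to compute an SD")

    # Percentage of the sample at the minimum / maximum possible score.
    floor_pct = 100 * sum(s == scale_min for s in scores) / n
    ceiling_pct = 100 * sum(s == scale_max for s in scores) / n

    return {
        "observed_range": (min(scores), max(scores)),
        "mean": mean(scores),
        "sd": stdev(scores),  # sample SD; an assumption of this sketch
        "scale_midpoint": (scale_min + scale_max) / 2,
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "floor_significant": floor_pct > SIGNIFICANT_EFFECT_PCT,
        "ceiling_significant": ceiling_pct > SIGNIFICANT_EFFECT_PCT,
    }
```

For example, calling `appropriateness_summary([0, 5, 10, 10, 15, 20, 20, 20, 12, 8], 0, 20)` on ten invented BI scores gives a mean of 12 against a scale midpoint of 10, a floor effect of 10%, and a ceiling effect of 30%, which would be flagged as significant under the 20% criterion.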


Responsiveness is defined as the ability of a measure to detect clinically important change in the outcome of interest.32 In this study responsiveness was determined using an effect size calculation, defined as mean change score (discharge minus admission) divided by the SD of admission (pretreatment) scores.33 Effect sizes indicate, in SD units, the magnitude of change undergone by an instrument between two points in time. Therefore, the greater the effect size the greater the responsiveness of an instrument. By relating change scores to the variability of the study sample, effect sizes transform raw change scores with limited meaning to a standard metric thereby allowing comparison of different instruments and different samples. When instruments are compared in the same sample a direct indication of their relative responsiveness is provided. Under these circumstances the instrument with the largest effect size is considered the most responsive.33 In addition, paired t tests were used to determine the statistical significance of disability change scores.
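
The effect size defined above, and the paired t statistic used alongside it, can be sketched as follows. This is a minimal illustration rather than the authors' analysis code: the function names are hypothetical, the sample SD (n - 1 denominator) is assumed, and only the t statistic is computed, not its p value.

```python
from math import sqrt
from statistics import mean, stdev
from typing import List, Sequence


def _change_scores(admission: Sequence[float], discharge: Sequence[float]) -> List[float]:
    """Per-patient change: discharge minus admission."""
    if len(admission) != len(discharge):
        raise ValueError("admission and discharge must be paired, same length")
    return [d - a for a, d in zip(admission, discharge)]


def effect_size(admission: Sequence[float], discharge: Sequence[float]) -> float:
    """Mean change score divided by the SD of admission scores."""
    changes = _change_scores(admission, discharge)
    return mean(changes) / stdev(admission)  # sample SD is an assumption


def paired_t_statistic(admission: Sequence[float], discharge: Sequence[float]) -> float:
    """Paired t statistic: mean change divided by its standard error."""
    changes = _change_scores(admission, discharge)
    standard_error = stdev(changes) / sqrt(len(changes))
    return mean(changes) / standard_error
```

Note the difference in denominators: the effect size scales the mean change by the spread of admission scores, whereas the t statistic scales it by the spread of the change scores themselves and by sample size. A large t value therefore reflects statistical significance, not necessarily a large effect size.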



Table 1 presents the characteristics of the 283 patients studied. The multiple sclerosis group (71% of sample) contained more women, was slightly younger, and had a shorter length of stay than the stroke group (29% of sample). The EDSS scores indicated that the multiple sclerosis group was moderately to severely disabled.

Table 1

Characteristics of patients with multiple sclerosis and stroke patients


Table 2 presents BI and FIM score distributions for patients with multiple sclerosis and stroke patients on admission to the NRU. For both disease groups, patient scores on the BI, FIM total, and FIM motor scales spanned the entire scale range, had mean scores near the scale midpoint, and had small floor and ceiling effects. These results indicate that these three scales are appropriate to the study samples. However, the results shown in table 2 raise concerns over the appropriateness of the FIM cognitive scale as a measure of cognitive disability in the patients with multiple sclerosis studied. Actual scores only span the upper (less disabled) range of the scale, the mean score is well above the scale midpoint, the SD is small, and the ceiling effect is only just below the recommended upper limit. The FIM cognitive scale is, however, more appropriate to the stroke than the multiple sclerosis sample.

Table 2

BI and FIM scores on admission: sample range, mean, floor, and ceiling effect in two disease groups


Table 3 presents disability change scores with their statistical significance and effect sizes for the BI and three FIM scales in patients with multiple sclerosis and stroke patients. Change scores for all scales in both disease groups were positive, indicating less disability on discharge than admission. These change scores were statistically significant (p<0.0001) except for the FIM cognitive score in the multiple sclerosis group.

Table 3

Comparison of BI and FIM change scores, p values, and effect sizes

Effect sizes for the BI, FIM total, and FIM motor scales were very similar in each disease group indicating comparable responsiveness for these three scales. Also, in both disease groups effect sizes for the FIM cognitive scale were much less than for the BI, FIM total, and FIM motor scales indicating that the FIM cognitive scale is the least responsive scale.


In this study the appropriateness and responsiveness of the BI and FIM were compared in patients with multiple sclerosis and stroke patients receiving inpatient neurorehabilitation. The results show that all measures were appropriate to the samples studied although the FIM cognitive scale has a notable ceiling effect in patients with multiple sclerosis. More importantly, the BI, FIM total, and FIM motor scales show similar responsiveness in both disease groups.

Appropriateness of disability measures, as defined in this study, is rarely reported in clinical studies. However, when scales measure a restricted range of health status it is important to show the appropriateness of this range to the study sample. The patients in this study had moderate to severe disability as measured by the EDSS. However, the range of cognitive dysfunction measured by the FIM cognitive scale was restricted in these patients suggesting that this scale has limited usefulness for the measurement of cognitive disability in patients with multiple sclerosis undergoing neurorehabilitation. Even for the stroke patients the ceiling effect of the FIM cognitive scale is notable (13.4%) raising some concerns over its use in this patient group.

Patients in this study are not necessarily representative of multiple sclerosis or stroke patients undergoing neurorehabilitation. They were not randomly selected, and severely cognitively impaired patients were not represented as reasonable cognition was one of the selection criteria for admission to the unit. These considerations underlie the need to examine the appropriateness of scales to a study sample.

The most important finding of this study is the demonstration that the BI, FIM total, and FIM motor scales have similar responsiveness. This is perhaps surprising as the FIM was developed specifically to be more “sensitive to change” (responsive) than the BI,12 14 17 and has more items and more response categories. The findings of this study suggest that the FIM has no advantages over the BI in evaluating changes in disability due to therapeutic interventions. This has important clinical implications as the BI is quicker and simpler to rate. In addition, it can be rated by any healthcare professional whereas the developers of the FIM recommend rating by consensus opinion of a multidisciplinary team after a period (up to 72 hours) of patient observation. Furthermore, the BI can be administered by self report, adding to its impact on the design and cost of clinical studies.34 35

Examining relative responsiveness is important as it helps clinicians to choose between competing disability measures on an empirical basis. The more responsive a disability measure, the more useful it is for evaluative studies as the importance of responsiveness lies in the trade off between sample size and statistical power.36 For a given sample size, using a more responsive instrument increases the possibility of detecting a statistically significant result. Similarly, for a given statistical power a smaller sample size can be used if a more responsive instrument is employed.
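
The trade-off between responsiveness and sample size can be illustrated with a standard normal approximation for a paired (before and after) comparison. This sketch is not taken from the study, and it rests on stated assumptions: the function name `required_sample_size` is hypothetical, the test is two sided, and the input is the mean change divided by the SD of the change scores, which is not the same quantity as the effect size used in this study (mean change divided by the SD of admission scores).

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(
    standardised_change: float, alpha: float = 0.05, power: float = 0.80
) -> int:
    """Approximate number of patients needed to detect a mean change.

    standardised_change: mean change / SD of change scores (assumed input).
    Uses the normal approximation n = ((z_alpha + z_power) / d) ** 2.
    """
    if standardised_change <= 0:
        raise ValueError("standardised_change must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")

    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = standard_normal.inv_cdf(power)
    return ceil(((z_alpha + z_power) / standardised_change) ** 2)
```

Under this approximation, an instrument that registers a standardised change of 0.5 needs about 32 patients for 80% power at the 5% level, whereas one that registers only 0.25 needs about 126. Halving the standardised change roughly quadruples the required sample.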

There is no consensus as to which of the many methods of reporting responsiveness should be used. The effect size statistic used in this study is widely used and recommended.37 However, different studies often use different statistical methods, thereby complicating comparative data interpretation. Furthermore, the responsiveness of instruments seems to be disease dependent. In this study effect sizes for BI and all FIM scales are greater for stroke patients than for patients with multiple sclerosis suggesting that these instruments are more responsive in stroke patients. Consequently, examining the responsiveness of competing instruments in the same samples undergoing the same interventions provides the best indication of relative responsiveness.

In conclusion, these results show that the BI, FIM total, and FIM motor scales have a similar ability to detect change in disability in a selected sample of multiple sclerosis and stroke patients undergoing neurorehabilitation. All measures were shown to be very appropriate to the study sample, although concerns are raised about using the FIM cognitive scale in patients with multiple sclerosis.


We thank medical, nursing and all therapy staff at the NRU for their involvement.