Original research
Measurement properties of the Inclusion Body Myositis Functional Rating Scale
  1. Sharfaraz Salam1,
  2. Tara Symonds2,
  3. Helen Doll2,
  4. Sam Rousell2,
  5. Jason Randall2,
  6. Lucy Lloyd-Price2,
  7. Stacie Hudgens3,
  8. Christina Guldberg4,
  9. Laura Herbelin5,
  10. Richard J Barohn5,
  11. Michael G Hanna1,
  12. Mazen M Dimachkie6,
  13. Pedro M Machado1,7
  14. On behalf of Members of the Arimoclomol in IBM Investigator Team of the Neuromuscular Study Group
  1. 1Department of Neuromuscular Diseases, University College London, London, UK
  2. 2Clinical Outcomes Solutions Ltd, Folkestone, UK
  3. 3Clinical Outcomes Solutions Ltd, Tucson, Arizona, USA
  4. 4Orphazyme Aps, Copenhagen, Denmark
  5. 5Department of Neurology, University of Missouri, Columbia, Missouri, USA
  6. 6Department of Neurology, University of Kansas Medical Center, Kansas City, Kansas, USA
  7. 7NIHR University College London Hospitals Biomedical Research Centre, University College London Hospitals National Health Service (NHS) Trust, London, UK
  1. Correspondence to Professor Pedro M Machado; p.machado{at}ucl.ac.uk

Abstract

Objectives To evaluate the validity, reliability, responsiveness and meaningful change threshold of the Inclusion Body Myositis (IBM) Functional Rating Scale (FRS).

Methods Data from a large 20-month multicentre, randomised, double-blind, placebo-controlled trial in IBM were used. Convergent validity was tested using Spearman correlation with other health outcomes. Discriminant (known groups) validity was assessed using standardised effect sizes (SES). Internal consistency was tested using Cronbach’s alpha. Intrarater reliability in stable patients and equivalence of face-to-face and telephone administration were tested using intraclass correlation coefficients (ICCs) and Bland-Altman plots. Responsiveness was assessed using standardised response mean (SRM). A receiver operator characteristic (ROC) curve anchor-based approach was used to determine clinically meaningful IBMFRS change.

Results Among the 150 patients, mean (SD) IBMFRS total score was 27.4 (4.6). Convergent validity was supported by medium to large correlations (rs modulus: 0.42–0.79) and discriminant validity by moderate to large group differences (SES=0.51–1.59). Internal consistency was adequate (overall Cronbach’s alpha: 0.79). Test–retest reliability (ICCs=0.84–0.87) and reliability of telephone versus face-to-face administration (ICCs=0.93–0.95) were excellent, with Bland-Altman plots showing good agreement. Responsiveness in the worsened group defined by various external constructs was large at both 12 (SRM=−0.76 to −1.49) and 20 months (SRM=−1.12 to −1.57). In ROC curve analysis, a drop of at least two IBMFRS total score points was shown to represent a meaningful decline.

Conclusions When administered by trained raters, the IBMFRS is a reliable, valid and responsive tool that can be used to evaluate the impact of IBM and its treatment on physical function, with a 2-point reduction representing meaningful decline.

Trial registration number NCT02753530.

  • INCL BODY MYOSITIS
  • MUSCLE DISEASE
  • NEUROMUSCULAR
  • RANDOMISED TRIALS
  • RHEUMATOLOGY

Data availability statement

Data are available on reasonable request. Data sharing requests can be submitted after 1 year following publication of the main study results, to the corresponding authors, who will provide a data access request form. Data sharing requests will be considered by the Trial Steering Committee on a case-by-case basis, and data will be shared if the request is considered reasonable, of scientific interest, and legally and ethically possible.

WHAT IS ALREADY KNOWN ON THIS TOPIC

  • The Inclusion Body Myositis Functional Rating Scale (IBMFRS) is a clinician-reported outcome measure to assess the functional status of inclusion body myositis (IBM) patients. Despite being used both clinically and in the context of research, there is a paucity of literature on its psychometric properties.

WHAT THIS STUDY ADDS

  • The IBMFRS is a reliable, valid and responsive tool that can be used to evaluate the impact of IBM and future treatments. Furthermore, a 2-point reduction in the IBMFRS total score indicates a clinically meaningful decline in function.

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY

  • The IBMFRS can be used as a robust outcome measure for IBM patients in clinical trials and identify patients demonstrating a significant clinical decline. This information is valuable for a broad spectrum of stakeholders, including patients, clinicians, researchers, pharmaceutical companies and regulators.

Introduction

Inclusion body myositis (IBM) belongs to the idiopathic inflammatory myopathies class of muscle disease. It is associated with ageing and characterised by early involvement of the long finger flexors and quadriceps muscles; swallowing and respiratory function can also be affected.1–3 IBM is a progressive and debilitating muscle-wasting disease, with an increased risk of death from complications such as aspiration pneumonia and dysphagia.4 Although a variety of drug trials have been completed in the last decade, IBM currently has no licensed treatment.5–8

Clinical outcome assessments (COAs) are crucial for measuring disease progression and the severity of IBM. Valid, reliable and responsive COAs are imperative in clinical trials to gauge the response to potential treatments.9–12 The IBM Functional Rating Scale (IBMFRS), established in 2008 as a disease-specific clinician-reported outcome measure (ClinRO), was adapted from the Amyotrophic Lateral Sclerosis Functional Rating Scale.

In a previous study, the content validity of the IBMFRS was confirmed along with the reliability of the measure.13 Both patients and physicians agreed that the measure adequately captured core functional impacts of IBM. The study found good inter-rater reliability for face-to-face (F2F) and video ratings, excellent intrarater reliability for both modes, and excellent equivalence between F2F and phone administration.

The IBMFRS has gained popularity as an outcome measure in recent IBM clinical trials,14–17 and it was used as primary endpoint in the recent large, multicentre, randomised, double-blind, placebo-controlled (RDBPC) trial of arimoclomol in IBM.8 Additionally, the IBMFRS has been selected as the primary endpoint for two ongoing IBM RDBPC trials: one involving rapamycin/sirolimus (NCT04789070) and the other employing an anti-KLRG1 antibody (ABC008/Ulviprubart, NCT05721573). To date, however, the measurement properties of the IBMFRS have not been thoroughly investigated, particularly in large datasets.

Our aim was to gather information on the measurement properties of the IBMFRS, namely validity, reliability, responsiveness and interpretability (meaningful within-person change threshold), in patients recruited to the arimoclomol in IBM trial.8

Methods

Study design and population

The arimoclomol trial was an RDBPC trial conducted at specialist neuromuscular centres (NCT02753530). Eligible participants diagnosed with IBM, meeting any category of the European Neuromuscular Centre research diagnostic criteria 2011,18 had to demonstrate the ability to rise from a chair unaided and walk at least 6 m. The study spanned 20 months, featuring both in-person and remote visits, with the trial schedule and details having previously been published.8 19 Patients enrolled on the arimoclomol clinical trial were broadly representative of those from other clinical trials in IBM, including ongoing efficacy clinical trials in IBM (NCT04789070, NCT05721573). Participants gave informed consent to participate in the study before taking part.

Clinical outcome assessments

Inclusion Body Myositis Functional Rating Scale

The IBMFRS is a ClinRO measure used to determine participants’ capability and independence in 10 functional activities.13 14 16 20 Each of the 10 items (swallowing, handwriting, cutting food and handling utensils, fine motor tasks, dressing, hygiene, turning in bed and adjusting covers, sit to stand, walking, climbing stairs) is graded on a 5-point ordinal scale from 0 (unable to perform) to 4 (normal). The sum of the 10 items gives a value between 0 and 40, with higher scores representing less functional limitation (ie, better health outcome). IBMFRS raters received initial training and certification before commencing the study, with mandatory yearly training and recertification thereafter. Raters were also provided with a written procedure on how to apply the scale. Sites were advised to consistently employ the same evaluator for IBMFRS administration at each visit.
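The scoring rule described above can be illustrated with a minimal sketch. The item keys and the function name ibmfrs_total are hypothetical labels chosen for this example and are not part of the published scale or of the trial's scoring procedure.

```python
# Hypothetical item keys; the labels follow the 10 activities listed above.
IBMFRS_ITEMS = (
    "swallowing", "handwriting", "cutting_food_and_handling_utensils",
    "fine_motor_tasks", "dressing", "hygiene",
    "turning_in_bed_and_adjusting_covers", "sit_to_stand", "walking",
    "climbing_stairs",
)


def ibmfrs_total(ratings):
    """Sum the 10 item grades (0 = unable to perform, 4 = normal).

    Returns a total between 0 and 40; higher means less functional limitation.
    """
    missing = [item for item in IBMFRS_ITEMS if item not in ratings]
    if missing:
        raise ValueError(f"missing items: {missing}")
    for item in IBMFRS_ITEMS:
        if ratings[item] not in (0, 1, 2, 3, 4):
            raise ValueError(f"{item}: grade must be an integer from 0 to 4")
    return sum(ratings[item] for item in IBMFRS_ITEMS)


# Made-up ratings for illustration only (not trial data).
example = {item: 3 for item in IBMFRS_ITEMS}
example["climbing_stairs"] = 1
print(ibmfrs_total(example))
```

The sketch rejects incomplete assessments rather than imputing missing items; how missing items were handled in the trial is not stated in this section.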

Patient-reported outcome measures

A Patient Global Impression of Severity (PGIS) was included to measure the impact of the disease. The PGIS asked, ‘Considering all aspects of your IBM and its impact on your day-to-day activities (eg, dressing, walking, bathing) right now, would you say that the impact is currently…’ and was scored from 0 to 5 (none to very severe): 0=none, 1=very mild, 2=mild, 3=moderate, 4=severe and 5=very severe.

A Patient Global Impression of Change (PGIC) was included to assess self-perceived change in the ability to conduct daily activities since the start of the study medication. The PGIC was scored as follows: 0=very much worse, 1=much worse, 2=a little worse, 3=no change, 4=a little improved, 5=much improved and 6=very much improved.

The Short Form 36-Item Survey (SF-36) measures health-related quality of life and was scored in accordance with existing guidelines for the instrument.21 Scores range from 0 to 100, with higher scores representing better health status. The SF-36-Physical Functioning (SF36-PF) and the SF-36 Physical Component Summary (SF36-PCS) scores were used in our analyses.

The Health Assessment Questionnaire-Disability Index (HAQ-DI), a self-reported measure, was included to assess the level of functional ability; questions can be grouped into eight categories of functioning: dressing and grooming, rising, eating, walking, hygiene, reach, grip and usual activities. The score ranges from 0 to 3, with higher scores representing more disability.22

Performance outcome (PerfO) measures

Patients were assessed with the 6 min walk test (6MWT) and modified timed up and go (mTUG).23 Hand grip strength was tested with a Jamar Dynamometer; the maximum result (in kg) for the strongest hand (as determined at baseline) was used in the analyses. Manual muscle testing (MMT) was used to assess the strength of 24 different muscles; the scores were converted to numerical values from 0 to 10 before a total score was calculated as an average across the 24 muscles.

Statistical analyses

Consensus-based Standards for the selection of health Measurement Instruments (COSMIN) recommendations were followed to test and report measurement properties.24 25 Descriptive statistics were used to characterise the sample. All analyses were performed by using SAS V.9.4. When applicable, the statistical significance tests were two sided, with a threshold of p<0.05.

Construct validity

Convergent validity is the degree to which the domains of a COA tool are associated with those of another tool known to measure the same construct. Convergent validity was assessed at baseline using Spearman correlations calculated between the IBMFRS total score and the HAQ-DI, SF36-PF, SF36-PCS, mTUG, 6MWT, hand grip strength and MMT. Correlations were considered weak if the resulting coefficient was <0.3; moderate if between 0.3 and <0.5; and strong if ≥0.5.26
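As an illustration of the convergent validity analysis, the sketch below computes a Spearman correlation as the Pearson correlation of average ranks and applies the strength thresholds stated above. The function names are hypothetical and the input values are made up; the authors' analysis was run in SAS, so this is only an equivalent-in-spirit example.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def correlation_strength(rho):
    """Thresholds from the text: <0.3 weak, 0.3 to <0.5 moderate, >=0.5 strong."""
    size = abs(rho)
    if size < 0.3:
        return "weak"
    if size < 0.5:
        return "moderate"
    return "strong"


# Made-up scores for illustration only (not trial data).
ibmfrs = [31, 27, 24, 29, 20, 33]
haq_di = [0.6, 1.1, 1.9, 1.4, 2.4, 0.4]
rho = spearman_rho(ibmfrs, haq_di)
print(round(rho, 2), correlation_strength(rho))
```

The classification uses the absolute value of the coefficient, since a negative sign only reflects that the two scales run in opposite directions.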

Known groups (discriminant) validity is the ability of an instrument to discriminate between groups of individuals known to differ in terms of the construct of relevance, that is, between clinically distinct groups hypothesised a priori. Known group validity was assessed at baseline to determine whether the IBMFRS differed between groups based on the HAQ-DI score and the PGIS. Groups of HAQ-DI scores were created to indicate mild (score 0 to 1), moderate (score >1 to 2) and severe disability (score >2 to 3).22 Four PGIS categories were considered: very mild (category 1), mild (category 2), moderate (category 3) and severe (category 4); no subjects scored either ‘none’ (category 0) or ‘very severe’ (category 5) at baseline. Known group validity was assessed using analysis of covariance (ANCOVA) adjusted for gender, age and race; least squares (LS) means and SEs were derived from the ANCOVA model and two-sided p values for the difference in means between adjacent groups were determined. In addition, standardised effect sizes (SES) were calculated by dividing the difference in scores between consecutive groups by the pooled group SD. Cohen’s guidance was used to interpret the magnitude of SES: small (0.2 to <0.5), medium (0.5 to <0.8) and large difference (≥0.8).26
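The SES calculation can be sketched as follows. This example assumes raw (unadjusted) group means and the usual two-group pooled SD; whether the authors used raw or LS means in the numerator is not stated here, and the label "negligible" for values below 0.2 is an assumption added for completeness. Function names and data are hypothetical.

```python
from statistics import mean, variance


def pooled_sd(group_a, group_b):
    """Pooled SD of two groups, weighting each sample variance by n - 1."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return pooled_var ** 0.5


def standardised_effect_size(group_a, group_b):
    """Difference in group means divided by the pooled group SD."""
    return (mean(group_a) - mean(group_b)) / pooled_sd(group_a, group_b)


def cohen_label(effect_size):
    """Cut-offs from the text; 'negligible' (<0.2) is an assumed extra label."""
    size = abs(effect_size)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# Made-up IBMFRS totals for two adjacent severity groups (not trial data).
mild = [30, 32, 34]
moderate = [26, 28, 30]
ses = standardised_effect_size(mild, moderate)
print(round(ses, 2), cohen_label(ses))
```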

Reliability

Medical and health-related decisions by patients and clinicians rest on the assumption that differences among and within patients exist and have important implications. Survey instruments, therefore, are useful only to the degree to which they reliably and accurately reflect true psychological or health-related differences. Reliability reflects the extent to which differences in patients’ observed scores are consistent with differences in their true scores as opposed to measurement error.

Cronbach’s alpha statistic at baseline was used to assess the internal consistency reliability of the IBMFRS. An overall alpha value was calculated together with item-level alpha values showing the change in alpha following the exclusion of each item in turn. Alpha values ≥0.70 are considered adequate. Alpha values >0.90 may indicate an overly homogeneous measure, in which some items are redundant because they are too similar to one another or because the measure contains too many items.
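A minimal sketch of the internal consistency calculation is given below, using the standard formula alpha = k/(k−1) × (1 − sum of item variances / variance of the total score). The function names and the small data set are hypothetical; the example uses three items for brevity, whereas the IBMFRS has 10.

```python
from statistics import variance


def cronbach_alpha(rows):
    """rows: one list of item grades per patient (all rows the same length)."""
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def alpha_if_item_deleted(rows):
    """Alpha recomputed after excluding each item in turn."""
    k = len(rows[0])
    return [cronbach_alpha([row[:j] + row[j + 1:] for row in rows]) for j in range(k)]


# Made-up item grades for five patients on three items (not trial data).
grades = [
    [4, 3, 4],
    [3, 3, 2],
    [2, 1, 2],
    [1, 2, 1],
    [3, 2, 3],
]
print(round(cronbach_alpha(grades), 2))
print([round(a, 2) for a in alpha_if_item_deleted(grades)])
```

An item whose exclusion raises alpha above the overall value, as reported for swallowing in the Results, is one that covaries less strongly with the remaining items.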

Intrarater reliability was assessed between adjacent periods at which it was possible to determine stability, when the PGIS was also administered. The PGIS was administered every 4 months, and intrarater reliability was measured at these time points in stable subjects who reported no change in the PGIS. Three periods were defined for identifying stable patients: between baseline and month 4, between month 4 and month 8, and between month 8 and month 12.

Equivalence of telephone versus F2F in-clinic administration of the IBMFRS was measured in subjects who reported no change in the PGIS (ie, stable subjects) between months 8 and 12. Most assessments at these time points were in clinic, with the assessment at month 10 being by telephone. Patients who did not change according to the PGIS scores at month 12 vs month 8 were assumed to have not changed at month 10 also.

Agreement was assessed using intraclass correlation coefficients (ICCs) with 95% CIs. The ICCs were calculated using the Shrout-Fleiss reliability formula for calculating absolute agreement on a single (domain) score based on a two-way mixed-effect ANOVA with a factor corresponding to modality (F2F vs telephone) and another related to patient. An ICC≥0.70 is considered desirable, with an ICC≥0.80 considered to indicate excellent reliability. Agreement across the scale of the IBMFRS was also visualised by Bland-Altman plots. Due to the increased frequency of COVID-19-induced telephone visits after month 12, reliability was not measured after this time point.
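The agreement statistics can be sketched as below. The ICC function uses the absolute-agreement, single-measurement form built from two-way ANOVA mean squares (often written ICC(2,1) or ICC(A,1)); that this matches the exact SAS computation used in the study is an assumption. The Bland-Altman helper returns the mean difference and the conventional limits of agreement (mean ± 1.96 SD), which is the usual basis for such plots. CIs are not computed, and all names and data are hypothetical.

```python
from statistics import mean, stdev


def icc_absolute_single(scores):
    """scores: one row per patient, one column per administration."""
    n, k = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between administrations
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def bland_altman_limits(first, second):
    """Return (bias, lower limit, upper limit) of the paired differences."""
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Made-up IBMFRS totals: in clinic (column 1) vs telephone (column 2).
pairs = [[27, 26], [30, 31], [22, 22], [25, 27], [33, 32]]
print(round(icc_absolute_single(pairs), 3))
print(bland_altman_limits([p[0] for p in pairs], [p[1] for p in pairs]))
```

Because the absolute-agreement form penalises a systematic shift between administrations, a constant offset between modes lowers the ICC even when patients are ranked identically.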

Responsiveness

Responsiveness refers to the ability of an assessment to detect change where change exists. In the longitudinal hypothesis testing analysis, Spearman’s correlation coefficients were used to assess the degree of association between change on the IBMFRS and change on the reference measures. Correlations were considered weak if the resulting coefficient was <0.3; moderate if between 0.3 and <0.5; and strong if ≥0.5.26

In the magnitude of change analysis, PGIS, HAQ-DI, SF36-PCS, mTUG and MMT were used to stratify IBM patients into change groups according to the corresponding score. Paired t-tests were used to evaluate the within-group differences in IBMFRS change scores between two time points, and the standardised response mean (SRM) was calculated by dividing the mean change score between the baseline and the subsequent time point by the SD of the change score. The magnitude of SRM was interpreted based on Cohen’s recommendations as outlined above.26
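The SRM defined above reduces to a short calculation, sketched here with a hypothetical function name and made-up scores for one change group. A negative SRM indicates a decline in the IBMFRS total score.

```python
from statistics import mean, stdev


def standardised_response_mean(baseline, follow_up):
    """Mean change (follow-up minus baseline) divided by the SD of the change."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Made-up IBMFRS totals for four patients in one change group (not trial data).
baseline = [30, 28, 26, 31]
month_12 = [28, 27, 22, 29]
print(round(standardised_response_mean(baseline, month_12), 2))
```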

Meaningful change threshold

The purpose of an anchor measure is to identify change in the COA measures that represent the patient perceived improvement or deterioration. Two single-item measures (PGIS, PGIC) were selected to be evaluated as possible anchors. Both measures satisfy the recommendation that anchors should be less complex to interpret than the endpoint they are used to assess.

An anchor-based approach using the PGIS and PGIC as external constructs was used to help determine meaningful change in the IBMFRS. ROC analyses were used to determine the optimal threshold for meaningful decline in the IBMFRS total score, referred to as the best cut point, that is, the IBMFRS total score deterioration (change) that best discriminates between predefined binary outcomes on the PGIS and PGIC anchors.27 28 PGIC binary scoring was defined as (1) ‘worsened’, comprising: ‘very much worse’, ‘much worse’ or ‘a little worse’; versus (0) ‘no change or improved’, comprising: ‘no change’, ‘a little improved’, ‘much improved’ or ‘very much improved’. PGIS binary scoring was defined as (1) ‘worsened’, comprising ≥1 category worsening within the scale; versus (0) ‘no change or improved’, comprising no change in PGIS category or ≥1 category improvement within the scale. The distance to the (0, 1) point of the ROC curve and the Youden’s index were used to define the best cut points, as these are the two methods that provide the best balance between sensitivity and specificity.29–31
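The anchor dichotomisation and the two cut-point criteria can be sketched as follows. The function names, the candidate cut points and the data are hypothetical; the sketch evaluates only the listed candidates rather than tracing a full ROC curve, and is not the authors' SAS procedure.

```python
from math import hypot


def pgic_worsened(pgic_score):
    """PGIC 0-2 (very much, much or a little worse) -> 1; 3-6 -> 0."""
    return 1 if pgic_score <= 2 else 0


def pgis_worsened(baseline_pgis, follow_up_pgis):
    """Worsening of at least one PGIS category -> 1; otherwise 0."""
    return 1 if follow_up_pgis - baseline_pgis >= 1 else 0


def best_cut_points(changes, worsened, candidates):
    """Evaluate 'decline of at least c points' for each candidate c.

    Returns (rows, best cut by Youden's index, best cut by distance to (0, 1)).
    """
    n_pos = sum(worsened)
    n_neg = len(worsened) - n_pos
    rows = []
    for cut in candidates:
        flagged = [change <= -cut for change in changes]
        true_pos = sum(1 for f, w in zip(flagged, worsened) if f and w == 1)
        true_neg = sum(1 for f, w in zip(flagged, worsened) if not f and w == 0)
        sens = true_pos / n_pos
        spec = true_neg / n_neg
        rows.append({
            "cut": cut,
            "sensitivity": sens,
            "specificity": spec,
            "youden": sens + spec - 1,
            "distance": hypot(1 - sens, 1 - spec),
        })
    by_youden = max(rows, key=lambda r: r["youden"])["cut"]
    by_distance = min(rows, key=lambda r: r["distance"])["cut"]
    return rows, by_youden, by_distance


# Made-up IBMFRS change scores and PGIC ratings (not trial data).
changes = [-4, -3, -2, -2, -1, 0, 1, -1, 0, -1]
pgic = [0, 1, 2, 2, 2, 3, 4, 3, 3, 5]
anchor = [pgic_worsened(score) for score in pgic]
rows, by_youden, by_distance = best_cut_points(changes, anchor, [1, 2, 3])
print(by_youden, by_distance)
```

Youden's index maximises sensitivity + specificity − 1, while the second criterion minimises the Euclidean distance from the ROC point to perfect classification at (0, 1); the two usually, but not always, select the same cut point.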

Results

Study population

The total number of patients analysed at baseline was 150. Participants’ ages ranged from 48 to 89 years, with a mean age of 67.2 (SD 8.1) (table 1). There was a male preponderance (114 (76%)) in the population and most of the participants were of white race (143 (95.3%)). Mean IBMFRS total score was 27.4 (SD 4.6), reflecting overall moderate disability.

Table 1

Baseline clinical and demographic characteristics (N=150)

Construct validity

Convergent validity

The correlations between the IBMFRS and the HAQ-DI (r=−0.79), SF36-PF (r=0.53) and SF36-PCS (r=0.46) were medium to large, supporting convergent validity of the IBMFRS, with the HAQ-DI having the largest correlation. PerfO assessments, namely MMT total score (r=0.58), mTUG (r=0.64), 6MWT (r=0.62), and, to a lesser extent, hand grip strength (r=0.42), showed moderate to strong correlations with the IBMFRS, again supporting convergent validity of the IBMFRS.

Known group validity

Data to support known group (discriminant) validity are presented in table 2. IBMFRS scores decrease progressively the greater the severity as indicated by the PGIS and HAQ-DI. For patients categorised as mild versus very mild (SES=0.73, p=0.158) and moderate versus mild (SES=0.51, p=0.028) on the baseline PGIS scores, the SES values indicated a moderate difference versus the adjacent (lower) category. A large difference was noted for those patients with a score classified as severe versus moderate, with an SES of 0.98 (p=0.001). For patients categorised as moderate versus mild (SES=1.59) and severe versus moderate (SES=1.32) on the HAQ-DI (p<0.001 in both groups), large SES differences were observed.

Table 2

Known-groups validity of the IBMFRS versus the PGIS and the HAQ-DI reference measures at baseline

Reliability

Internal consistency

The overall Cronbach’s alpha coefficient was 0.79, with the coefficient after exclusion of each of the 10 items ranging from 0.75 to 0.81, which supports an adequate consistency of the IBMFRS (table 3). The exclusion of swallowing resulted in the greatest increase in consistency, with an alpha coefficient of 0.81.

Table 3

Internal consistency for the Inclusion Body Myositis Functional Rating Scale at baseline (N=150)

Intrarater reliability

IBMFRS total scores taken at all three defined periods achieved ICCs>0.80 and supported strong intrarater reliability of the IBMFRS: stable patients between baseline and month 4 (n=78), ICC=0.84 (95% CI 0.77 to 0.90); between months 4 and 8 (n=77), ICC=0.85 (95% CI 0.77 to 0.90) and between months 8 and 12 (n=78), ICC=0.87 (95% CI 0.80 to 0.91). In addition, Bland-Altman plots showed a good agreement between IBMFRS total scores at first and second assessments in stable patients (figure 1).

Figure 1

Test–retest reliability of the Inclusion Body Myositis Functional Rating Scale (IBMFRS). Bland-Altman plot showing degree of agreement of the IBMFRS from (A) baseline to month 4 (N=78), (B) month 4 to month 8 (N=77) and (C) month 8 to month 12 (N=78).

Equivalence of telephone versus F2F in-clinic administration

In stable patients, the ICCs for equivalence were notably high at 0.95 (95% CI 0.92 to 0.97) when comparing the in-clinic administration of the IBMFRS at 8 months vs over-the-telephone administration at 10 months. Similarly, a high ICC of 0.93 (95% CI 0.89 to 0.96) was observed when comparing the in-clinic administration at 12 months vs over-the-telephone administration at 10 months. Bland-Altman plots showed a good agreement between IBMFRS total scores at consecutive in-clinic and over-the-telephone assessments in stable patients (figure 2).

Figure 2

Equivalence of Inclusion Body Myositis Functional Rating Scale (IBMFRS) scoring in clinic versus over the phone. Bland-Altman plot showing degree of agreement of the IBMFRS from (A) month 8 (in clinic) to month 10 (phone) (N=79), and (B) month 10 (phone) to month 12 (in clinic) (N=79).

Responsiveness

Longitudinal hypothesis testing

The correlations between IBMFRS change scores and change scores for different COAs calculated at months 12 and 20 are presented in table 4. The most robust associations with IBMFRS change were observed with HAQ-DI change at month 12 and month 20, with corresponding coefficients of −0.50 and −0.54, respectively, followed by the mTUG change (coefficients of 0.36 and 0.41 at month 12 and month 20, respectively). Change score correlations with other COAs were weak to moderate (table 4).

Table 4

Correlation between change in the IBMFRS and change in reference measures from baseline to months 12 and 20

Magnitude of change

Change in the IBMFRS by degree of change in the PGIS, HAQ-DI, SF36-PCS, mTUG and MMT, between baseline and months 12 and 20, is presented in table 5. Analyses at 12 and 20 months yielded similar results. The greater the extent of worsening in the PGIS the greater the reduction, or worsening, in mean IBMFRS change by months 12 and 20, with the greatest IBMFRS drop observed in the markedly worsened group (at least two categories of worsening): mean reductions of −3.50 at month 12 and −3.83 at month 20 (both p=0.015), compared with very little change in the improved group: a mean increase, or improvement, of 0.25 at month 12 and a decrease of –0.09 at month 20. Moderate to large SRMs were observed for the two worsened groups at both time points, with the markedly worsened group having a large SRM: −1.50 at both time points.

Table 5

Change in the IBMFRS by degree of change in the PGIS, HAQ-DI, SF-36 Physical Domain, mTUG and MMT, between baseline and months 12 and 20

Again, with respect to categories of HAQ-DI change, the greater the increase, or worsening, in HAQ-DI score the greater the reduction, or worsening, in IBMFRS score. At month 20, the mean change scores reduced from a mean increase of 0.15 in the improved group, through –1.23 in the stable group, and –3.11 (p<0.001), –4.89 (p<0.001), and –7.50 (p<0.001) in groups with increasing HAQ-DI deterioration. A large SRM was observed for all three worsened groups (−0.90 to –1.44 and –7.50) at month 20.

For the SF36-PCS, groups were stratified according to quartiles of worsening. The mean change significantly decreased in all quartiles, with the greatest mean reduction being observed in the first quartile, which had the greatest degree of worsening in the PCS: −2.66 at 12 months, p<0.001, SRM=−0.855; and −4.09 at 20 months, p<0.001, SRM=−1.12.

The mTUG was also used to investigate responsiveness by stratifying patients according to quartiles of mTUG change. At both month 12 and month 20, the greater the worsening in the mTUG the greater the worsening in the IBMFRS, with the greatest IBMFRS drop observed in the first mTUG quartile with the greatest mTUG reduction: −2.87 at month 12, p<0.001, SRM=−1.278; −4.32 at month 20, p<0.001, SRM=−1.572.

The MMT PerfO was also used to stratify patients into change quartiles. In general, the greater the degree of worsening on the MMT, the greater the degree of worsening on the IBMFRS, but with the relationship being stronger at month 20, with the greatest IBMFRS drop observed in the first MMT quartile with the greatest MMT reduction: −4.00, p<0.001, SRM=−1.432.

Meaningful change threshold

When comparing PGIS and PGIC anchored dichotomous scores of worsening versus no change or improvement at 20 months (table 6), the corresponding best cut point was a drop of 2 IBMFRS points, for both patient-reported outcome anchors and for both threshold criteria (closest to (0, 1) point, and Youden’s index). Results at 12 months (table 6) were similar for the PGIS anchor, while a drop of 1 IBMFRS point performed better for the PGIC anchor. Therefore, taking all the results into account, a drop of 2 IBMFRS points was the most consistent best cut point and was thus taken to indicate a meaningful decline.

Table 6

Receiver operator characteristic (ROC) analyses to determine the optimal threshold for meaningful decline in the IBMFRS total score

Discussion

This study evaluated the measurement properties of the IBMFRS in a cohort of 150 IBM patients who participated in a large IBM clinical trial.8 It demonstrated the validity, reliability and responsiveness of the IBMFRS in IBM. Equivalence between telephone and F2F administration was established, and a decrease of at least 2 points in the IBMFRS total score represented a meaningful change. Overall, the IBMFRS performed well in this study, with the high level of standardisation in its administration being one of the contributing factors, which is critical when measures are being used in research studies such as clinical trials.

There is a growing need to find specific COAs to assess IBM patients both in clinical practice and research. The IBMFRS is a relatively simple and quick assessment to perform that only contains 10 items. Limited evidence20 32 has supported and contributed to the acceptance by regulatory authorities of the IBMFRS as the primary outcome measure in recent8 and ongoing (NCT04789070, NCT05721573) efficacy clinical trials in IBM. Furthermore, the IBMFRS has been important in determining whether other potential COAs or biomarkers, for example, quantitative MRI, are valuable in IBM.33 Although we have previously assessed the IBMFRS using a Rasch based approach34 and showed content validity and reliability in a smaller IBM study,13 there has been a pressing need for more detailed and robust psychometric evaluation of the IBMFRS scale.

The IBMFRS performed well compared with the other health domains used to assess construct validity in this study. IBMFRS scores correlated highly with PerfOs such as MMT scores, mTUG and 6MWT, as expected, although hand grip strength achieved a weaker but still moderate correlation. This is likely to be the result of the other PerfOs including assessment of lower body strength (6MWT), both upper and lower body strength (mTUG), or the strength of multiple muscles (MMT), rather than just grip in isolation. HAQ-DI and SF36-PF achieved strong, and SF36-PCS moderate, convergent relationships with the IBMFRS.

This study demonstrated that the IBMFRS has adequate internal consistency, with the overall score, and the score after exclusion of each of the 10 items, achieving a Cronbach’s alpha coefficient ≥0.75. The IBMFRS swallowing item was associated with the largest increase in alpha following its exclusion (0.81), suggesting that it measures a slightly different concept than the other IBMFRS items. It is generally accepted that at present there is a lack of reliable tools to assess dysphagia (difficulty or discomfort in swallowing) and bulbar dysfunction in IBM.35–37

When assessing intrarater reliability, we demonstrated ICCs ranging from 0.84 to 0.87 while regarding equivalence of telephone versus F2F in-clinic administration the ICCs ranged from 0.93 to 0.95. These results reflect excellent intrarater reliability and equivalence between remote telephone versus F2F administration of the IBMFRS. While our research group had recently demonstrated a similar observation, the study population in this previous report was considerably smaller (n=9).13 Demonstrating equivalence between telephone and F2F administration is pertinent, particularly amidst the transition towards remote and telemedicine worldwide, largely as a consequence of the COVID-19 pandemic. Roy et al38 recently introduced the IBM personalised index calculator, a modified IBMFRS scale enabling online patient responses, with high equivalence to telephone-obtained IBMFRS scores (ICC=0.98), despite a small study size (n=35).

Overall, the IBMFRS tool demonstrated excellent responsiveness. For the severest groups (ie, markedly worsened or first quartile of worsening) stratified according to all COAs tested, high IBMFRS score SRMs (>1.1) were achieved at 20 months. The higher SRMs, greater statistical significance and stronger monotonic trends found at 20 vs 12 months reflect the greater worsening in IBMFRS observed at this time. In terms of longitudinal relationships, we found moderate and strong relationships between IBMFRS change score and mTUG and HAQ-DI change scores, respectively. The weak to moderate relationships with the other reference measures are likely to reflect the fact that these measures are generic in nature and thus not sufficiently aligned with the specific constructs measured by the IBMFRS. As also observed in other studies, we found a weak correlation between a change in IBMFRS and a change in MMT.20

ROC analysis anchored to PGIC and PGIS identified a 2-point drop in the IBMFRS total score as indicative of meaningful decline. This finding has practical implications for monitoring disease progression in IBM patients clinically and selecting individuals for intensified surveillance. This cut-off also informs the design of future drug trials, particularly in defining target endpoints and outcomes based on a dichotomous IBMFRS-based variable to distinguish responders from non-responders.

This study has limitations. Patients were recruited only from the UK and USA; hence, the use of the IBMFRS in other countries needs to be studied. In addition, the great majority of the patients included were male and white, limiting the representativeness of the sample; however, it is known that IBM is more common among males (with an approximately 2:1 male-to-female ratio) and white people, and therefore, the study population reflects the expected demographics of the disease in the UK and USA.16 20 39 40 The mean IBMFRS total score of the included patients at baseline indicated overall moderate disability. While psychometric analysis is typically not performed in separate severity groups, IBMFRS scores decreased progressively the greater the severity as indicated by the PGIS and HAQ-DI, suggesting that IBMFRS scores are able to measure disability across the spectrum. Finally, our investigations did not determine how IBMFRS total scores could be used to stratify disease severity and allow division of patients into groups such as mild, moderate and severe.

In conclusion, this study supports the use of the IBMFRS as a valid, reliable and responsive tool for monitoring disease progression in IBM when administered by trained raters. The evidence supports proposing a drop of at least 2 points in the IBMFRS total score as indicating a meaningful decline in disease status.

Data availability statement

Data are available on reasonable request. Data sharing requests can be submitted after 1 year following publication of the main study results, to the corresponding authors, who will provide a data access request form. Data sharing requests will be considered by the Trial Steering Committee on a case-by-case basis, and data will be shared if the request is considered reasonable, of scientific interest, and legally and ethically possible.

Ethics statements

Patient consent for publication

Ethics approval

This study involves human participants and the Arimoclomol trial study protocol was approved by the relevant Institutional Review Board (IRB)/Research Ethics Committee (REC), using a single IRB review via the SMART IRB platform for the 11 US centres (University of Kansas Medical Center Human Research Protection Program, reference number STUDY00002461) and the Health Research Authority (HRA) approval process for the UK centre (London—Surrey Borders Research Ethics Committee, reference number: 18/LO/0696). The trial is registered with ClinicalTrials.gov, number NCT02753530 and is completed. The trial was conducted in accordance with the Declaration of Helsinki (October 2013) and its revisions as well as with the valid national laws of the participating countries and the Integrated Addendum to International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use ICH E6(R1): Guideline for Good Clinical Practice (GCP) E6 (R2) effective 14 June 2017, European Regulation No. 536/2014 and with the Commission Directives 1991/507/EEC and 2001/83/EC. Participants gave informed consent to participate in the study before taking part.

Acknowledgments

We wish to thank the patients with IBM who participated in this study; the Neuromuscular Study Group Executive Committee for their assistance in reviewing the arimoclomol trial study design; and all investigators, coinvestigators, study coordinators and other staff involved in the arimoclomol trial. PMM and MGH are supported by the National Institute for Health Research (NIHR) University College London Hospitals Biomedical Research Centre.

References

  1. 1.
  2. 2.
  3. 3.
  4. 4.
  5. 5.
  6. 6.
  7. 7.
  8. 8.
  9. 9.
  10. 10.
  11. 11.
  12. 12.
  13. 13.
  14. 14.
  15. 15.
  16. 16.
  17. 17.
  18. 18.
  19. 19.
  20. 20.
  21. 21.
  22. 22.
  23. 23.
  24. 24.
  25. 25.
  26. 26.
  27. 27.
  28. 28.
  29. 29.
  30. 30.
  31. 31.
  32. 32.
  33. 33.
  34. 34.
  35. 35.
  36. 36.
  37. 37.
  38. 38.
  39. 39.
  40. 40.

Footnotes

  • X @pedrommcmachado

  • MMD and PMM contributed equally.

  • Correction notice Since this paper was first published, affiliation 7 has been updated and PMM and MMD have been listed as joint last authors.

  • Collaborators Members of the Arimoclomol in IBM Investigator Team of the Neuromuscular Study Group

    Anthony A. Amato MD (Department of Neurology, Brigham & Women's Hospital, Boston, MA, USA); Emma Ciafaloni MD (Department of Neurology, University of Rochester Medical Center, Rochester, NY, USA); Miriam Freimer MD (Department of Neurology, The Ohio State Wexner Medical Center, Columbus, OH, USA); Summer B. Gibson MD (Neuromuscular Division, University of Utah School of Medicine, Salt Lake City, UT, USA); Sarah M. Jones MD (Department of Neurology, University of Virginia, Charlottesville, VA, USA); Todd D. Levine MD (Department of Neurology, HonorHealth, Phoenix, AZ, USA); Thomas E. Lloyd MD (Departments of Neurology and Neuroscience, Johns Hopkins University, Baltimore, MD, USA); Michael P. McDermott PhD (Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, NY, USA); Tahseen Mozaffar MD (Division of Neuromuscular Disorders, University of California, Irvine, Orange, CA, USA); Aziz I. Shaibani (Nerve & Muscle Center of Texas, Baylor College of Medicine, Houston, TX, USA); Matthew Wicklund (Department of Neurology, University of Texas Health Science Center at San Antonio, TX, USA).

  • Contributors The first draft of the manuscript was written by SS and PMM. All authors critically reviewed and commented on each draft of the manuscript. All authors approved the final manuscript for submission. PMM and MMD contributed to this manuscript equally and are joint last authors. PMM is the guarantor for this study.

  • Funding The arimoclomol trial was cofunded by a 4-year FDA Office of Orphan Products Development grant (grant/award no. R01FD004809) and Orphazyme A/S (grant/award no. N/A).

  • Disclaimer The views expressed are those of the authors and not necessarily those of the UK National Health Service, the NIHR, or the UK Department of Health.

  • Competing interests SS is supported by a UCL Queen Square Institute of Neurology & Cleveland Clinic London MPhil/PhD Neuroscience Fellowship. TS, HD, SR, JR, LL-P and SH are employees of Clinical Outcomes Solutions, a health research consultancy that was paid to conduct the measurement properties analyses reported in this study. CG was an employee of Orphazyme A/S, the pharmaceutical company that funded the measurement properties analyses reported in this study. LH has no disclosures to report. RJB has received funding from the FDA Office of Orphan Products Development grant for his role in the arimoclomol trial. MGH receives research funding from the Medical Research Council UK and has previously acted as a consultant for Novartis and for Orphazyme A/S. MMD serves or recently served as a consultant for Abata/Third Rock, Abcuro, Amicus, ArgenX, Astellas, Cabaletta Bio, Catalyst, CNSA, Covance/Labcorp, CSL-Behring, Dianthus, Horizon, EMD Serono/Merck, Ig Society, Janssen, Medlink, Octapharma, Priovant, Sanofi Genzyme, Shire Takeda, TACT/Treat NMD, UCB Biopharma, Valenza Bio and Wolters Kluwer Health/UpToDate; MMD also received research grants or contracts or educational grants from Alexion/AstraZeneca, Alnylam Pharmaceuticals, Amicus, Argenx, Bristol-Myers Squibb, Catalyst, CSL-Behring, FDA/OOPD, GlaxoSmithKline, Genentech, Grifols, Mitsubishi Tanabe Pharma, MDA, NIH, Novartis, Octapharma, Orphazyme, Ra Pharma/UCB, Sanofi Genzyme, Sarepta Therapeutics, Shire Takeda, Spark Therapeutics, The Myositis Association, and UCB Biopharma/RaPharma. PMM has received honoraria from Abbvie, BMS, Celgene, Eli Lilly, Galapagos, Janssen, MSD, Novartis, Orphazyme, Pfizer, Roche and UCB.

  • Provenance and peer review Not commissioned; externally peer reviewed.