Research paper
Development and assessment of the inter-rater and intra-rater reproducibility of a self-administration version of the ALSFRS-R
  1. Leonhard A Bakker (1, 2)
  2. Carin D Schröder (2, 3)
  3. Harold H G Tan (1)
  4. Simone M A G Vugts (1)
  5. Ruben P A van Eijk (1, 4)
  6. Michael A van Es (1)
  7. Johanna M A Visser-Meily (2, 3)
  8. Leonard H van den Berg (1)

  1. Department of Neurology, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht, The Netherlands
  2. Center of Excellence for Rehabilitation Medicine, Brain Center Rudolf Magnus, Utrecht University, University Medical Center Utrecht, and De Hoogstraat Rehabilitation, Utrecht, The Netherlands
  3. Department of Rehabilitation, Physical Therapy Science, and Sports Medicine, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht, The Netherlands
  4. Biostatistics & Research Support, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands

Correspondence to Professor Leonard H van den Berg, Department of Neurology, Brain Centre Rudolf Magnus, University Medical Centre Utrecht, Utrecht 3584CX, The Netherlands; L.H.vandenBerg{at}umcutrecht.nl

Abstract

Objective The Amyotrophic Lateral Sclerosis Functional Rating Scale-Revised (ALSFRS-R) is widely applied to assess disease severity and progression in patients with motor neuron disease (MND). The objective of the study is to assess the inter-rater and intra-rater reproducibility, i.e., the inter-rater and intra-rater reliability and agreement, of a self-administration version of the ALSFRS-R for use in apps, online platforms, clinical care and trials.

Methods The self-administration version of the ALSFRS-R was developed based on both patient and expert feedback. To assess the inter-rater reproducibility, 59 patients with MND filled out the ALSFRS-R online and were subsequently assessed on the ALSFRS-R by three raters. To assess the intra-rater reproducibility, patients were invited on two occasions to complete the ALSFRS-R online. Reliability was assessed with intraclass correlation coefficients, agreement was assessed with Bland-Altman plots and paired samples t-tests, and internal consistency was examined with Cronbach’s coefficient alpha.

Results The self-administration version of the ALSFRS-R demonstrated excellent inter-rater and intra-rater reliability. The assessment of inter-rater agreement demonstrated small systematic differences between patients and raters and acceptable limits of agreement. The assessment of intra-rater agreement demonstrated no systematic changes between time points; limits of agreement were 4.3 points for the total score and ranged from 1.6 to 2.4 points for the domain scores. Coefficient alpha values were acceptable.

Discussion The self-administration version of the ALSFRS-R demonstrates high reproducibility and can be used in apps and online portals both for individual comparisons, facilitating the management of clinical care, and for group comparisons in clinical trials.

  • agreement
  • amyotrophic lateral sclerosis
  • amyotrophic lateral sclerosis functional rating scale-revised
  • inter-rater
  • intra-rater
  • reliability
  • reproducibility

Introduction

For patients with motor neuron disease (MND), a group of progressive neurological disorders comprising amyotrophic lateral sclerosis (ALS), primary lateral sclerosis and progressive muscular atrophy, disease severity and disease progression are characterised by progressive loss of function. The Amyotrophic Lateral Sclerosis Functional Rating Scale (ALSFRS)1 2 and its revised version (ALSFRS-R)3 are the most widely applied rating scales for measuring disease severity and progression in patients with MND.

Self-administration of the ALSFRS-R via internet and apps might prove to be a method for assessing disease severity and disease progression that is complementary to face-to-face consultations, allowing a more personalised management of care.4 Furthermore, administration of the ALSFRS-R via internet could simplify trial participation for patients and lower the burden of participation in clinical trials that use the ALSFRS-R as an outcome.5 6 Although the rating scale was originally developed for administration by healthcare professionals, both the ALSFRS and ALSFRS-R have been adapted for several modes of administration, that is, for administration to patients and caregivers via telephone,7–9 for self-administration,10 and for self-administration via internet,4 that is, in online portals11 or in apps.6 12 Currently, monitoring disease progression using apps and online portals informs decision-making by healthcare professionals in clinical care, provides a proxy for disease severity to researchers in clinical trials and provides patients with MND with insights into the severity and progression of the disease, which might contribute to feelings of control and autonomy in making care-related decisions.

For monitoring of disease progression using self-administration of the ALSFRS-R to be feasible in clinical care and clinical trials, it is pivotal that scores reported by patients with MND demonstrate high reproducibility, i.e. reliability and agreement. Two previous studies provided evidence on the reproducibility of self-administration versions of the ALSFRS and the ALSFRS-R in patients with ALS reporting acceptable agreement and reliability.10 13 14 These studies, however, focused on the total score, and recent studies have shown that the profile of domain scores provides more detailed information on patients’ disease severity.15–17

To facilitate the self-assessment of disease severity and disease progression using the ALSFRS-R via the internet, a self-administration version of the ALSFRS-R has been developed that includes sufficient guidance for choosing the correct response category. The objective of the present study was to evaluate the inter-rater and intra-rater reproducibility of the total score and domain scores of the self-administration version of the ALSFRS-R in patients with MND in the Netherlands.

Methods

This study was conducted according to the Guidelines for Reporting Reliability and Agreement Studies (GRRAS).18 Inter-rater and intra-rater reproducibility of the self-administration version of the ALSFRS-R was assessed.

Amyotrophic Lateral Sclerosis Functional Rating Scale-Revised

To date, the ALSFRS-R, a 12-item disease-specific instrument that measures the extent to which patients with MND are capable and independent in performing 12 functional activities, is the most widely applied rating scale in both clinical practice and clinical trials, and its raw total score is treated as a general disease severity score. However, the measurement model of the ALSFRS-R was developed to comprise four subscales, that is, bulbar function, fine motor function, gross motor function, and respiratory function, each measuring one domain affected by the disease. Each item is scored on a five-point ordered categorical scale that ranges from 4 to 0, with 4 indicating no loss of function and 0 indicating total loss of function.
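
As an illustration of the scoring structure described above, the following minimal sketch computes the total score and the four domain scores from 12 item responses. The item-to-domain mapping (three consecutive items per domain) and the names DOMAINS and score_alsfrs_r are assumptions made for this example and are not taken from the study materials.

```python
# Hypothetical item-to-domain mapping: three consecutive items per domain.
# This mapping is an assumption for illustration, not quoted from the paper.
DOMAINS = {
    "bulbar": (1, 2, 3),
    "fine_motor": (4, 5, 6),
    "gross_motor": (7, 8, 9),
    "respiratory": (10, 11, 12),
}


def score_alsfrs_r(responses):
    """Return total and domain scores for 12 item responses scored 0 to 4."""
    if len(responses) != 12:
        raise ValueError("expected 12 item responses")
    if any(r not in (0, 1, 2, 3, 4) for r in responses):
        raise ValueError("each response must be an integer from 0 to 4")
    # Items are numbered from 1, list positions from 0.
    scores = {
        name: sum(responses[i - 1] for i in items)
        for name, items in DOMAINS.items()
    }
    scores["total"] = sum(responses)
    return scores
```

Under this mapping, a patient without any loss of function scores 48 on the total score and 12 on each domain score.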

Development of a self-administration version of the ALSFRS-R

The items of the self-administration version of the ALSFRS-R were adapted from the original ALSFRS-R and its standard operating procedures to include instructions for scoring by patients. Initial versions were reviewed for content validity by experts and patients with MND, and questions were adjusted iteratively based on their feedback. The rating scale includes all items of the physician-administered ALSFRS-R, but items have additional instructions to reflect the rules for administering the ALSFRS-R (see online supplementary file 1). The software programme NetQuestionnaires19 was used for the online administration of the self-administration version of the ALSFRS-R.

Procedure

Inter-rater reproducibility

The inter-rater reproducibility of the self-administration version of the ALSFRS-R was assessed in a sample of 59 patients with MND. Patients were recruited from the outpatient clinic of the University Medical Center Utrecht and from the population-based research database. Patients filled in the self-administration version of the ALSFRS-R online prior to assessment by the evaluators. Evaluations were either at the outpatient clinic of the University Medical Center Utrecht or at the patient’s home. Disease severity was assessed with the ALSFRS-R by three members of the MND team: a research assistant, a researcher, and a physician-researcher. These raters attended the ENCALS (European Network for the Cure of ALS) session on rating disease severity using the ALSFRS-R and were trained accordingly. Evaluators performed all assessments independently.

Intra-rater reproducibility

The reliability and agreement between scores on T0 and T1 were assessed in a convenience sample of 170 patients with MND, who completed the self-administration version of the ALSFRS-R as part of an ongoing study, to minimise additional burden due to additional testing. Patients received an invitation to fill out the rating scale on two occasions with an interval of 14 days. For the assessment of intra-rater reproducibility, patients who completed the online questionnaire twice with a maximum interval of 21 days were included in the analysis.

Sample size

The required sample size for the assessment of inter-rater reproducibility was calculated using the formula of Giraudeau and Mary.20 For an expected intraclass correlation coefficient (ICC) of minimally 0.70, a confidence interval of approximately 0.20, and four raters, that is, the patients themselves and three evaluators, a sample size of 56 patients was required. The required sample size for the assessment of intra-rater reproducibility was not calculated in advance.
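
The calculation can be sketched with a commonly cited large-sample approximation for the number of subjects needed to estimate an ICC with a given confidence interval width. Whether this is exactly the expression the authors applied is an assumption, as is the reading of 0.20 as the total width of a 95% confidence interval; the function name icc_sample_size is hypothetical.

```python
import math
from statistics import NormalDist


def icc_sample_size(rho, k, width, conf_level=0.95):
    """Approximate number of subjects for an ICC confidence interval.

    rho: expected ICC, k: number of raters, width: total CI width.
    Uses n = 8 z^2 (1 - rho)^2 (1 + (k - 1) rho)^2 / (k (k - 1) width^2),
    assumed here to correspond to the cited approach.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    n = (8 * z ** 2 * (1 - rho) ** 2 * (1 + (k - 1) * rho) ** 2
         / (k * (k - 1) * width ** 2))
    # Round up to the next whole patient.
    return math.ceil(n)


print(icc_sample_size(rho=0.70, k=4, width=0.20))  # prints 56
```

Under these assumptions the approximation reproduces the sample size of 56 patients stated above.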

Statistical analysis

Internal consistency

Cronbach’s coefficient alpha21 was calculated to assess the internal consistency of the total score and domain scores of the self-administered version of the ALSFRS-R. A Cronbach’s coefficient alpha value exceeding 0.70 was considered acceptable.
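
For reference, coefficient alpha can be computed from the item variances and the variance of the sum score. The sketch below uses the standard formula with Python's standard library only; it is not the psych package used in the study, and the function name cronbach_alpha and the example data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's coefficient alpha.

    item_scores: list of rows, one row per patient, one column per item.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(sum score))
    """
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up scores for four patients on a three-item domain.
example = [[4, 3, 4], [2, 2, 3], [1, 0, 1], [3, 3, 2]]
print(round(cronbach_alpha(example), 2))  # prints 0.93
```

A value such as this one would exceed the 0.70 threshold used in the study.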

Inter-rater and intra-rater reliability

Reliability is a measure to define how well patients can be differentiated using the instrument of interest.18 22 ICCs were computed to assess both the inter-rater and intra-rater reliability of the total score and the domain scores of the self-administered version of the ALSFRS-R. ICCs were computed using a two-way random effects analysis of variance (ANOVA) model for absolute agreement, i.e. ICC(2,1).23 Variance components for all ICCs are presented in online supplementary file 2. An ICC value higher than 0.70 was considered acceptable for application in groups of patients and an ICC value higher than 0.90 was considered acceptable for the measurement of individual patients.24
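
The ICC(2,1) named above can be derived from the mean squares of a two-way ANOVA without replication. The following is a minimal sketch using only the standard library; it is not the psych implementation used in the study, and the function name icc_2_1 and the ratings matrix are illustrative assumptions.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per patient, one column per rater
    (or per time point for the intra-rater case).
    """
    n = len(ratings)        # number of patients
    k = len(ratings[0])     # number of raters or time points
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                # between-patient mean square
    ms_cols = ss_cols / (k - 1)                # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative ratings (six subjects, four raters); not study data.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example), 2))  # prints 0.29
```

Because the between-rater mean square enters the denominator, systematic differences between raters lower this coefficient, which is what distinguishes absolute agreement from consistency.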

Inter-rater and intra-rater agreement

Agreement is the extent to which scores are identical between raters or time points.18 22 To assess the agreement between modes of administration and between different time points, difference plots were constructed following Bland and Altman25: the difference between the time points (T0 – T1) or raters was plotted against the mean of the two measurements, together with the mean difference, that is, the systematic error ($\bar{d}$). The limits of agreement, which were calculated as $\bar{d} \pm 1.96 \times SD_{\text{difference}}$, define the range within which 95% of the differences between the two measurements, that is, the random error, lie. In the absence of systematic error, these limits of agreement also display the smallest detectable change (SDC), that is, the change beyond measurement error. In the presence of systematic error, the proper calculation of the SDC is $1.96 \times SD_{\text{difference}}$. Paired samples t-tests were used to assess the difference in means between raters or time points.
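
The agreement quantities described above (mean difference, limits of agreement, SDC and the paired t statistic) can be sketched as follows. The function name bland_altman and the example scores are hypothetical; the p value is omitted because the t distribution is not available in the Python standard library.

```python
import math
from statistics import mean, stdev


def bland_altman(first, second):
    """Agreement statistics for paired scores (for example T0 and T1)."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = mean(diffs)   # systematic error (mean difference)
    sd = stdev(diffs)     # sample SD of the differences
    return {
        "mean_difference": d_bar,
        "lower_loa": d_bar - 1.96 * sd,
        "upper_loa": d_bar + 1.96 * sd,
        "sdc": 1.96 * sd,  # smallest detectable change
        # Paired samples t statistic with n - 1 degrees of freedom;
        # undefined when all differences are identical (sd == 0).
        "t_statistic": d_bar / (sd / math.sqrt(n)),
        "df": n - 1,
    }


# Made-up total scores for five patients at two time points.
t0 = [40, 35, 30, 42, 28]
t1 = [39, 36, 28, 42, 27]
result = bland_altman(t0, t1)
print(round(result["lower_loa"], 2), round(result["upper_loa"], 2))
```

When the mean difference is close to zero, the two limits are roughly symmetric around zero and their absolute value coincides with the SDC.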

Software

Statistical analyses were performed with R26; ICC and Cronbach’s coefficient alpha were calculated using the psych package.27 All results were corrected for multiple testing using the method proposed by Benjamini and Hochberg.28
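
The Benjamini and Hochberg correction can be illustrated with a short step-up procedure that returns adjusted p values. This is a generic sketch, not the code used in the study; the function name benjamini_hochberg and the example p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    # Visit the p values from largest to smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # rank 1 is the smallest p value
        # Scale by m / rank and enforce monotonicity, capped at 1.
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p values for four comparisons.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A comparison is then called statistically significant when its adjusted p value falls below the chosen false discovery rate, for example 0.05.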

Sensitivity analyses

The assessment of inter-rater and intra-rater reproducibility of the individual items was initially beyond the scope of the present study, but is reported as supplementary material (online supplementary file S6–S8). Inter-rater and intra-rater reliability for the individual items was assessed using ICCs, which are equivalent to weighted kappa with quadratic weights.29 ICCs were computed using a two-way random effects ANOVA model for absolute agreement, i.e. ICC(2,1).23 The inter-rater and intra-rater agreement of the individual items was assessed with the percentage of agreement, computed using the irr package.30
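
For individual items, the percentage of agreement and a quadratic weighted kappa for two sets of ratings can be sketched as below. This is an illustration with the standard library, not the irr package used in the study; the function names and the restriction to two raters are assumptions made for the example.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of patients for whom both ratings are identical."""
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return 100.0 * matches / len(rater_a)


def quadratic_weighted_kappa(rater_a, rater_b, n_categories=5):
    """Weighted kappa with quadratic weights for item scores 0..n_categories-1."""
    n = len(rater_a)
    # Observed proportions in the cross-table of the two ratings.
    observed = [[0.0] * n_categories for _ in range(n_categories)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n
    marg_a = [sum(row) for row in observed]
    marg_b = [sum(observed[i][j] for i in range(n_categories))
              for j in range(n_categories)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            # Quadratic disagreement weight; the (K - 1)^2 normaliser
            # cancels in the ratio below.
            w = (i - j) ** 2
            weighted_observed += w * observed[i][j]
            weighted_expected += w * marg_a[i] * marg_b[j]
    return 1.0 - weighted_observed / weighted_expected


# Made-up item scores from a patient group and one evaluator.
patient = [4, 3, 2, 0]
evaluator = [4, 3, 1, 0]
print(percent_agreement(patient, evaluator))  # prints 75.0
```

The kappa is undefined when both raters assign the same single category to every patient, because the expected disagreement is then zero.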

All analyses were repeated and reported for the subsample of patients with ALS as a sensitivity analysis (online supplementary file S9).

Results

Demographic and clinical characteristics

Table 1 displays the demographic and clinical characteristics of the two samples. The inter-rater reproducibility sample broadly reflects the characteristics of the general ALS population and the interval between assessments was m=1.90 days. The intra-rater reproducibility sample had a larger percentage of females, was younger, and had a lower percentage of patients with a bulbar onset of disease. The interval between assessments in the intra-rater reproducibility sample was m=13.90 days.

Table 1

Demographic and clinical characteristics by sample

Inter-rater reproducibility

To assess the inter-rater reliability of the self-administered version of the ALSFRS-R, an intraclass correlation was calculated for agreement between raters (table 2). The ICC for the ALSFRS-R total score was 0.97 and the ICC for the domain scores ranged from 0.94 to 0.97, indicating excellent reliability for all scores.

Table 2

Inter-rater reliability of ALSFRS-R total score and domain scores

To assess the inter-rater agreement of the self-administration version of the ALSFRS-R, Bland-Altman plots were constructed (see figure 1 and online supplementary file S3, S4). Small systematic biases were observed with regard to the ALSFRS-R total score, with patients reporting higher scores than the evaluators. Mean differences in the total score were statistically significant for two raters: 0.97 points and 0.86 points, respectively. For the domain scores, only the mean differences in the fine motor function domain were statistically significant: 0.61, 0.53, and 0.53 points, respectively. No statistically significant systematic biases were observed in the other ALSFRS-R domain scores.

Figure 1

Bland-Altman plots of ALSFRS-R total score between patients and evaluators. Note: some jitter was added to avoid overlapping data points. (A) Patients and evaluator 1. (B) Patients and evaluator 2. (C) Patients and evaluator 3. ALSFRS-R, Amyotrophic Lateral Sclerosis Functional Rating Scale-Revised.

Intra-rater reproducibility

To assess the intra-rater reliability of the self-administration version of the ALSFRS-R, an intraclass correlation was calculated for agreement between time points (table 3). The ICC for the ALSFRS-R total score was 0.97 and the ICC for the domain scores ranged from 0.94 to 0.97, indicating excellent reliability for all scores.

Table 3

Intra-rater reliability of the ALSFRS-R total score and domain scores

To assess the intra-rater agreement of the self-administration version of the ALSFRS-R, Bland-Altman plots were constructed (see figure 2 and online supplementary file S5). No bias was observed with regard to the total and domain scores of the self-administration version of the ALSFRS-R after correction for multiple testing, indicating that there was no systematic error between T0 and T1. The limits of agreement for the total score ranged from −4.19 to 4.34, reflecting the SDC, indicating that changes in scores within these limits should be attributed to measurement error. The limits of agreement for the domain scores were smallest for the bulbar function domain, ranging from −1.62 to 1.50, and largest for the respiratory function domain, ranging from −2.45 to 2.41.

Figure 2

Bland-Altman plots of ALSFRS-R total score and domain scores at T0 and T1. Note: some jitter was added to avoid overlapping data points. (A) Total score. (B) Bulbar function score. (C) Fine motor function score. (D) Gross motor function score. (E) Respiratory function score. ALSFRS-R, Amyotrophic Lateral Sclerosis Functional Rating Scale-Revised.

Internal consistency

The internal consistency of the self-administration version of the ALSFRS-R was assessed per rater and per time point (table 4). The total score demonstrated acceptable Cronbach’s coefficient alpha with estimates ranging from 0.80 to 0.86. Furthermore, acceptable Cronbach’s coefficient alpha estimates were observed for the domain scores with values ranging from 0.80 to 0.92.

Table 4

Internal consistency of the self-administered version of the ALSFRS-R and its domain scores

Discussion

The objective of the present study was to assess both the inter-rater and intra-rater reproducibility of a newly developed self-administered version of the ALSFRS-R. Previous studies provided evidence on the reproducibility of self-administered versions of the ALSFRS and the ALSFRS-R in patients with ALS reporting acceptable agreement and reliability.10 13 14 These studies, however, focused on the total score, while recent studies showed that the profile of domain scores provides more detailed information on patients’ disease severity.15–17 The present study expands on previous reproducibility studies by investigating the inter-rater and intra-rater reliability of the domain scores. The results indicate that the self-administration version of the ALSFRS-R total score and domain scores were highly reliable, and the limits of agreement were considered acceptable for monitoring disease progression in patients with MND.

In the present study, the inter-rater reliability of the self-administration version of the ALSFRS-R total score and its domain scores was excellent. The assessment of inter-rater agreement indicated that patients tended to rate themselves higher than the evaluators, but these differences were only statistically significant for the total score and for the fine motor function domain, in both cases with mean differences smaller than one point, indicating no clinically relevant bias between raters. Furthermore, as the limits of agreement are indicative of the smallest detectable difference, differences between observers exceeding these limits indicate a difference beyond measurement error.

In our study, the intra-rater reliability of the self-administration version of the ALSFRS-R total score and its domain scores was excellent; assessment of intra-rater agreement indicated that there were no statistically significant or clinically relevant differences between T0 and T1. Furthermore, as the limits of agreement are indicative of the SDC in a stable population, changes between time points exceeding these limits, that is, about 4 points (rounded) on the total score and about 2 points (rounded) on the domain scores, indicate a difference beyond measurement error. Consequently, individual changes smaller than these limits of agreement should be attributed to measurement error. The values of the SDC may be pivotal in the assessment of true reversals in disease progression of patients with MND.31 In clinical studies, however, smaller changes can be detected at the group level.32 Consequently, the self-administration version of the ALSFRS-R may facilitate the monitoring of disease progression in clinical care and clinical trials. Moreover, since the assessment with the self-administration version of the ALSFRS-R does not necessarily involve the burden of a clinic visit, patients can assess their disease severity more frequently, facilitating a more accurate estimation of individual disease progression.6

Although the results of the present study indicate that the inter-rater and intra-rater reproducibility of the self-administration version of the ALSFRS-R is high, in a few cases, the difference between the patient and the evaluators, or the patient at T0 and T1, was quite large. Possible explanations for these differences might include a sudden deterioration of the patient or inaccurate perception of disease severity, for example, as a result of cognitive deficits. When differences can be attributed to inaccurate perception of disease severity, the ALSFRS-R may be administered by caregivers or healthcare professionals for accurate management of clinical care.

Although the results of the present study indicate that the self-administration version of the ALSFRS-R demonstrates high reproducibility, we recommend that scores of the ALSFRS-R and the self-administration version of the ALSFRS-R are not used interchangeably prior to a formal assessment of measurement equivalence, that is, comparability of scores across modes of administration.

A strength of the present study is that the inter-rater and intra-rater reproducibility was assessed for both the total score and the domain scores of the self-administration version of the ALSFRS-R. Although the total score is generally used, the profile of domain scores provides more detailed information on patients’ disease severity, which may inform and facilitate personalised management of care.15–17 Indeed, scores on the self-administration version of the ALSFRS-R may facilitate the monitoring of patients and related care needs between visits. The added benefit of monitoring patients using apps or portals for clinical care, however, should be evaluated in a future study.

A limitation of the present study is that the cognitive status of patients in the present study was not assessed. As a percentage of patients with MND present with cognitive deficits,33 34 reported scores might not be accurate for these patients. A future study could examine potential factors, for example, cognition, associated with differences in the scores on the self-administration version of the ALSFRS-R and the ALSFRS-R, that is, between raters, or differences in the scores on the self-administration version of the ALSFRS-R within a short interval, that is, between ratings, to assess the feasibility of monitoring patients using self-administration of the ALSFRS-R and to further improve the reproducibility of the self-administration of the ALSFRS-R.

In conclusion, the self-administered version of the ALSFRS-R is a measure of disease severity and disease progression in patients with MND that demonstrates high reproducibility. This rating scale is suitable for monitoring disease severity and progression in patients with MND in both clinical care and clinical trials.


Footnotes

  • Contributors LAB, CDS and LHvdB designed the study. LAB wrote the manuscript. LAB, HHGT and SMAGV participated in the data collection. LAB performed the statistical analyses. LAB, CDS, HHGT, SMAGV, RPAVE, MAvE, JMAV-M and LHvdB contributed to the interpretation of data. LHvdB provided study supervision. All authors critically reviewed and revised the manuscript.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests LAB, CDS, HHGT, SMAGV, RPAVE and JMAV-M have nothing to disclose. MAvE received grants from the Netherlands Organization for Health Research and Development (Veni scheme), The Thierry Latran Foundation, Joint Programme – Neurodegenerative Disease Research (JPND) and the Netherlands ALS Foundation (Stichting ALS Nederland). He received travel grants from Shire (previously Baxalta) and serves on the medical ethical review board of University Medical Centre Utrecht. LHvdB reports grants from ALS Foundation Netherlands, grants from The Netherlands Organization for Health Research and Development (Vici scheme), grants from The Netherlands Organization for Health Research and Development (SOPHIA, STRENGTH, ALS-CarE project), funded through the EU JPND, personal fees from Shire (previously Baxalta), personal fees from Biogen, personal fees from Cytokinetics, other from Prinses Beatrix SpierFonds, other from Latran Foundation, outside the submitted work.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement Data are available on reasonable request.
