Validation of assessment scales in physical medicine and rehabilitation: how to properly appraise their psychometric properties

doi:10.1016/j.annrmp.2005.04.004

Annales de Réadaptation et de Médecine Physique

Volume 48, Issue 6, July 2005, Pages 281-287


Abstract

The psychometric properties of a rating scale are, unfortunately, poorly assessed in practice. Each property is examined in depth. Validity has several facets: content, criterion, construct and face validity. Each facet is defined, and the experimental and statistical methods used are explained in detail. Reliability is assessed using the intraclass correlation coefficient and the Bland and Altman method. Guidelines are given for choosing the appropriate intraclass coefficient in a specific experimental situation. Responsiveness is evaluated using effect size and standardized response mean. Finally, practical recommendations are given.

Introduction

Widely used in many studies, assessment scales consist of several items whose ratings are combined into a global score or into dimensional subscores. They make it possible to assess subjective or complex phenomena such as pain, quality of life, disability, etc.

To give reliable results, they must have the three psychometric properties of a good measuring instrument: validity, reliability, and sensitivity to change.

Unfortunately, methodologists find that these properties are very often poorly studied by those who develop and use scales: several important facets are not investigated, experimental methods are approximate, and statistical techniques are ill suited.

To clarify the various concepts and how to apply them, American psychometricians drew up standards after lengthy discussions [1]. We use these standards in the rest of this paper to examine the main aspects of a sound validation. Such a validation rests essentially on an exhaustive analysis of each of the scale's properties, which we study in turn.

Section snippets

Validity

Validity has several facets:

Experimental situations

Reliability can be assessed in various experimental situations. For example, in a sample of 33 patients, one can measure:

• interrater reliability: each patient is rated at the same time by two (or more) different raters, independently;
• intrarater reliability: a single rater rates each patient twice (or more) a few days apart, the patient's condition remaining unchanged over that interval;
• test–retest reliability: this situation is similar to the

Quantitative measurement

This involves, for example, assessing a global scale score. Two methods are then used: intraclass correlation coefficients (ICC) and the Bland and Altman method.
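As a rough illustration of the two methods just named, the sketch below computes one common form of the intraclass correlation coefficient, ICC(2,1) in the Shrout and Fleiss notation (two-way random effects, absolute agreement, single rater), together with the Bland and Altman bias and 95% limits of agreement. The choice of ICC(2,1), the function names and the scores are assumptions made for the example; the appropriate ICC form depends on the experimental situation.

```python
from statistics import mean, stdev


def icc_2_1(scores):
    """ICC(2,1) from a table with one row per patient and one column per rater.

    Assumed form: two-way random effects, absolute agreement, single rater.
    """
    n = len(scores)        # number of patients
    k = len(scores[0])     # number of raters
    all_values = [x for row in scores for x in row]
    grand_mean = mean(all_values)
    row_means = [mean(row) for row in scores]
    col_means = [mean([row[j] for row in scores]) for j in range(k)]

    # Two-way ANOVA sums of squares: patients, raters, residual.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def bland_altman(first, second):
    """Return (bias, lower limit, upper limit) for two paired series."""
    differences = [a - b for a, b in zip(first, second)]
    bias = mean(differences)
    spread = 1.96 * stdev(differences)  # 95% limits of agreement
    return bias, bias - spread, bias + spread


# Hypothetical global scores given to six patients by two raters.
rater_a = [12, 15, 9, 20, 17, 11]
rater_b = [13, 14, 10, 19, 18, 12]

print("ICC(2,1):", round(icc_2_1(list(zip(rater_a, rater_b))), 3))
print("Bland-Altman (bias, lower, upper):", bland_altman(rater_a, rater_b))
```

The ICC summarizes agreement in a single number, whereas the Bland and Altman limits express disagreement in the units of the score, which is why the two are usually reported together.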

Qualitative measurement

For example, in a quality-of-life scale, one item assesses analgesic consumption with five response categories: none, low, moderate, high, very high. This item is therefore a qualitative variable with five categories, whose reliability (inter- or intrarater and test–retest) is assessed with one of two coefficients: kappa or weighted kappa.
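To make the difference between the two coefficients concrete, here is a minimal sketch of Cohen's kappa and of its weighted variant for two ratings of the five-category item. The function name, the `weighting` parameter, the linear and quadratic weighting schemes and the ratings are assumptions chosen for illustration, not values from the article.

```python
LEVELS = ["none", "low", "moderate", "high", "very high"]


def kappa(ratings_1, ratings_2, n_categories, weighting=None):
    """Cohen's kappa for ratings coded 0..n_categories-1.

    weighting=None gives the unweighted coefficient; "linear" or
    "quadratic" gives a weighted kappa that penalizes a disagreement
    according to the distance between the two categories.
    """
    if weighting not in (None, "linear", "quadratic"):
        raise ValueError("weighting must be None, 'linear' or 'quadratic'")

    n = len(ratings_1)
    # Observed joint proportions of the two ratings.
    observed = [[0.0] * n_categories for _ in range(n_categories)]
    for a, b in zip(ratings_1, ratings_2):
        observed[a][b] += 1.0 / n
    marginal_1 = [sum(row) for row in observed]
    marginal_2 = [
        sum(observed[i][j] for i in range(n_categories))
        for j in range(n_categories)
    ]

    def penalty(i, j):
        if weighting is None:
            return 0.0 if i == j else 1.0
        distance = abs(i - j) / (n_categories - 1)
        return distance if weighting == "linear" else distance ** 2

    cells = [(i, j) for i in range(n_categories) for j in range(n_categories)]
    observed_disagreement = sum(penalty(i, j) * observed[i][j] for i, j in cells)
    # Disagreement expected by chance, from the marginal proportions.
    chance_disagreement = sum(
        penalty(i, j) * marginal_1[i] * marginal_2[j] for i, j in cells
    )
    return 1.0 - observed_disagreement / chance_disagreement


# Hypothetical ratings of the same six patients by two raters.
first = ["none", "low", "moderate", "high", "very high", "moderate"]
second = ["none", "low", "moderate", "high", "very high", "high"]
codes_1 = [LEVELS.index(x) for x in first]
codes_2 = [LEVELS.index(x) for x in second]

print("kappa:", kappa(codes_1, codes_2, len(LEVELS)))
print("weighted kappa:", kappa(codes_1, codes_2, len(LEVELS), "quadratic"))
```

Because the categories are ordered, the weighted version treats a "moderate" versus "high" disagreement as less serious than a "none" versus "very high" one, whereas the unweighted version counts both as full disagreements.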

These coefficients are computed by many software packages. Readers interested in the underlying issues and the details of the calculations can

Internal consistency and Cronbach's alpha coefficient

Alpha assesses the internal consistency of a set of items, a scale or subscale, corresponding to a single clinical dimension; that is, the strength of the intercorrelations between items. The more closely the items are related to one another, the larger the value of alpha.
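A minimal sketch of the usual variance-based formula for alpha, $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_j s_j^2}{s_T^2}\right)$, where $k$ is the number of items, $s_j^2$ the variance of item $j$ and $s_T^2$ the variance of the total score. The function name and the item scores are hypothetical and serve only to show the calculation.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha from a table with one row per respondent
    and one column per item (sample variances are used throughout)."""
    k = len(responses[0])
    item_variances = [
        variance([row[j] for row in responses]) for j in range(k)
    ]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of five respondents to four items of one subscale.
answers = [
    [3, 4, 3, 4],
    [1, 2, 1, 1],
    [4, 4, 3, 4],
    [2, 2, 2, 3],
    [0, 1, 1, 0],
]
print("alpha:", round(cronbach_alpha(answers), 3))
```

When every item ranks the respondents in the same way, the variance of the total score is large relative to the sum of the item variances, and alpha approaches 1.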

This coefficient is usually studied together with reliability, because it can be regarded as a special case of the ICC. Take, for example, a unidimensional scale whose 8 items all assess the same dimension: anxiety. One can therefore

Sensitivity to change (responsiveness)

An instrument is said to be sensitive to change if it can measure accurately the variations, upward or downward, in the phenomenon being measured.
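The abstract names two indices for this property, effect size and standardized response mean. The sketch below uses their usual definitions: mean change divided by the standard deviation of the baseline scores for the effect size, and mean change divided by the standard deviation of the changes for the standardized response mean. The function names and the scores are hypothetical.

```python
from statistics import mean, stdev


def effect_size(before, after):
    """Mean change divided by the standard deviation of baseline scores."""
    changes = [a - b for b, a in zip(before, after)]
    return mean(changes) / stdev(before)


def standardized_response_mean(before, after):
    """Mean change divided by the standard deviation of the changes."""
    changes = [a - b for b, a in zip(before, after)]
    return mean(changes) / stdev(changes)


# Hypothetical scores of four patients before and after treatment.
baseline = [10, 12, 14, 16]
follow_up = [13, 14, 18, 19]

print("effect size:", round(effect_size(baseline, follow_up), 2))
print("SRM:", round(standardized_response_mean(baseline, follow_up), 2))
```

The two indices differ only in their denominator, so they can diverge markedly when patients change by similar amounts (small spread of changes) while differing widely at baseline.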

Conclusion: practical recommendations

• To choose a scale wisely among all those available, one must be able to compare them. This presupposes detailed information on earlier validation studies: the facets studied, the methods used, the quantitative results obtained, etc. Every effort should be made to gather this information despite the many practical obstacles; otherwise a rigorous choice is impossible.
• Construct validity is the facet

References (7)

G.R. Norman et al. Methodological problems in the retrospective computation of responsiveness to change: the lesson of Cronbach. J. Clin. Epidemiol. (1997)
J.G. Wright et al. A comparison of different indices of responsiveness. J. Clin. Epidemiol. (1997)
Standards for Educational and Psychological Testing (1985)

There are more references available in the full text version of this article.

Cited by (117)

Fecal Incontinence Subtype Assessment (FI-SA): Validation of a new tool to distinguish among subtypes of fecal incontinence
2024, Clinics and Research in Hepatology and Gastroenterology
Three subtypes of fecal incontinence (FI) are described in the literature: urge, mixed and passive FI, but the relevance of this classification remains unknown. To our knowledge, no questionnaire has been validated in a general population of patients with FI to classify patients between the different subtypes of FI. The aim of the present study was to validate the Fecal Incontinence Subtype Assessment (FI-SA) questionnaire in a general population of patients with FI.
All consecutive patients referred to our unit for physiological investigations of anorectal function in case of FI were included. A feasibility study was done to assess the acceptability, understanding, and the reproducibility of the FI-SA questionnaire. Its performance to correctly classify patients between subtypes of FI was evaluated in both a feasibility study and in a validation study, using clinical interview as gold standard.
The FI-SA questionnaire was found to be well accepted and easily understood by patients. Moreover, it was filled rapidly by patients, with a good reproducibility with an intra-class correlation coefficient of 0.97 and 0.87 for questions 1 and 2. Lastly, the accuracy of the FI-SA questionnaire to predict subtypes of FI was 93.3 % in the feasibility study (n = 30) and 81.1 % in the validation study (n = 100), in comparison with clinical interview as gold standard.
The FI-SA questionnaire could be used in the future to help standardize the methodology used among studies to evaluate the classification of patients in different subtypes of FI and ultimately to guide therapeutics.
A new tool to investigate anorectal disorders in patients with multiple sclerosis: STAR-Q
2023, Progres en Urologie
Bowel symptoms are commonly experienced by patients with Multiple sclerosis (PwMS), but no specific questionnaire validated in this population allows a rigorous assessment.
Validation of a multidimensional questionnaire assessing bowel disorders in PwMS.
A prospective, multicenter study was conducted between April 2020 and April 2021. The STAR-Q (Symptoms’ assessmenT of AnoRectal dysfunction Questionnaire), was built in 3 steps. First, literature review and qualitative interviews were performed to create the first version, discussed with a panel of experts. Then, a pilot study assessed comprehension, acceptation and pertinence of items. Finally, the validation study was designed to measure content validity, internal consistency reliability (alpha coefficient of Cronbach) and test–retest reliability [intraclass correlation coefficient (ICC)]. The primary outcome was good psychometric properties with Cronbach's α > 0.7 and ICC > 0.7.
We included 231 PwMS. Comprehension, acceptance and pertinence were good. STAR-Q showed very good internal consistency reliability (Cronbach's α = 0.84) and test–retest reliability (ICC = 0.89). The final version of STAR-Q is composed of 3 domains corresponding to symptoms (Q1–Q14), treatment and constraints (Q15–Q18) and impact on quality of life (Q19). Three categories of severity were determined (STAR-Q ≤ 16: minor, between 17 and 20: moderate, and ≥ 21: severe).
STAR-Q presents very good psychometric properties and allows a multidimensional assessment of bowel disorders in PwMS.
Balneotherapy in spondyloarthropathy: A systematic review
2022, Therapies
To evaluate the effectiveness of balneotherapy on spondyloarthritis.
Two authors independently searched the CENTRAL, MEDLINE, SCOPUS, EMBASE and WEB OF SCIENCE databases until July 2017, for randomized controlled trials published in French or English, that included participants, and interventions: adults with spondyloarthritis, treated by balneotherapy program or one of its components and compared with any other intervention or no treatment. Internal validity, external validity, quality of the statistical analysis, and publication bias were systematically evaluated. We report the best level of evidence.
Nine articles were selected; the internal validity was high in two studies, average in one study, and low in six studies. With high internal validity, one study found a difference for pain between immersion in radon-rich water and tap water for the whole population or rheumatic disease, but the BASFI was not improved for the subgroup of patients with spondyloarthritis. The other study with high validity reported a significant 28-week improvement in quality of life and a composite index. In a study with moderate internal validity involving ankylosing spondylitis patients with associated inflammatory bowel disease, a balneotherapy program demonstrated a relevant clinical improvement when compared to patients on a waiting list. With low internal validity, TNFa inhibitors + spa therapy were found to be superior to treatment with TNFa inhibitors alone in patients with psoriatic arthritis.
Two trials with high validity demonstrated improvements, but this systematic review is not sufficient to prove the efficacy of balneotherapy in spondyloarthritis. More trials are needed with larger sample size to confirm the preliminary results observed and conclusively determine the benefits of balneotherapy.
Aqua: A new questionnaire assessing anticholinergic side effects in neurogenic population (Aqua: Anticholinergic side effects questionnaire)
2022, Progres en Urologie
Citation Excerpt :
Alpha coefficient of Cronbach was considered as very good if > 0.7 [11]. Test–retest reliability was tested using the intraclass correlation coefficient (ICC) which was significant over 0.7 [11]. The first questionnaire was filled at the end of the first consultation and patients had to answer a second questionnaire (filled at home) 7 days after the first consultation.
Validate a new questionnaire to assess the side effects secondary to anticholinergics in neurogenic population suffering from Adult neurogenic lower urinary tract dysfunction (ANLUTD).
We conducted a prospective, monocentric study in a Neuro-urology Department of a University Hospital between February 2015 and April 2020. To allow a full psychometric validation of a questionnaire, the study protocol included 3 steps: qualitative interviews, feasibility study and validation study. The primary outcome was good psychometric properties defined with good internal consistency reliability (Cronbach's α > 0.7) and good test–retest reliability (intraclass correlation coefficient (ICC) > 0.7).
We included 64 patients with ANLUTD secondary to neurogenic disorders. The feasibility study demonstrated very good acceptance and comprehension for 97% of patients. The validation study showed good internal consistency with Cronbach's α = 0.69 and very good ICC = 0.73. AQUA is composed of 8 items scored 0 (no side effect) to 2 (major side effect), for a total score between 0 and 16. Completion time is very short. The mean score in our population was 4.1 (SD 2.9).
AQUA is the first validated tool to assess side effects secondary to antimuscarinic treatment for neurogenic population suffering from ANLUTD.
Assessment of ankle muscle fatigue with a destabilization tool : A comparative study between healthy and chronic ankle unstable subjects
2022, Journal de Traumatologie du Sport
Lateral ankle sprain is the most common injury. Fatigue seems to play a role in this injury, as ankle sprains occur most often at the end of a match (soccer or rugby). The objective of this study was to evaluate the reproducibility of a fatigue strength test and to compare the scores between healthy subjects and subjects with chronic ankle instability (CAI).
A total of 19 healthy subjects and 11 CAI subjects performed a fatigue strength test on one leg support with a destabilizing sandal. Subjects were required to perform a maximum number of repetitions of the slow inversion and fast eversion. This test was performed on two occasions one week apart.
The relative reproducibility was very good for the healthy subjects (ICC = 0.95) and moderate for the CAI subjects (ICC = 0.58). The measurement error remained relatively variable and high (SEM = 2.06–4.10 and MDC = 5.70–11.4). Healthy subjects were significantly more resistant to fatigue than CAI subjects (P = 0.02).
The fatigue test is reproducible. However, it seems that the failure of the test in some subjects may be related to motor disability unrelated to fatigue. Chronically unstable subjects have significantly lower fatigue resistance score than healthy subjects demonstrating the interest of the test in clinical practice. A threshold ≤ 8 repetitions is proposed as the limit for identifying a deficit.
Fecal incontinence subtype assessment (FI-SA): A new tool to distinguish among subtypes of fecal incontinence in a neurogenic population
2022, Clinics and Research in Hepatology and Gastroenterology
Two subtypes of fecal incontinence (FI) are defined in the literature (urge and passive FI). The pertinence of this classification is unknown due to conflicting findings and heterogeneity of definitions. However, no questionnaire is available to clearly classify patients among subtypes. The objective of the present study was to develop and validate a new tool (Fecal incontinence subtype assessment, FI-SA) in order to better classify patients among the different subtypes of FI.
A prospective monocentric study was conducted in consecutive patients with FI according to Rome IV criteria. To validate psychometric properties of the FI-SA questionnaire, a literature review and qualitative interviews were performed and discussed with an expert panel. A feasibility study was realized to assess acceptability and comprehension of items. The reproducibility was investigated in a validation study.
Comprehension and acceptability were excellent in 90% of patients in the feasibility study (n = 30). Validation study (n = 100) showed a good reproducibility with an intra-class correlation coefficient of 0.91 and 0.89 for questions 1 and 2. Time to fill the questionnaire was 40.0 s. 98.0% patients were classified among subtypes of FI: 34.0% passive FI, 32.0% urge FI and 32.0% mixed FI.
FI-SA is the first questionnaire to classify patients among subtypes of FI with good psychometric characteristics and the first questionnaire introducing the concept of mixed FI. FI-SA could help to determine the pertinence of this classification of FI in the management of these patients.


Review article (Mise au point)
Validation of assessment scales in physical medicine and rehabilitation: how are psychometric properties determined?