Date Published: April 9, 2019
Publisher: Public Library of Science
Author(s): Alexandra Rouquette, Jean-Benoit Hardouin, Alexis Vanhaesebrouck, Véronique Sébille, Joël Coste, Karl Bang Christensen.
The aims were to review practices concerning Differential Item Functioning (DIF) detection in composite measurement scales, particularly those used in health research, and to provide guidance on how to proceed if statistically significant DIF is detected.
This work specifically addressed the Rasch model, which is the subject of growing interest in health research owing to its particularly advantageous properties. There were three steps: 1) a literature review to describe current practices; 2) a simulation study to determine under which conditions encountered in health research studies erroneous conclusions can be drawn from group comparisons when a scale is affected by DIF that is not taken into account; 3) based on steps 1 and 2, formulation of recommendations that were subsequently reviewed by leading internationally recognized experts.
Four key recommendations were formulated to help researchers to determine whether statistically significant DIF is meaningful in practice, according to the kind of DIF (uniform or non-uniform) and the DIF effect size.
This work provides the first recommendations on how to deal in practice with the presence of DIF in composite measurement scales used in health research studies.
Other than some purely descriptive studies, almost all health research studies include group comparisons: typical study designs involve a primary outcome measured in every subject, whose occurrence (if categorical) or mean (if continuous) is compared between groups defined by a characteristic or exposure of interest. More complex designs require multivariate analyses or subgroup analyses. To be accurate, the measurement of the outcome must be valid in all groups studied. Otherwise, the difference (or the absence of difference) observed between groups may be, partly or totally, an artifact due to the measurement instrument not being valid in one or several groups. Accurate group comparisons require measurement invariance: “the measuring device should function in the same way across varied conditions, so long as those varied conditions are irrelevant to the attribute being measured”.
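To make the risk of an artifactual group difference concrete, here is a minimal simulation sketch in Python. It is not the simulation design used in this work: the five item difficulties, the 1-logit uniform DIF on the third item, the sample size, and the function names are all assumptions chosen for illustration. Both groups draw their latent trait from the same standard normal distribution, so any difference in mean sum score comes only from the item that functions differently.

```python
import math
import random


def rasch_prob(theta, difficulty):
    """Probability of a positive response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def simulate_mean_score(n_subjects, difficulties, rng):
    """Mean sum score for subjects whose latent trait is drawn from N(0, 1)."""
    total = 0
    for _ in range(n_subjects):
        theta = rng.gauss(0.0, 1.0)
        # Each item is answered positively with the Rasch probability.
        total += sum(rng.random() < rasch_prob(theta, b) for b in difficulties)
    return total / n_subjects


rng = random.Random(2019)

# Hypothetical item difficulties (in logits) for the reference group.
reference_difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]
# Assumed uniform DIF: the third item is 1 logit harder in the focal group.
focal_difficulties = [-1.0, -0.5, 1.0, 0.5, 1.0]

mean_reference = simulate_mean_score(20000, reference_difficulties, rng)
mean_focal = simulate_mean_score(20000, focal_difficulties, rng)

print(f"reference group mean score: {mean_reference:.2f}")
print(f"focal group mean score:     {mean_focal:.2f}")
```

Under these assumptions the focal group obtains a lower mean score even though the two groups have identical latent trait distributions, which is the kind of measurement artifact that a DIF assessment is meant to detect.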
Following the review, the recommendations were not modified, but their justification was improved and their field of application was defined more precisely. The median of the reviewers’ ratings for each AGREE item and the scores for the six AGREE domains are reported in the S2 Table. Scores were higher than 70% in every domain except the “Applicability” domain. As further improvements, a box (S1 Box) gathering all the recommendations was supplied, and the strengths and limitations of each method used in this work were clarified.
Our combination of a literature review and a simulation study led to the first consensual recommendations, reviewed by international experts, for the assessment of DIF within the Rasch framework in short composite measurement scales. There is a growing interest in DIF in health research, and these recommendations will help researchers conducting DIF assessment studies. They will also help scale users, because the information provided in DIF assessment studies will be clearer on how to report DIF and take it into account in practice. This will contribute to better accuracy of results in epidemiological studies by improving the measurement properties of composite measurement scales, which are increasingly used in health research.