Research Article: Validation of an algorithm to identify children with biopsy-proven celiac disease from within health administrative data: An assessment of health services utilization patterns in Ontario, Canada

Date Published: June 29, 2017

Publisher: Public Library of Science

Author(s): Jason Chan, David R. Mack, Douglas G. Manuel, Nassim Mojaverian, Joseph de Nanassy, Eric I. Benchimol, Neil R. Smalheiser.


Celiac disease (CD) is a common pediatric illness, and awareness of gluten-related disorders including CD is growing. Health administrative data represents a unique opportunity to conduct population-based surveillance of this chronic condition and assess the impact of caring for children with CD on the health system.

The objective of the study was to validate an algorithm based on health administrative data diagnostic codes to accurately identify children with biopsy-proven CD. We also evaluated trends over time in the use of health services related to CD by children in Ontario, Canada.

We conducted a retrospective cohort study and validation study of population-based health administrative data in Ontario, Canada. All cases of biopsy-proven CD diagnosed in Ottawa from 2005 to 2011 were identified through chart review at a large pediatric health care center and linked to the Ontario health administrative data to serve as the positive reference standard. All other children living within Ottawa served as the negative reference standard. Case-identifying algorithms based on outpatient physician visits with an associated ICD-9 code for CD plus an endoscopy billing code were constructed and tested. Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated for each algorithm, with 95% confidence intervals (CI). Poisson regression, adjusting for sex and age at diagnosis, was used to explore the trend in outpatient visits associated with a CD diagnostic code from 1995 to 2011.
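As an illustration of the four accuracy measures named above, the following minimal Python sketch computes them from a 2x2 table of algorithm result against reference standard. The function names (`wilson_ci`, `accuracy_measures`) are hypothetical, the counts are made up for illustration and are not the study's data, and the Wilson score interval is an assumption, since the text here does not state which confidence-interval method the authors used.

```python
from math import sqrt


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def accuracy_measures(tp, fp, fn, tn):
    """Return {measure: (point estimate, (lower, upper))} from 2x2 counts.

    tp/fn: reference-standard cases flagged / missed by the algorithm.
    fp/tn: reference-standard non-cases flagged / not flagged.
    """
    return {
        "sensitivity": (tp / (tp + fn), wilson_ci(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_ci(tn, tn + fp)),
        "ppv": (tp / (tp + fp), wilson_ci(tp, tp + fp)),
        "npv": (tn / (tn + fn), wilson_ci(tn, tn + fn)),
    }


# Made-up counts for illustration only; NOT the study's 2x2 table.
measures = accuracy_measures(tp=80, fp=20, fn=20, tn=9880)
for name, (estimate, (low, high)) in measures.items():
    print(f"{name}: {estimate:.3f} (95% CI {low:.3f} to {high:.3f})")
```

When the negative reference standard is very large relative to the number of cases, as with an entire regional population of children, specificity and NPV sit close to 100% almost regardless of how well the algorithm finds true cases, so sensitivity and PPV are the more informative measures.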

The best algorithm to identify CD consisted of an endoscopy billing claim followed by 1 or more adult or pediatric gastroenterologist encounters after the endoscopic procedure. The sensitivity, specificity, PPV, and NPV for the algorithm were 70.4% (95% CI 61.1–78.4%), >99.9% (95% CI >99.9 to >99.9%), 53.3% (95% CI 45.1–61.4%), and >99.9% (95% CI >99.9 to >99.9%), respectively. It identified 1289 suspected CD cases from Ontario-wide administrative data. There was a 9% annual increase in the use of this combination of CD-associated diagnostic codes in physician billing data (RR 1.09, 95% CI 1.07–1.10, P<0.001). With its current structure and variables, Ontario health administrative data is not suitable for identifying incident pediatric CD cases. The tested algorithms suffer from poor sensitivity and/or poor PPV, which increases the risk of case misclassification and could bias estimates of the CD incidence rate. This study reinforced the importance of validating the codes used to identify cohorts or outcomes when conducting research using health administrative data.
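The rule behind the best-performing algorithm described above can be sketched in a few lines of Python. This is a hedged illustration, not the authors' implementation: the function name `flag_suspected_cd` and the per-child lists of dates are hypothetical, and the sketch assumes that "after" means a strictly later calendar date. Details this text does not specify, such as whether the gastroenterologist encounters must carry a CD diagnostic code or fall within a time window, are left to whatever filtering produces the input lists.

```python
from datetime import date


def flag_suspected_cd(endoscopy_dates, gi_visit_dates, min_visits=1):
    """Flag a child as a suspected CD case.

    Returns True if at least one endoscopy billing claim is followed by
    `min_visits` or more gastroenterologist encounters on a later date.
    Assumption: same-day encounters do not count as "after".
    """
    for endoscopy in endoscopy_dates:
        later_visits = [visit for visit in gi_visit_dates if visit > endoscopy]
        if len(later_visits) >= min_visits:
            return True
    return False


# Hypothetical records for two children (illustration only).
child_a = flag_suspected_cd([date(2010, 1, 5)], [date(2010, 2, 1)])
child_b = flag_suspected_cd([date(2010, 1, 5)], [date(2009, 12, 1)])
print(child_a, child_b)
```

Raising `min_visits` would be one way to trade sensitivity for PPV, since requiring more follow-up encounters flags fewer children overall.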

Partial Text

Celiac disease (CD) is an autoimmune condition characterized by enteropathy resulting from exposure and immune response to gluten, a protein commonly found in wheat, rye, and barley.[1, 2] Once considered a rare disease, CD is now regarded as one of the most common autoimmune disorders, with an estimated prevalence of 1–3% in the overall population and 0.3 to 1.0% among the pediatric population.[3–6] Most studies of the epidemiology of CD were conducted in Europe, with incidence estimated between 2 and 54 cases per 100,000 patient-years (PY).[7–10]

The current study evaluated the feasibility of using health administrative data to capture children with biopsy-proven CD, and explored the trends in the use of CD diagnostic codes in these data over the past two decades in Ontario. Our findings demonstrated that Ontario health administrative data is suboptimal for accurate classification of biopsy-proven CD among children, at least with the currently available data structure, which comprises mostly outpatient physician billing codes and hospitalization data. All of the algorithms derived in the study suffered from low sensitivity and/or low PPV. We applied two of the test algorithms to the Ontario health administrative data to illustrate the effect of using such algorithms for incidence rate estimation. The OHIP-based and SDS-based algorithms, which differ in the source of their endoscopy procedural code, both detected an increasing trend in health services use associated with CD diagnostic codes during the study period. This likely reflects increased health services utilization for CD and gluten-related disorders. However, due to the suboptimal accuracy of the two algorithms, we cannot draw firm conclusions regarding the trends in pediatric CD incidence over the past two decades in Canada.
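To make the misclassification concern concrete, the sketch below shows how an imperfect PPV and sensitivity distort a raw case count. It uses the reported figures for the best algorithm (1289 flagged cases, PPV 53.3%, sensitivity 70.4%) and rests on a strong assumption that the Ottawa validation performance holds unchanged across Ontario. PPV depends on disease prevalence, so this is an illustration of the direction and rough size of the bias, not an estimate made by the authors. The function name `adjusted_case_count` is hypothetical.

```python
def adjusted_case_count(flagged, ppv, sensitivity):
    """Rough count of true cases implied by a flagged count.

    flagged * ppv       -> flagged children who are true cases
    ... / sensitivity   -> scale up for true cases the algorithm missed
    Assumes ppv and sensitivity from the validation sample apply to the
    whole population, which may not hold.
    """
    if not (0 < ppv <= 1 and 0 < sensitivity <= 1):
        raise ValueError("ppv and sensitivity must be in (0, 1]")
    true_positives = flagged * ppv
    return true_positives / sensitivity


flagged_cases = 1289   # suspected cases identified Ontario-wide
ppv = 0.533            # reported positive predictive value
sensitivity = 0.704    # reported sensitivity

estimate = adjusted_case_count(flagged_cases, ppv, sensitivity)
print(f"flagged: {flagged_cases}, implied true cases: {estimate:.0f}")
```

Because the reported sensitivity exceeds the reported PPV, the false positives outnumber the missed cases under this assumption, so the raw flagged count would overstate the number of true cases.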

In summary, the algorithms derived to identify children with biopsy-proven CD from within Ontario health administrative data demonstrated suboptimal performance. However, there was a clear and significant increase in the use of CD diagnostic codes by Ontario physicians. This study demonstrated the limitations of using health administrative data to derive a cohort of children with CD. Not all diseases can be accurately identified using health administrative data codes. Our study emphasizes the importance of evaluating the accuracy and completeness of codes/algorithms used to identify patients from within health administrative data in order to reduce error potentially resulting in misclassification bias.



