Research Article: Predicting childhood obesity using electronic health records and publicly available data

Date Published: April 22, 2019

Publisher: Public Library of Science

Author(s): Robert Hammond, Rodoniki Athanasiadou, Silvia Curado, Yindalon Aphinyanaphongs, Courtney Abrams, Mary Jo Messito, Rachel Gross, Michelle Katzow, Melanie Jay, Narges Razavian, Brian Elbel, Robert Moskovitch.


Because of the strong link between childhood obesity and obesity-related comorbidities in adulthood, and the difficulty in decreasing body mass index (BMI) later in life, effective strategies are needed to address this condition in early childhood. The ability to predict obesity before age five could be a useful tool, allowing prevention strategies to focus on high-risk children. The few existing prediction models for obesity in childhood have primarily employed data from longitudinal cohort studies, relying on difficult-to-collect data that are not readily available to all practitioners. Instead, we used real-world, unaugmented electronic health record (EHR) data from the first two years of life to predict obesity status at age five, an approach not yet taken in pediatric obesity research.

We trained a variety of machine learning algorithms to perform both binary classification and regression. Because previous studies have demonstrated different obesity determinants for boys and girls, we likewise developed separate models for each group. In the models for both boys and girls, we found that the weight-for-length z-score, BMI between 19 and 24 months, and the last BMI measure recorded before age two were the most important features for prediction. The best-performing models predicted obesity with an Area Under the Receiver Operating Characteristic Curve (AUC) of 81.7% for girls and 76.1% for boys.
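The AUC can be read as the probability that a randomly chosen child who develops obesity receives a higher predicted risk score than a randomly chosen child who does not. The sketch below illustrates that rank-based (Mann-Whitney) formulation. To mirror the sex-specific models, it computes the AUC separately for girls and boys. The function names, the `(sex, label, score)` record layout, and the toy scores are hypothetical illustrations, not the authors' code or data.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def auc_mann_whitney(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the positive scores higher.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def auc_by_sex(records: Iterable[Tuple[str, int, float]]) -> Dict[str, float]:
    """Evaluate each sex separately, as with separate models for boys and girls.

    Each record is a hypothetical (sex, obese_at_five, predicted_score) tuple.
    """
    groups: Dict[str, Tuple[List[int], List[float]]] = {}
    for sex, label, score in records:
        labels, scores = groups.setdefault(sex, ([], []))
        labels.append(label)
        scores.append(score)
    return {sex: auc_mann_whitney(ys, ss) for sex, (ys, ss) in groups.items()}


if __name__ == "__main__":
    # Toy scores only; these are not study results.
    toy = [
        ("F", 1, 0.9), ("F", 0, 0.3), ("F", 0, 0.6), ("F", 1, 0.5),
        ("M", 1, 0.7), ("M", 0, 0.7), ("M", 0, 0.2), ("M", 0, 0.9),
    ]
    print(auc_by_sex(toy))  # {'F': 0.75, 'M': 0.5}
```

The pairwise loop is O(positives × negatives), which is fine for illustration. A library routine such as scikit-learn's `roc_auc_score` computes the same quantity more efficiently.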

We were able to predict obesity at age five using EHR data with an AUC comparable to that of cohort-based studies, reducing the need to invest in additional data collection. Our results suggest that machine learning approaches for predicting future childhood obesity from EHR data could help clinicians and researchers inform future policy, design interventions, and support clinical decision-making.

Partial Text

Childhood obesity has been increasing since the 1970s [1]. As of 2016, 18.5% of US children and adolescents aged 2–19 had obesity, with a significantly higher prevalence among boys than girls [2]. Although there has been recent cause to suspect that obesity rates for adults and children might be leveling off [3, 4], more recent data call this conclusion into question [5]: data from 2015–2016 showed increases in obesity rates across children of all ages, including a large increase among the youngest children, aged 2–5 years [2]. Growth trajectory simulation models suggest that 57% of children today will have obesity at age 35 [6]. This upward trend is concerning because childhood obesity can lead to diabetes, hypertension, and other conditions in adulthood [7–9]. Because of the strong link between childhood obesity and adult comorbidities, and the difficulty in decreasing BMI later in life, effective strategies are needed to address the condition early in life. In fact, a growing number of early obesity prevention interventions are being developed to reduce obesity-promoting feeding and lifestyle practices beginning in pregnancy and infancy. Some are beginning to demonstrate promising effects on both promoting healthy habits and decreasing early childhood obesity; however, they currently focus on universal interventions [10–18]. If we could instead predict a child's risk of developing obesity, intervention resources could be better targeted, and the effect of an intervention could be measured relative to each child's risk of developing obesity.

We conducted a retrospective cohort study using EHR data from patients in a safety-net health system that serves a racially and ethnically diverse urban community in New York City: Family Health Centers at NYU Langone (formerly Lutheran Family Health Centers), one of the largest Federally Qualified Health Centers in the U.S., which comprises 8 primary care and specialty locations and over 40 school-based clinics in Brooklyn, New York. The EHR data used in this study spanned January 1, 2008 to August 31, 2016 and contained the records of 52,945 children of various ages and 36,244 of their mothers, covering visits ranging from well-child visits to inpatient and outpatient services. Because not all mothers had given birth or received care in the study health system, there was not always a one-to-one match between mothers and their children. Additionally, some mothers had given birth to more than one child during the data collection period, which also contributed to there being fewer mothers than children in the data set. The work was approved by the New York University School of Medicine’s Institutional Review Board, and we were granted a waiver of informed consent as well as a waiver of authorization to use private health information for research.
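The uneven match between children and mothers can be pictured as a many-to-one mapping in which some children have no linked mother at all. The sketch below assumes a hypothetical dictionary from child identifiers to mother identifiers, with `None` when the mother has no record in the health system. It shows how missing mothers and siblings sharing a mother both reduce the count of distinct mothers relative to children. The excerpt does not describe how the study actually linked mother and child records.

```python
from typing import Dict, Optional


def summarize_linkage(child_to_mother: Dict[str, Optional[str]]) -> Dict[str, int]:
    """Count children, linked children, distinct mothers, and mothers with several children.

    child_to_mother is a hypothetical linkage table: child id -> mother id,
    or None when the mother never gave birth or received care in the health system.
    """
    linked_mothers = [m for m in child_to_mother.values() if m is not None]
    children_per_mother: Dict[str, int] = {}
    for m in linked_mothers:
        children_per_mother[m] = children_per_mother.get(m, 0) + 1
    return {
        "children": len(child_to_mother),
        "children_with_linked_mother": len(linked_mothers),
        "distinct_mothers": len(children_per_mother),
        "mothers_with_multiple_children": sum(1 for k in children_per_mother.values() if k > 1),
    }


if __name__ == "__main__":
    # Toy table: siblings c1 and c2 share mother m1; c4's mother is not in the system.
    toy = {"c1": "m1", "c2": "m1", "c3": "m2", "c4": None}
    print(summarize_linkage(toy))
```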

The first column of Table 3 shows the demographic breakdown of our EHR population before applying our inclusion criteria. This breakdown is comparable to that of our modeling cohorts, except for the “No Data Available” categories. Using all 3,449 children (1,751 boys and 1,698 girls) in the study cohort (Table 1), we assessed each variable’s association with the binary obesity outcome between the ages of 4.5 and 5.5. We compared these associations with obesity to the reference group (defined in each feature category section) and show a subset of those variables in Table 3. Overall, 18.6% of our cohort had obesity at age five. This is lower than the 21% estimated for New York City public school children in kindergarten through eighth grade [31]. Only a single diagnosis category, maternal diabetes mellitus, had a significant association (p<0.001) with obesity at age five. No infant diagnoses were significantly associated with obesity.

Since the Surgeon General's “Call to Action to Prevent and Decrease Overweight and Obesity” in 2001 [65], obesity and its causes have been the focus of numerous scientific studies [8, 66, 67]. Similarly, thousands of state-level policies have been enacted to encourage healthy lifestyles [68]. Despite the massive investments of money and effort so far, very few interventions have been effective at preventing obesity [69]. In this study, we used EHR data and machine learning algorithms to identify young children at high risk of developing obesity who could be specifically targeted for intervention. Using LASSO regression, we predicted obesity between the ages of 4.5 and 5.5 years on a held-out test set, achieving average AUC scores of 81.8% for girls and 76.1% for boys (Fig 1).
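The excerpt names LASSO but does not describe the solver, penalty strength, feature preprocessing, or whether the binary outcome was modeled with a logistic link. The sketch below is a hedged illustration only. It fits an L1-penalized (LASSO) logistic regression by proximal gradient descent (ISTA) to show why LASSO is useful for feature selection: the soft-thresholding step drives the weights of uninformative features to exactly zero. The function names, hyperparameter defaults, and toy data are assumptions, not the study's configuration.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w: float, t: float) -> float:
    # Proximal operator of t * |w|: shrinks w toward zero and sets small values to exactly 0.
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def fit_lasso_logistic(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.1,   # assumed L1 penalty strength; would normally be tuned
    step: float = 0.1,  # assumed fixed step size
    n_iter: int = 2000,
) -> Tuple[float, List[float]]:
    """Minimize mean logistic loss + lam * ||w||_1 by proximal gradient descent (ISTA).

    The intercept b is not penalized.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Residual of the predicted probability against the 0/1 label.
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += r / n
            for j in range(d):
                grad_w[j] += r * xi[j] / n
        # Plain gradient step for the intercept, proximal (soft-threshold) step for the weights.
        b -= step * grad_b
        w = [soft_threshold(w[j] - step * grad_w[j], step * lam) for j in range(d)]
    return b, w


def predict_proba(b: float, w: Sequence[float], x: Sequence[float]) -> float:
    # Predicted probability of the positive (obese) class.
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes; feature 1 is always 0 and carries no signal.
    X = [[-2.0, 0.0], [-1.0, 0.0], [-0.5, 0.0], [0.5, 0.0], [1.0, 0.0], [2.0, 0.0]]
    y = [0, 0, 0, 1, 1, 1]
    b, w = fit_lasso_logistic(X, y)
    print("intercept:", round(b, 3), "weights:", [round(v, 3) for v in w])
    print("P(positive | x=[2, 0]):", round(predict_proba(b, w, [2.0, 0.0]), 3))
```

Because the L1 penalty is scale-dependent, features would normally be standardized first, and `lam` would be tuned by cross-validation. Following the paper's design, one such model would be fitted for boys and another for girls. In practice, a library solver such as scikit-learn's `LogisticRegression(penalty="l1", solver="liblinear")` or `solver="saga"` is the usual choice over a hand-written loop.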