Date Published: January 26, 2017
Publisher: Public Library of Science
Author(s): Cristina Rubio-Escudero, Justo Valverde-Fernández, Isabel Nepomuceno-Chamorro, Beatriz Pontes-Balanza, Yoedusvany Hernández-Mendoza, Alfonso Rodríguez-Herrera, Andrea Motta.
In this work, we present the results of applying data mining techniques to hydrogen breath test data. Disposal of H2 gas is of utmost relevance for maintaining efficient microbial fermentation processes.
The hydrogen breath test is a valid tool for assessing the functional activity of the gut microbiome, and interest in evaluating this activity is increasing. The focus of our research is to draw new conclusions from these well-known data sources by looking at them from a different perspective, one based on tools that are well established in other research areas, such as data mining.
Data sets from 2751 lactose hydrogen breath tests were included (see S1 Data, Hydrogen breath test data). Data collection spanned 4 years, from June 2009 to June 2013. Subjects of both genders were between 1 and 14 years old. The information acquired for each patient was: gender, date of the test, age at the time of the test, weight, height, private insurance company name and zip code.
The data set was made up of 2751 hydrogen tests fulfilling the inclusion criteria. Of these, 181 were excluded because of missing data. Following the Rome Consensus, we classified as Lactose Malabsorption (LM) the samples from patients showing an increase in hydrogen levels of 20 parts per million (ppm) above the baseline. The number of patients diagnosed with LM under these criteria was 839, or 32.64% of the 2570 remaining after exclusions. In our study, we did not separate out non-hydrogen-producing patients, to avoid losses due to selection bias.
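The diagnostic rule and the reported proportion can be illustrated with a minimal sketch. It assumes that the first reading of each test is the baseline, that a rise of exactly 20 ppm counts as positive, and that the percentage is taken over the tests remaining after exclusions; the function names and the example readings are hypothetical and do not come from the study data.

```python
from typing import Sequence

# Rise over baseline (ppm) used as the LM criterion in the text.
LM_THRESHOLD_PPM = 20.0


def is_lactose_malabsorber(readings_ppm: Sequence[float],
                           threshold: float = LM_THRESHOLD_PPM) -> bool:
    """Return True if any post-baseline reading rises by at least `threshold`.

    Assumption: readings_ppm[0] is the baseline taken before the lactose
    load, and later entries are the timed samples that follow it.
    """
    if len(readings_ppm) < 2:
        raise ValueError("need a baseline and at least one later reading")
    baseline = readings_ppm[0]
    peak_rise = max(readings_ppm[1:]) - baseline
    return peak_rise >= threshold


def lm_percentage(n_positive: int, n_tests: int, n_excluded: int) -> float:
    """Percentage of positive tests among those kept after exclusions."""
    n_kept = n_tests - n_excluded
    return 100.0 * n_positive / n_kept


# Hypothetical series of readings (ppm), for illustration only.
print(is_lactose_malabsorber([5, 10, 30, 12]))  # rise of 25 ppm
print(is_lactose_malabsorber([5, 10, 20]))      # rise of 15 ppm

# Counts taken from the text: 839 positives, 2751 tests, 181 excluded.
print(lm_percentage(839, 2751, 181))
```

With the counts given in the text, the denominator is 2751 - 181 = 2570 tests, which gives a proportion of roughly 32.6%, in line with the reported figure.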
Data mining techniques are increasingly present in clinical practice, complementing classical statistical analysis. Data mining is particularly useful when data volume increases. Among all the available tests involving exhaled hydrogen, the lactose tests are the most widely used and have the most scientific evidence behind them, and there is no doubt about their clinical utility. The lactose hydrogen breath test is a well-established test that, because of the results it yields (numerical values over time), is a good candidate for data mining analysis. As far as we know, this work is the first to apply data mining to lactose hydrogen breath test results.