Date Published: January 31, 2019
Publisher: Public Library of Science
Author(s): Jinxiang Xi, Weizhong Zhao, Roi Gurka.
Exhaled aerosols from the lungs have unique patterns, and their variation can be correlated with the underlying lung structure and associated abnormalities. However, characterizing such aerosol patterns and distinguishing among them is challenging because of their complexity. This challenge is even greater for small airway diseases, where the disturbance signals are weak.
The objective of this study is to exploit different feature extraction algorithms to develop a practical classifier for diagnosing obstructive lung diseases from exhaled aerosol images. The algorithms considered are proper orthogonal decomposition (POD), principal component analysis (PCA), dynamic mode decomposition (DMD), and DMD with control (DMDC). Aerosol images were generated via physiology-based simulations in one normal and four diseased airway models in G7-9 bronchioles. The image data were classified using both the support vector machine (SVM) and random forest (RF) algorithms. The effectiveness of the different features was evaluated by classification accuracy and misclassification rate.
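To make the dynamic feature extraction step concrete, the following is a minimal sketch of standard (exact) DMD applied to a matrix of time-ordered snapshots. It is not the authors' implementation: the function name `dmd`, the synthetic data, and the idea of using mode magnitudes as a feature vector are assumptions made for illustration only.

```python
import numpy as np

def dmd(snapshots, rank):
    """Exact DMD of a (n_pixels, n_times) matrix of time-ordered snapshots.

    Returns the DMD eigenvalues and the corresponding spatial modes.
    """
    X1 = snapshots[:, :-1]          # states at times 0 .. T-2
    X2 = snapshots[:, 1:]           # states at times 1 .. T-1
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U_r = U[:, :rank]
    s_r = s[:rank]
    V_r = Vh[:rank].conj().T
    # Reduced linear operator: A_tilde = U^* X2 V Sigma^{-1}
    A_tilde = (U_r.conj().T @ X2 @ V_r) / s_r
    eigvals, W = np.linalg.eig(A_tilde)
    # Exact DMD modes: Phi = X2 V Sigma^{-1} W
    modes = ((X2 @ V_r) / s_r) @ W
    return eigvals, modes

# Hypothetical stand-in for flattened aerosol images: two spatial patterns
# that decay at rates 0.9 and 0.5 per time step.
rng = np.random.default_rng(0)
n_pixels, n_times = 40, 10
patterns = rng.standard_normal((n_pixels, 2))
steps = np.arange(n_times)
amplitudes = np.vstack([0.9 ** steps, 0.5 ** steps])
snapshots = patterns @ amplitudes

eigvals, modes = dmd(snapshots, rank=2)
# One possible (assumed) feature vector: magnitudes of the retained modes.
features = np.abs(modes).ravel()
print(np.sort(eigvals.real))
```

On this synthetic sequence the recovered eigenvalues match the two decay rates built into the data, which is the temporal information that a static decomposition of the same snapshots would not expose.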
Results show significantly higher performance with dynamic feature extraction (DMD and DMDC) than with static algorithms (POD and PCA). Adding the control variables to DMD further improved classification accuracy. Comparing the classification methods, RF consistently outperformed SVM for all types of features considered. While the performance of RF increased steadily with the number of features retained, the performance of SVM peaked at 50 features and decreased thereafter. The 5-class classification accuracy was 94.8% using the DMDC-RF model and 93.0% using the DMD-RF model, both of which were higher than the 87.0% obtained in the previous study that used fractal dimension features.
Because disease progression is inherently a dynamic process, DMD(C)-based feature extraction, which preserves temporal information, is preferred over POD and PCA. Compared with hand-crafted features like fractals, feature extraction by DMD and DMDC is automatic and more accurate.
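The contrast between static and dynamic features can be illustrated with a small numerical experiment. The sketch below is an assumption-laden toy example, not the study's data or code: it reverses the time order of a synthetic snapshot sequence and shows that the POD singular values are unchanged, while the DMD eigenvalues become the reciprocals of the original decay rates. The helper names `pod_singular_values` and `dmd_eigenvalues` are hypothetical.

```python
import numpy as np

def pod_singular_values(snapshots):
    """POD energy content: singular values of the snapshot matrix."""
    return np.linalg.svd(snapshots, compute_uv=False)

def dmd_eigenvalues(snapshots, rank):
    """Eigenvalues of the reduced DMD operator for time-ordered snapshots."""
    X1, X2 = snapshots[:, :-1], snapshots[:, 1:]
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U_r, s_r, V_r = U[:, :rank], s[:rank], Vh[:rank].conj().T
    A_tilde = (U_r.conj().T @ X2 @ V_r) / s_r
    return np.sort(np.linalg.eigvals(A_tilde).real)

# Hypothetical sequence: two spatial patterns decaying at 0.9 and 0.5 per step.
rng = np.random.default_rng(1)
patterns = rng.standard_normal((50, 2))
steps = np.arange(12)
snapshots = patterns @ np.vstack([0.9 ** steps, 0.5 ** steps])
reversed_snapshots = snapshots[:, ::-1]   # same images, opposite time order

pod_forward = pod_singular_values(snapshots)
pod_reversed = pod_singular_values(reversed_snapshots)
dmd_forward = dmd_eigenvalues(snapshots, rank=2)
dmd_reversed = dmd_eigenvalues(reversed_snapshots, rank=2)

print("POD unchanged by reordering:", np.allclose(pod_forward, pod_reversed))
print("DMD forward eigenvalues:", dmd_forward)
print("DMD reversed eigenvalues:", dmd_reversed)
```

Because POD treats the snapshots as an unordered set, it cannot distinguish a sequence from its reversal, whereas DMD encodes the direction and rate of change between consecutive snapshots.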
Lung diseases, whether restrictive (inhalation) such as acute respiratory distress syndrome (ARDS) and cystic fibrosis, or obstructive (exhalation) such as asthma and chronic obstructive pulmonary disease (COPD), affect the respiratory airflow and disturb the exhaled airflow pattern [1–3]. Exhaled aerosols can reveal a wealth of information about the health of the lungs. However, correlating exhaled aerosol images with the underlying lung structural remodeling poses many challenges. The distributions of exhaled aerosols, which are determined by the airflow and aerosol dynamics, are exceedingly complex. Exhaled aerosol images from the deep lungs generally cannot be differentiated by mere inspection. As a result, extracting useful features from these seemingly chaotic observables is crucial to developing an effective algorithm for diagnosing lung abnormalities based on exhaled aerosol images. In our previous studies [5–9], fractal-based features, such as lacunarity, fractal dimension (FD), and multifractal spectrum, were explored for the quantification of aerosol images and subsequent machine learning of disease status. In combination with the random forest (RF) algorithm [10, 11], the optimal accuracy was 87.0% for a five-class classification of asthmatic diseases located in small airways (G8 bronchiole).
In this study, a machine learning framework with static and dynamic feature selection for classifying obstructive lung diseases was presented. The impact of the feature selection algorithms on classification performance was evaluated using two classification methods (SVM and RF). Results show that a classifier with features that include transient dynamics (DMD and DMDC) significantly outperformed one with static features (POD and PCA), which is consistent with the fact that disease growth is an inherently dynamic process. Including control parameters that are responsible for the dynamical system changes further improved the classification, though by a much smaller margin. RF gave a consistently higher classification accuracy than SVM, irrespective of the features. While the RF performance improved steadily with the number of retained eigenmodes, the SVM performance peaked at 50 features and decreased when more features were included. The best 5-class classification accuracy in this study was 94.8% using the DMDC-RF model, followed by 93.0% using the DMD-RF model with 100 features.