Date Published: November 20, 2018
Publisher: Public Library of Science
Author(s): Andrew G. Taylor, Clinton Mielke, John Mongan, Suchi Saria
Abstract: Background: Pneumothorax can precipitate a life-threatening emergency due to lung collapse and respiratory or circulatory distress. Pneumothorax is typically detected on chest X-ray; however, treatment is reliant on timely review of radiographs. Since current imaging volumes may result in long worklists of radiographs awaiting review, an automated method of prioritizing X-rays with pneumothorax may reduce time to treatment. Our objective was to create a large human-annotated dataset of chest X-rays containing pneumothorax and to train deep convolutional networks to screen for potentially emergent moderate or large pneumothorax at the time of image acquisition.
Methods and findings: In all, 13,292 frontal chest X-rays (3,107 with pneumothorax) were visually annotated by radiologists. This dataset was used to train and evaluate multiple network architectures. Images showing large- or moderate-sized pneumothorax were considered positive, and those with trace or no pneumothorax were considered negative. Images showing small pneumothorax were excluded from training. Using an internal validation set (n = 1,993), we selected the 2 top-performing models; these models were then evaluated on a held-out internal test set based on area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and positive predictive value (PPV). The final internal test was performed initially on a subset with small pneumothorax excluded (as in training; n = 1,701), then on the full test set (n = 1,990), with small pneumothorax included as positive. External evaluation was performed using the National Institutes of Health (NIH) ChestX-ray14 set, a public dataset labeled for chest pathology based on text reports. All images labeled with pneumothorax were considered positive, because the NIH set does not classify pneumothorax by size.
In internal testing, our “high sensitivity model” produced a sensitivity of 0.84 (95% CI 0.78–0.90), specificity of 0.90 (95% CI 0.89–0.92), and AUC of 0.94 for the test subset with small pneumothorax excluded. Our “high specificity model” showed sensitivity of 0.80 (95% CI 0.72–0.86), specificity of 0.97 (95% CI 0.96–0.98), and AUC of 0.96 for this set. PPVs were 0.45 (95% CI 0.39–0.51) and 0.71 (95% CI 0.63–0.77), respectively. Internal testing on the full set showed expected decreased performance (sensitivity 0.55, specificity 0.90, and AUC 0.82 for high sensitivity model and sensitivity 0.45, specificity 0.97, and AUC 0.86 for high specificity model). External testing using the NIH dataset showed some further performance decline (sensitivity 0.28–0.49, specificity 0.85–0.97, and AUC 0.75 for both). Due to labeling differences between internal and external datasets, these findings represent a preliminary step towards external validation.
Conclusions: We trained automated classifiers to detect moderate and large pneumothorax in frontal chest X-rays at high levels of performance on held-out test data. These models may provide a high specificity screening solution to detect moderate or large pneumothorax on images collected when human review might be delayed, such as overnight. They are not intended for unsupervised diagnosis of all pneumothoraces, as many small pneumothoraces (and some larger ones) are not detected by the algorithm. Implementation studies are warranted to develop appropriate, effective clinician alerts for the potentially critical finding of pneumothorax, and to assess their impact on reducing time to treatment.
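The labeling rules described in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the size strings (`"none"`, `"trace"`, `"small"`, `"moderate"`, `"large"`) and both function names are hypothetical, since the source names the size categories but not how the annotations were encoded.

```python
from typing import Optional

# Hypothetical encoding of the radiologists' size annotations (an assumption;
# the source lists these categories but not their storage format).
NEGATIVE_SIZES = {"none", "trace"}
POSITIVE_SIZES = {"moderate", "large"}


def training_label(size: str) -> Optional[int]:
    """Label used for training: 1 = positive, 0 = negative, None = excluded."""
    if size in POSITIVE_SIZES:
        return 1
    if size in NEGATIVE_SIZES:
        return 0
    if size == "small":
        return None  # small pneumothorax was excluded from training
    raise ValueError(f"unknown size category: {size!r}")


def full_test_label(size: str) -> int:
    """Label used on the full test set, where small pneumothorax counts as positive."""
    if size in POSITIVE_SIZES or size == "small":
        return 1
    if size in NEGATIVE_SIZES:
        return 0
    raise ValueError(f"unknown size category: {size!r}")
```

The two functions differ only in how they treat small pneumothorax, which is the change that accounts for the lower sensitivity reported on the full test set.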
Partial Text: Pneumothorax can constitute a medical emergency since the presence of air within the pleural space outside the lung produces collapse of the lung and subsequent respiratory distress, especially in critically ill patients. While the incidence of spontaneous pneumothorax in the United States is relatively low, pneumothorax is often associated with trauma, mechanical ventilation, and iatrogenic injury from procedures such as thoracentesis. The use of adjunctive imaging has reduced this risk somewhat, but even with ultrasound guidance a recent meta-analysis estimated the rate of pneumothorax after thoracentesis to be approximately 4%. Pneumothorax of a clinically significant size is often diagnosed with standard frontal plain film radiography; however, the accuracy of diagnosis is dependent on a number of factors including pneumothorax size, patient positioning, image quality, and variation in radiologist threshold for diagnosis, resulting in a mean sensitivity in the range of 83%–86% in studies assessing this [7–9]. Further, treatment is reliant on timely review of acquired images, both by the radiologist and the referring physician. A study of patients with pneumothorax in the intensive care unit (ICU) found that length of stay in intensive care was longer and the risk of progression to tension pneumothorax (a large pneumothorax that causes obstruction or restriction of blood flow to the heart, producing circulatory collapse) was higher for patients whose pneumothoraces were initially misdiagnosed; further, a significant risk factor for delay in diagnosis and misdiagnosis was development of pneumothorax outside of peak physician staffing hours.
This study, compliant with the Health Insurance Portability and Accountability Act of 1996, was approved by the institutional review board of our institution. The study was granted a consent waiver due to its retrospective design and minimal risk categorization.
We created automated models that had high AUC and were sensitive to large and moderate pneumothoraces while retaining high specificity when evaluated on our internal test set. In particular, the high specificity model (specificity 0.97) produced a PPV of 12.5% for the scenario in which pneumothorax has a prevalence of 1% (including small, moderate, and large pneumothoraces). This performance profile matches what is required for prioritization of low-prevalence findings. While high sensitivity is of course desirable, our selected use case is triaging larger, potentially more acutely clinically significant pneumothoraces at times when review may be delayed (e.g., overnight). For that use case, we felt it important that PPV remain high enough to keep false positives manageable, since too many false positives would increase alert fatigue and might lead clinical radiologists to ignore the findings of the algorithm. With a PPV of 12.5%, a radiologist need only review approximately 8 flagged radiographs for every true positive case. However, it is important to make clear that this algorithm is not intended to be relied upon to detect small pneumothoraces (based on our experimental design and training method), and that some moderate and large pneumothoraces may still be missed. In keeping with our research aim, this is meant to be a prioritization and triaging tool for potential emergencies rather than a substitute for careful image review and diagnosis rendered by a human radiologist.
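The relationship between sensitivity, specificity, prevalence, and PPV discussed above follows from Bayes' rule, and a minimal sketch may make the arithmetic easier to check. The function name `ppv` is illustrative. The choice of inputs is an assumption: the text does not say which sensitivity was used for the 1% prevalence scenario, so the sketch uses the rounded full-test-set figures for the high specificity model (sensitivity 0.45, specificity 0.97), because that scenario includes small pneumothoraces. With these rounded inputs the result is about 13%, close to but not identical to the reported 12.5%; the gap may simply reflect rounding of the published figures.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule.

    PPV = (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))
    """
    true_pos_rate = sensitivity * prevalence
    false_pos_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos_rate / (true_pos_rate + false_pos_rate)


# Assumed inputs: rounded full-test-set values for the high specificity model
# and the 1% prevalence scenario described in the text.
estimate = ppv(sensitivity=0.45, specificity=0.97, prevalence=0.01)
print(f"Estimated PPV from rounded inputs: {estimate:.3f}")

# Flagged images a radiologist reviews per true positive is 1 / PPV.
reported_ppv = 0.125
print(f"Flagged images per true positive at PPV 12.5%: {1 / reported_ppv:.0f}")
```

The second calculation reproduces the figure in the text: at a PPV of 12.5%, one in eight flagged radiographs is a true positive.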