Date Published: October 4, 2018
Publisher: Public Library of Science
Author(s): Constance A. Owens, Christine B. Peterson, Chad Tang, Eugene J. Koay, Wen Yu, Dennis S. Mackin, Jing Li, Mohammad R. Salehpour, David T. Fuentes, Laurence E. Court, Jinzhong Yang, Yong Fan.
To evaluate the uncertainty of radiomics features extracted from contrast-enhanced breath-hold helical CT scans of non-small cell lung cancer, for both manual and semi-automatic segmentation, arising from intra-observer, inter-observer, and inter-software variability.
Three radiation oncologists manually delineated lung tumors twice from 10 CT scans using two software tools (3D-Slicer and MIM Maestro). Additionally, three observers without formal clinical training were instructed to use two semi-automatic segmentation tools, Lesion Sizing Toolkit (LSTK) and GrowCut, to delineate the same tumor volumes. The accuracy of the semi-automatic contours was assessed by comparison with physician manual contours using Dice similarity coefficients and Hausdorff distances. Eighty-three radiomics features were calculated for each delineated tumor contour. Informative features were identified based on their dynamic range and correlation to other features. Feature reliability was then evaluated using intra-class correlation coefficients (ICC). Feature range was used to evaluate the uncertainty of the segmentation methods.
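The two contour-agreement measures named above can be sketched in a few lines. This is a minimal illustration, not the study's implementation: it assumes each contour is represented as a set of voxel index tuples, computes the Hausdorff distance over all voxels in the set (rather than over surface voxels only) by brute force, and uses hypothetical function names and a hypothetical `spacing` parameter for voxel size.

```python
import math


def dice_coefficient(voxels_a, voxels_b):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    a, b = set(voxels_a), set(voxels_b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty contours agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))


def hausdorff_distance(voxels_a, voxels_b, spacing=None):
    """Symmetric Hausdorff distance between two voxel sets.

    `spacing` (an assumption of this sketch) gives the voxel size along
    each axis; when omitted, distances are measured in voxel units.
    """
    a, b = set(voxels_a), set(voxels_b)
    if not a or not b:
        raise ValueError("both contours must contain at least one voxel")
    ndim = len(next(iter(a)))
    scale = tuple(spacing) if spacing is not None else (1.0,) * ndim

    def to_physical(voxel):
        return tuple(index * size for index, size in zip(voxel, scale))

    pts_a = [to_physical(v) for v in a]
    pts_b = [to_physical(v) for v in b]

    def directed(src, dst):
        # Largest distance from any point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return max(directed(pts_a, pts_b), directed(pts_b, pts_a))


# Toy 2D example (made-up contours, not study data).
manual = {(1, 1), (1, 2), (2, 1), (2, 2)}
semi_auto = {(1, 2), (1, 3), (2, 2), (2, 3)}
print(dice_coefficient(manual, semi_auto))    # 0.5
print(hausdorff_distance(manual, semi_auto))  # 1.0
```

The brute-force distance search is quadratic in the number of voxels, so it only suits small contours; it is written this way to keep the definition visible.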
From the initial set of 83 features, 40 radiomics features were found to be informative, and these 40 features were used in the subsequent analyses. For both intra-observer and inter-observer reliability, LSTK had higher reliability than GrowCut and the two manual segmentation tools. All observers achieved consistently high ICC values when using LSTK, but the ICC value varied greatly for each observer when using GrowCut and the manual segmentation tools. For inter-software reliability, features were not reproducible across the software tools for either manual or semi-automatic segmentation methods. Additionally, no feature category was found to be more reproducible than another feature category. Feature ranges of LSTK contours were smaller than those of manual contours for all features.
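To make the reliability measure concrete, the sketch below computes one common form of the intra-class correlation coefficient for a single feature. The text above does not state which ICC form was used, so the choice of ICC(2,1) (two-way random effects, absolute agreement, single measurement) is an assumption, as are the function name and the layout of the input (one row per tumor, one column per observer or repeated delineation).

```python
def icc_2_1(ratings):
    """ICC(2,1) from a two-way ANOVA decomposition.

    `ratings` is a list of rows; each row holds one feature value per
    observer for the same tumor (layout assumed for this sketch).
    """
    n = len(ratings)                      # number of tumors (subjects)
    k = len(ratings[0]) if n else 0       # number of observers (raters)
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular table with >= 2 rows and >= 2 columns")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares: total, between tumors, between observers, residual.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denom == 0:
        raise ValueError("ICC is undefined when all values are identical")
    return (ms_rows - ms_error) / denom


# Made-up feature values for 4 tumors contoured by 3 observers.
feature_values = [
    [10.1, 10.3, 9.9],
    [20.4, 20.0, 20.6],
    [15.2, 15.5, 15.1],
    [30.0, 29.6, 30.3],
]
print(round(icc_2_1(feature_values), 3))
```

Values near 1 indicate that most of the variation in the feature comes from differences between tumors rather than from differences between observers.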
Radiomics features extracted from LSTK contours were highly reliable both within and between observers. With semi-automatic segmentation tools, observers without formal clinical training produced tumor segmentations comparable to those of physicians.
Precision medicine aims to customize cancer treatment for an individual patient by combining several sources of knowledge (e.g., conventional factors such as age and sex, genetics, and proteins) [1,2]. It seeks to characterize the tumor completely so that the optimal treatment can be determined from patient-specific characteristics. In recent years, studies have shown that radiomics features have the potential to significantly improve our ability to stratify patients according to likely treatment response beyond conventional prognostic factors, thereby leading to truly personalized cancer care [3–7].
Tumor delineation is an important aspect of the radiomics workflow. Variation in contouring can affect the extracted feature values, which in turn influences subsequent steps in the workflow. Identifying contouring software tools that improve feature reliability helps to mitigate the feature uncertainties that arise from inconsistent contouring. In this study, we evaluated the uncertainty of radiomics features from both manual and semi-automatic segmentation arising from intra-observer, inter-observer, and inter-software variability. We found that, using a semi-automatic segmentation tool such as LSTK, observers without formal clinical training can generate contours that are comparable to manually drawn contours generated by formally trained physicians (Fig 4).
Our findings showed that radiomics features computed from semi-automatically segmented volumes have better reproducibility and reliability than those computed from manually segmented volumes. Among the semi-automatic tools, the one requiring less human interaction (i.e., LSTK) also yielded better feature reliability. Our results also showed that, with semi-automatic segmentation tools, observers without formal clinical training were comparable to physicians in tumor segmentation. These findings suggest the need to develop fully automatic segmentation tools (without any user input) for radiomics studies, in order to minimize the impact of contouring uncertainty and to improve feature reproducibility and repeatability for subsequent analyses such as radiomics outcome studies or longitudinal clinical studies that assess tumor response.