Date Published: May 20, 2019
Publisher: Public Library of Science
Author(s): Grzegorz Chlebus, Hans Meine, Smita Thoduka, Nasreddin Abolmaali, Bram van Ginneken, Horst Karl Hahn, Andrea Schenk, Kumaradevan Punithakumar.
To compare manual corrections of liver masks produced by a fully automatic segmentation method based on convolutional neural networks (CNN) with manual routine segmentations in MR images in terms of inter-observer variability and interaction time.
For testing, precise reference segmentations that fulfill the quality requirements for liver surgery were created manually for each test patient. One radiologist and two radiology residents were asked to provide manual routine segmentations. We used our automatic segmentation method Liver-Net to produce liver masks for the test cases and asked a radiology assistant and one further resident to correct the automatic results. All observers were asked to measure their interaction time. Both manual routine and corrected segmentations were compared with the reference annotations.
The manual routine segmentations achieved a mean Dice index of 0.95 and a mean relative volume error (RVE) of 4.7%. The quality of liver masks produced by Liver-Net was on average 0.95 Dice and 4.5% RVE. Liver masks resulting from manual corrections of automatically generated segmentations showed a significantly lower inter-observer variability (mean per-case absolute RVE difference across observers 0.69%) than manual routine ones (2.75%). The mean interaction time was 2 min for manual corrections and 10 min for manual routine segmentations.
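The two quality measures quoted above can be illustrated with a minimal sketch on binary voxel masks. The function names are hypothetical, and the signed form of the RVE is an assumption; the text does not state whether the reported values are signed or absolute, so take the absolute value if that convention applies.

```python
import numpy as np


def dice_index(mask, reference):
    """Dice overlap between two binary masks (1.0 means identical)."""
    mask = np.asarray(mask, dtype=bool)
    reference = np.asarray(reference, dtype=bool)
    intersection = np.logical_and(mask, reference).sum()
    total = mask.sum() + reference.sum()
    if total == 0:
        # Both masks are empty; treat them as a perfect match.
        return 1.0
    return 2.0 * intersection / total


def relative_volume_error(mask, reference):
    """Signed relative volume error in percent (assumed convention).

    Voxel counts stand in for volumes, which is valid when both masks
    share the same voxel spacing.
    """
    mask = np.asarray(mask, dtype=bool)
    reference = np.asarray(reference, dtype=bool)
    ref_voxels = reference.sum()
    if ref_voxels == 0:
        raise ValueError("reference mask is empty")
    return 100.0 * (mask.sum() - ref_voxels) / ref_voxels


# Toy example: the segmentation covers the reference plus one extra slab.
reference = np.zeros((4, 4, 4), dtype=bool)
reference[0:2, 0:2, 0:2] = True   # 8 voxels
segmentation = np.zeros((4, 4, 4), dtype=bool)
segmentation[0:2, 0:2, 0:3] = True  # 12 voxels

print(dice_index(segmentation, reference))
print(relative_volume_error(segmentation, reference))
```

Dice rewards spatial overlap, while RVE only compares total volumes, so a mask can have a small RVE and still be misplaced; reporting both, as done here, covers the two failure modes.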
The quality of automatic liver segmentations is on par with that of manual routine segmentations. Using automatic liver masks in the clinical workflow could lead to a reduction of segmentation time and a more consistent liver volume estimation across different observers.
Total liver volume plays an essential role in planning liver interventions such as liver surgery or radioembolization and in therapy response assessment [1, 2]. Manual liver segmentation is a tedious and time-consuming task, which is also prone to inter-observer variability, in particular for MRI data. Automation of the liver contouring process allows for shorter segmentation times and a reduction of measurement subjectivity.
The CNNs constituting the Liver-Net were trained in 3h, 1.5h and 6h for the axial, coronal and sagittal model, respectively. Time measurements for training as well as for prediction were done on a desktop PC (Intel Core i7-4770K, 32 GB RAM, NVIDIA Titan Xp).
Our Liver-Net method delivered segmentations which were on par with the manual routine segmentations. When an observer corrected the automatic liver segmentation, the contours and the subsequent volume measurement became less subjective than with manual segmentation from scratch. The automatic suggestions may help to achieve segmentations more consistent with the guidelines used to segment the training data. For example, in the case of Patient10 from Table 3, the manual routine segmentations excluded a large lesion from the liver mask, whereas the reference and both corrected segmentations included it (see Fig 6). Only RA, who also created the reference segmentations, was able to achieve a significant quality improvement of corrected masks according to Dice, RVE, and MSD when compared to the automatic Liver-Net segmentations. The consistency among observers measured as ICC(2,1) for manual (0.989) and corrected (0.999) segmentations was excellent and similar to the inter-observer correlation reported for CT (0.997).
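For readers unfamiliar with ICC(2,1), the following minimal sketch computes it from a table of liver volumes, using the standard two-way random-effects, absolute-agreement, single-rater formula. The function name and the toy volumes are hypothetical and are not taken from the study.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1) for a (n_cases, n_observers) array of measurements.

    Two-way random effects, absolute agreement, single rater.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()
    case_means = x.mean(axis=1)
    observer_means = x.mean(axis=0)

    # Sums of squares of the two-way ANOVA without replication.
    ss_cases = k * ((case_means - grand_mean) ** 2).sum()
    ss_observers = n * ((observer_means - grand_mean) ** 2).sum()
    ss_total = ((x - grand_mean) ** 2).sum()
    ss_error = ss_total - ss_cases - ss_observers

    ms_cases = ss_cases / (n - 1)
    ms_observers = ss_observers / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_cases - ms_error) / (
        ms_cases + (k - 1) * ms_error + k * (ms_observers - ms_error) / n
    )


# Hypothetical liver volumes in ml: rows are cases, columns are observers.
volumes = np.array([
    [1510.0, 1495.0, 1522.0],
    [1820.0, 1840.0, 1811.0],
    [1240.0, 1251.0, 1236.0],
    [2105.0, 2090.0, 2117.0],
])
print(icc_2_1(volumes))
```

Because this variant measures absolute agreement, a systematic offset between observers lowers the coefficient, which makes it a suitable choice when the quantity of interest is the volume itself rather than the ranking of cases.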
We presented a validation of Liver-Net, a fully automatic method for liver segmentation in MRI based on three neural networks operating on orthogonal (axial, coronal and sagittal) orientations, and of manually corrected versions of its masks. The comparison with manual routine segmentations provided by three clinical users showed that Liver-Net produces liver masks of comparable quality according to Dice index, relative volume error, mean surface distance, and Hausdorff distance. Error analysis of Liver-Net showed that, on average, 32% of slices had to be corrected, which required on average 2 minutes of interaction time. Corrected liver masks led to a significantly lower inter-observer variability when compared to manual segmentations from scratch (0.69% vs. 2.75% for mean per-case absolute inter-observer RVE difference and 0.999 vs. 0.989 for the intra-class correlation coefficient). We identified the following factors as sources of major errors of our method: contrast changes close to the organ boundary, inhomogeneous lesions and similar appearance of surrounding structures. Successful automated elimination of these errors would most probably lead to a substantial improvement in segmentation quality. One possible solution may be to extend the training dataset with cases that cause the above-mentioned errors. Additionally, a hard example mining strategy could be employed, which concentrates the learning process on the most difficult cases.
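The inter-observer variability measure can be sketched as follows. This is one plausible reading, not the authors' confirmed definition: for each case, the absolute RVE difference is averaged over all observer pairs, and the result is then averaged over cases. The function name and the toy values are hypothetical.

```python
from itertools import combinations

import numpy as np


def mean_inter_observer_rve_difference(rve):
    """Mean per-case absolute RVE difference across observers.

    rve: array of shape (n_cases, n_observers) with RVE values in percent.
    Assumed definition: average the absolute difference over all observer
    pairs within each case, then average over cases.
    """
    rve = np.asarray(rve, dtype=float)
    n_observers = rve.shape[1]
    if n_observers < 2:
        raise ValueError("at least two observers are required")
    pair_differences = [
        np.abs(rve[:, i] - rve[:, j])
        for i, j in combinations(range(n_observers), 2)
    ]
    per_case = np.mean(pair_differences, axis=0)
    return per_case.mean()


# Hypothetical RVE values in percent: rows are cases, columns are observers.
toy_rve = np.array([
    [1.0, 3.0],
    [2.0, 2.0],
])
print(mean_inter_observer_rve_difference(toy_rve))
```

With two observers, as for the corrected masks, the pairwise average reduces to a single absolute difference per case; with three observers, as for the manual routine masks, three pairs contribute per case.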