Date Published: June 12, 2019
Publisher: Public Library of Science
Author(s): Daniel Langenkämper, Erik Simon-Lledó, Brett Hosking, Daniel O. B. Jones, Tim W. Nattkemper, Cem M. Deniz.
The evaluation of large amounts of digital image data is of growing importance for biology, including for the exploration and monitoring of marine habitats. However, only a tiny percentage of the image data collected is evaluated by marine biologists, who manually interpret and annotate the image contents, which can be slow and laborious. To overcome this bottleneck in image annotation, two strategies are increasingly proposed: "citizen science" and "machine learning". In this study, we investigated how the combination of citizen science, to detect objects, and machine learning, to classify megafauna, could be used to automate the annotation of underwater images. For this purpose, multiple large data sets of citizen science annotations, with different degrees of the common errors and inaccuracies observed in citizen science data, were simulated by modifying "gold standard" annotations made by an experienced marine biologist. The parameters of the simulation were determined on the basis of two citizen science experiments. This allowed us to analyze the relationship between the outcome of a citizen science study and the quality of the classifications of a deep learning megafauna classifier. The results show great potential for combining citizen science with machine learning, provided that the participants are informed precisely about the annotation protocol. Inaccuracies in the position of the annotation had the most substantial influence on the classification accuracy, whereas the size of the marking and false positive detections had a smaller influence.
In recent years, computer vision has made a major leap forward in tackling some of its most demanding problems, such as the detection of cars or people in photos, owing to the emergence of deep learning [1, 2]. Deep learning methods for image classification and object detection have been proposed successfully but remain mostly limited to everyday image domains, i.e. images showing "everyday objects" from human civilization such as cars, furniture, and people. Please note that in machine learning, especially in deep learning, these images are often referred to as natural images, but to avoid misunderstandings in interdisciplinary research, we will refer to them as everyday images. The employment of deep learning algorithms brings with it a number of requirements and assumptions: i) availability of huge collections of annotated image data, ii) good image quality (high signal-to-noise ratio, no extreme light exposure, limited cast shadows) and iii) high pixel resolution for objects of interest in the images. One reason for the rapid progress of deep learning in computer vision is the availability of many large image collections (see i) above), accumulated by internet-based projects (e.g. ImageNet). Because everyday images are of common interest, much research focuses on these image collections. Object detection/segmentation/classification contests like ILSVRC (ImageNet Large Scale Visual Recognition Competition) are held to compare network performance. Furthermore, competitors publish their pre-trained models online, providing a starting point for future projects. Unfortunately, no such datasets or contests are available in marine science. This is likely because, although the data volume is huge, the number of annotated images, i.e. images with labels describing their content on a semantic level, is very limited.
The only competition known to the authors is the National Data Science Bowl—Predict ocean health, one plankton at a time, but this competition is limited to plankton and did not receive the same amount of attention as, e.g., the ILSVRC.
Analyzing the CS primer-experiment from step I), we observed four kinds of errors or inaccuracies. First, the CSs produced false positives (FP), i.e. marked objects of no interest; second, they missed objects, producing so-called false negatives (FN); third, in the case of true positive detections, the CSs sometimes marked inaccurate positions (IP) of the annotations; and fourth, they sometimes chose an inappropriate circle size (IR).
In our study, we identified and analyzed four common annotation parameters reflecting the differences between a CS-derived annotation and an expert-derived one. These are false positives (FP), false negatives (FN), inaccurate position (IP) and inaccurate radius/size (IR).
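The four error types above can be illustrated with a small simulation sketch. The following is a minimal, hypothetical example only, assuming gold-standard annotations are stored as (x, y, radius) circles; the function name, the error rates, and the jitter magnitudes are illustrative assumptions and do not correspond to the paper's actual simulation parameters.

```python
import random

def simulate_cs_annotations(gold, image_size,
                            fp_rate=0.1,     # fraction of spurious extra marks (FP)
                            fn_rate=0.1,     # fraction of missed objects (FN)
                            pos_jitter=5.0,  # max positional offset in pixels (IP)
                            rad_jitter=0.3,  # max relative radius error (IR)
                            seed=None):
    """Degrade gold-standard circle annotations (x, y, r) with the four
    citizen-science error types: FP, FN, IP and IR (illustrative sketch)."""
    rng = random.Random(seed)
    width, height = image_size
    simulated = []
    for (x, y, r) in gold:
        # FN: drop some true objects entirely
        if rng.random() < fn_rate:
            continue
        # IP: shift the annotation centre by a random offset
        x += rng.uniform(-pos_jitter, pos_jitter)
        y += rng.uniform(-pos_jitter, pos_jitter)
        # IR: scale the circle radius up or down
        r *= 1.0 + rng.uniform(-rad_jitter, rad_jitter)
        simulated.append((x, y, max(r, 1.0)))
    # FP: add spurious annotations at random positions
    n_fp = int(round(fp_rate * len(gold)))
    for _ in range(n_fp):
        simulated.append((rng.uniform(0, width),
                          rng.uniform(0, height),
                          rng.uniform(5.0, 30.0)))
    return simulated
```

Varying one rate or jitter parameter at a time while holding the others fixed is one way such a simulation can isolate the influence of each error type on downstream classifier accuracy.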