Date Published: April 17, 2019
Publisher: Public Library of Science
Author(s): Xingsi Xue, Zhi Hang, Zhengyi Tang, Xiangtao Li.
Due to the continuous evolution of biomedical data, biomedical ontologies are becoming larger and more complex, which leads to a large amount of overlapping information. To support semantic interoperability between ontology-based biomedical systems, it is necessary to identify the correspondences between these pieces of information, a task commonly known as biomedical ontology matching. However, matching biomedical ontologies is challenging because (1) biomedical ontologies often contain tens of thousands of entities, and (2) biomedical terminologies are complex and ambiguous. To efficiently match biomedical ontologies, this paper proposes an interactive biomedical ontology matching approach that uses an Evolutionary Algorithm (EA) to implement the automatic matching process and involves a user in the evolving process to improve the matching efficiency. In particular, we propose an Evolutionary Tabu Search (ETS) algorithm, which improves the EA's performance by introducing tabu search as a local search strategy into the evolving process. On this basis, we further make the ETS-based ontology matching technique cooperate with the user within a reasonable amount of time to efficiently create high-quality alignments, and we use the EA's survival-of-the-fittest mechanism to eliminate wrong correspondences introduced by erroneous user validations. The experiment is conducted on the Anatomy track and the Large Biomed track provided by the Ontology Alignment Evaluation Initiative (OAEI), and the experimental results show that our approach efficiently exploits user intervention to improve on its non-interactive version and outperforms state-of-the-art semi-automatic ontology matching systems.
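The abstract describes ETS only at a high level, so the following is a minimal sketch of the general idea (an elitist EA whose best individual is refined by tabu search each generation), not the authors' implementation. The encoding (gene i holds the index of the target entity matched to source entity i), the fitness (summed similarity minus a penalty for reused targets), the operators, and every parameter value are assumptions; `sim` stands in for whatever combined similarity matrix the real matcher computes.

```python
import random
from typing import List, Sequence, Tuple

Chromosome = List[int]  # assumed encoding: gene i = index of the target matched to source i


def fitness(ch: Sequence[int], sim: Sequence[Sequence[float]], penalty: float = 0.5) -> float:
    """Summed similarity of the encoded pairs, minus a penalty for each reused target."""
    total = sum(sim[i][t] for i, t in enumerate(ch))
    collisions = len(ch) - len(set(ch))
    return total - penalty * collisions


def tabu_search(ch: Sequence[int], sim: Sequence[Sequence[float]],
                iters: int = 30, tenure: int = 5) -> Chromosome:
    """Take the best non-tabu single-gene move each step; return the best chromosome seen."""
    current, best = list(ch), list(ch)
    best_fit = fitness(best, sim)
    tabu: List[Tuple[int, int]] = []  # recently abandoned (gene, target) assignments
    for _ in range(iters):
        move, move_fit = None, float("-inf")
        for i in range(len(current)):
            for t in range(len(sim[i])):
                if t == current[i]:
                    continue
                cand = current[:i] + [t] + current[i + 1:]
                f = fitness(cand, sim)
                # Aspiration criterion: a tabu move is allowed if it beats the best so far.
                if (i, t) in tabu and f <= best_fit:
                    continue
                if f > move_fit:
                    move, move_fit = (i, t), f
        if move is None:
            break
        gene, target = move
        tabu.append((gene, current[gene]))  # forbid undoing this move for `tenure` steps
        if len(tabu) > tenure:
            tabu.pop(0)
        current[gene] = target  # accepted even if worse: this is how tabu search escapes
        if move_fit > best_fit:
            best, best_fit = list(current), move_fit
    return best


def ets(sim: Sequence[Sequence[float]], pop_size: int = 10, generations: int = 20,
        p_mut: float = 0.1, seed: int = 0) -> Chromosome:
    """Elitist EA whose best individual is refined by tabu search every generation."""
    rng = random.Random(seed)
    n_src, n_tgt = len(sim), len(sim[0])

    def score(c: Sequence[int]) -> float:
        return fitness(c, sim)

    pop = [[rng.randrange(n_tgt) for _ in range(n_src)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        children = [tabu_search(pop[0], sim)]  # local search on the elite individual
        while len(children) < pop_size:
            a = max(rng.sample(pop, 2), key=score)  # binary tournament selection
            b = max(rng.sample(pop, 2), key=score)
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]  # uniform crossover
            child = [rng.randrange(n_tgt) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
    return max(pop, key=score)
```

With a hypothetical 4 × 4 matrix `sim = [[0.9 if i == j else 0.1 for j in range(4)] for i in range(4)]`, in which each source entity is most similar to the target with the same index, `ets(sim)` should return `[0, 1, 2, 3]`. If `penalty` is raised to 1.0, the swapped alignment `[1, 0, 2, 3]` becomes a local optimum for single-gene moves; tabu search escapes it by accepting one worsening non-tabu move, which is the kind of benefit the ETS design aims for.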
Ontologies have gained considerable importance over the past two decades, especially in the biomedical domain. Various biomedical ontologies, such as the Gene Ontology (GO), the National Cancer Institute (NCI) Thesaurus, the Foundational Model of Anatomy (FMA), and the Systematized Nomenclature of Medicine (SNOMED-CT), have emerged and are being maintained; they are widely used in medical record annotation, the standardization of medical data formats, the representation and integration of medical and clinical knowledge, and medical decision making. Due to the continuous evolution of biomedical data, biomedical ontologies are becoming larger and more complex, which leads to a large amount of overlapping information. For example, the NCI ontology defines the concept “Myocardium”, which is related to the concept “Cardiac Muscle Tissue” in the FMA ontology; both describe the muscle tissue of the heart. Since exploiting this overlapping information is necessary for the integration, aggregation, and interoperability of ontology-based biomedical systems, the correspondences between these ontologies must be found, a task commonly known as biomedical ontology matching. However, matching biomedical ontologies is a computationally intensive task with quadratic complexity, which arises from two characteristics: (1) biomedical ontologies often contain tens of thousands of classes, and (2) biomedical terminologies are complex and ambiguous; the same biomedical concept frequently has several names, and the same term can be applied to two different entities. Although this challenge has attracted the interest of the community, for example the Ontology Alignment Evaluation Initiative (OAEI), which includes specific tracks on matching biomedical ontologies, research on it is still in its infancy.
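To make the quadratic growth concrete, here is a minimal worked count, assuming two hypothetical ontologies with $n_s = n_t = 50{,}000$ classes (illustrative sizes, not the exact sizes of any ontology named above). A naive matcher compares every source class with every target class, so the number of candidate correspondences is

$$
|C| = n_s \times n_t = 50{,}000 \times 50{,}000 = 2.5 \times 10^{9}.
$$

Doubling both ontologies quadruples this count, which is why exhaustive pairwise comparison quickly becomes impractical at the scale of these biomedical ontologies.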
The proposed interactive biomedical ontology matching framework is shown in Fig 1. As can be seen from the figure, three working phases, i.e. initialization, ETS-based ontology matching, and user interaction, are outlined by dotted-line boxes. Each rectangle inside a dotted-line box represents a working step, and each rectangle with a picture outside the dotted-line boxes indicates input or output data, e.g. the source and target ontologies, the reference alignment, and the evaluation result. The three working phases are described as follows:
Since biomedical ontology matching is a complex task, the ETS-based matching results need to be validated by a user to ensure the alignment's quality and improve the algorithm's efficiency. However, it is impractical to ask a user to validate all the correspondences at once, which is both time-consuming and error-prone. Thus, the first question to answer when implementing effective user interaction is how to reduce the user's workload. The second question is how to effectively exploit the limited user intervention to improve the efficiency of the matching process. In this work, we involve the user only when ETS gets stuck, and we present only the most problematic correspondences (those with low similarity values) to the user for validation, which reduces the workload. Once the user has validated all the presented correspondences, the validation results are further used to reduce each gene bit's search space through a hierarchy-based approach, which improves the efficiency of the subsequent matching process.
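One interaction round described above can be sketched as follows. This is a hypothetical illustration, not the paper's procedure: the stagnation test (`is_stuck` with a fixed window), the number `k` of correspondences shown, the simulated oracle with an `error_rate`, and especially the hierarchy rule in `prune` (subclasses of an accepted source entity may only map to subclasses of its target) are all assumptions, since the excerpt does not specify them.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple

Pair = Tuple[str, str]  # (source entity, target entity)


def is_stuck(best_history: Sequence[float], window: int = 5, eps: float = 1e-9) -> bool:
    """True if the best fitness has not improved over the last `window` generations."""
    if len(best_history) <= window:
        return False
    return best_history[-1] - best_history[-1 - window] <= eps


def most_problematic(alignment: Dict[Pair, float], k: int) -> List[Pair]:
    """The k correspondences with the lowest similarity values, lowest first."""
    return sorted(alignment, key=lambda pair: alignment[pair])[:k]


def oracle(pair: Pair, reference: Set[Pair], error_rate: float, rng: random.Random) -> bool:
    """Simulated user: answers from the reference alignment, wrongly with probability error_rate."""
    truth = pair in reference
    return (not truth) if rng.random() < error_rate else truth


def prune(candidates: Dict[str, Set[str]], verdicts: Dict[Pair, bool],
          children_src: Dict[str, List[str]], children_tgt: Dict[str, List[str]]) -> None:
    """Shrink gene search spaces in place using validated pairs (hypothetical hierarchy rule)."""
    for (s, t), accepted in verdicts.items():
        if not accepted:
            candidates.get(s, set()).discard(t)  # a rejected target is no longer a candidate for s
            continue
        allowed = set(children_tgt.get(t, []))
        for child in children_src.get(s, []):
            # Assumption: subclasses of an accepted source map to subclasses of its target.
            narrowed = candidates.get(child, set()) & allowed
            if narrowed:  # never empty a gene's search space
                candidates[child] = narrowed


def interaction_round(alignment: Dict[Pair, float], reference: Set[Pair],
                      candidates: Dict[str, Set[str]], children_src: Dict[str, List[str]],
                      children_tgt: Dict[str, List[str]], k: int = 3,
                      error_rate: float = 0.0, seed: int = 0) -> Dict[Pair, bool]:
    """Ask the oracle about the k weakest pairs, then prune the search space with the answers."""
    rng = random.Random(seed)
    verdicts = {p: oracle(p, reference, error_rate, rng) for p in most_problematic(alignment, k)}
    prune(candidates, verdicts, children_src, children_tgt)
    return verdicts
```

Note that this hard pruning makes an erroneous verdict permanent, whereas the approach described above relies on the EA's selection to discard wrong correspondences introduced by erroneous validations; a faithful implementation would likely need a softer rule.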
In this work, we use the Anatomy track http://oaei.ontologymatching.org/2016/anatomy/index.html and the Large Biomed track http://www.cs.ox.ac.uk/isg/projects/SEALS/oaei/2016/ provided by OAEI 2016 http://oaei.ontologymatching.org/2016 to study the effectiveness of our approach. The experiment allows the matching approaches to ask an oracle, which tells the matcher whether a correspondence is right or wrong. Tables 1, 2 and 3 show the mean f-measure of the alignments obtained by our approach over thirty independent runs, together with the results obtained by the OAEI participants. The symbols r, p and f in the tables stand for recall, precision and f-measure, respectively, and f¯, r¯ and p¯ stand for the f-measure, recall and precision of the matcher's non-interactive version. We use three metrics, i.e. f-measure, runtime, and the mean improvement per request, to evaluate the performance of the interactive biomedical ontology matchers: f-measure and runtime measure the effectiveness of a semi-automatic ontology matching technique, and the mean improvement per request measures the efficiency of the user involvement.
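The metrics above can be sketched as follows. Recall, precision and f-measure use their standard set-based definitions against the reference alignment; the formula for the mean improvement per request (gain in f-measure over the non-interactive version, divided by the number of oracle requests) is an assumption, since the excerpt does not define it.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (source entity, target entity)


def evaluate(alignment: Set[Pair], reference: Set[Pair]) -> Tuple[float, float, float]:
    """Return (recall, precision, f-measure) of an alignment against the reference alignment."""
    correct = len(alignment & reference)
    r = correct / len(reference) if reference else 0.0
    p = correct / len(alignment) if alignment else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0  # harmonic mean of precision and recall
    return r, p, f


def improvement_per_request(f_interactive: float, f_non_interactive: float, requests: int) -> float:
    """Assumed definition: f-measure gain over the non-interactive run per oracle request."""
    return (f_interactive - f_non_interactive) / requests if requests else 0.0
```

For example, an alignment with three pairs, two of which appear in a four-pair reference, has r = 0.5, p = 2/3 and f = 4/7.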
To efficiently match biomedical ontologies, this work proposes an interactive biomedical ontology matching approach that effectively uses the user's knowledge to guide the search direction of the ETS-based ontology matcher and improves its efficiency by reducing the algorithm's search space. The experimental results show that our approach efficiently exploits user validation to improve on its non-interactive version and outperforms state-of-the-art interactive biomedical ontology matching techniques. In the future, we are interested in strategies that reuse a user's validation results to further reduce the algorithm's search space. We are also interested in decreasing the user's error rate by warning the user when contradictory validations are made.