Date Published: April 24, 2019
Publisher: Public Library of Science
Author(s): Yong Hwan Kim, Min Song, Diego Raphael Amancio.
In literature-based discovery, considerable research has built on the ABC model developed by Swanson. The ABC model hypothesizes that there is a meaningful relation between entity A, extracted from document set 1, and entity C, extracted from document set 2, through B entities that appear in both document sets. The results of the ABC model are relations among entities A, B, and C, which are referred to as paths. A path allows one to hypothesize a relationship between entity A and entity C, or helps discover entity B as new evidence for that relationship. The co-occurrence-based ABC model is a well-known approach to automatic hypothesis generation that creates many such paths. However, it has a limitation: biological context is not considered. It focuses only on matching the B entity that appears in both of the two entity relations. As a result, the paths extracted by the co-occurrence-based ABC model tend to include many irrelevant ones, which makes expert verification essential.
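The co-occurrence-based procedure described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each document has already been reduced to a set of entity names, and the function names (`cooccurring_entities`, `abc_paths`) and the placeholder entities `B1` to `B4` are hypothetical.

```python
def cooccurring_entities(documents, target):
    """Return every entity that shares at least one document with `target`."""
    partners = set()
    for entities in documents:
        if target in entities:
            partners.update(e for e in entities if e != target)
    return partners


def abc_paths(docs_a, docs_c, a, c):
    """Build A-B-C paths from B entities found in both document sets.

    Only co-occurrence is checked; no biological context is considered,
    which is the limitation the text describes.
    """
    b_from_a = cooccurring_entities(docs_a, a)
    b_from_c = cooccurring_entities(docs_c, c)
    shared = (b_from_a & b_from_c) - {a, c}
    return [(a, b, c) for b in sorted(shared)]


# Hypothetical toy data: each document is a set of extracted entity names.
docs_a = [{"APOE", "B1", "B2"}, {"APOE", "B3"}]
docs_c = [{"MAPT", "B2"}, {"MAPT", "B3", "B4"}]

paths = abc_paths(docs_a, docs_c, "APOE", "MAPT")
print(paths)
```

Because any shared B entity yields a path, the number of paths grows quickly with corpus size, and nothing in this procedure distinguishes relevant intermediates from incidental ones.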
To overcome this limitation of the co-occurrence-based ABC model, we propose a context-based approach to connecting one entity relation to another, modifying the ABC model using biological contexts. In this study, we defined four biological context elements: cell, drug, disease, and organism. Based on these biological contexts, we propose two extended ABC models: a context-based ABC model and a context-assignment-based ABC model. To measure the performance of both proposed models, we examined the relevance of the B entities between the well-known relations “APOE–MAPT” and “FUS–TARDBP”. Each relation represents an interaction between proteins associated with neurodegenerative disease. The interaction between APOE and MAPT is known to play a crucial role in Alzheimer’s disease, as APOE affects tau-mediated neurodegeneration. Mutations in FUS and TARDBP have been shown to be associated with amyotrophic lateral sclerosis (ALS), a motor neuron disease, by leading to neuronal cell death. Using these two relations, we compared both proposed models with the co-occurrence-based ABC model.
The precision of B entities extracted by the co-occurrence-based ABC model was 27.1% for “APOE–MAPT” and 22.1% for “FUS–TARDBP”. With the context-based ABC model, the precision of extracted B entities was 71.4% for “APOE–MAPT” and 77.9% for “FUS–TARDBP”. The context-assignment-based ABC model achieved 89% and 97.5% precision for the two relations, respectively. Both proposed models achieved higher precision than the co-occurrence-based ABC model.
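For reference, precision here is taken to mean the fraction of extracted B entities that are judged relevant. The sketch below assumes that standard definition; the function name and the toy entity sets are hypothetical and do not reproduce the figures reported above.

```python
def precision(extracted, relevant):
    """Fraction of extracted B entities that are also judged relevant."""
    extracted = set(extracted)
    if not extracted:
        return 0.0  # convention chosen here for an empty result set
    return len(extracted & set(relevant)) / len(extracted)


# Hypothetical toy example: four extracted B entities, two judged relevant.
extracted_b = {"B1", "B2", "B3", "B4"}
relevant_b = {"B2", "B3"}

score = precision(extracted_b, relevant_b)
print(f"{score:.1%}")
```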
With the development of modern biology, the number of publications in the biology literature has been increasing rapidly. As the published literature grows, the knowledge latent in those papers also accumulates. Biomedical researchers increasingly have to search for the knowledge they want in a very large corpus. There has been considerable research into methods for automatically extracting knowledge from the literature.
Some studies have tried to overcome the limitations of the existing co-occurrence approach by using statistical techniques or thresholds. Hristovski et al. proposed the LBD system, BITOLA, using semantic prediction. The system is combined with BioMedLEE, a type of NLP system, and SemRep to develop a model for RE. These authors applied their method to the identification of associations between Raynaud’s disease and fish oil, as studied by Swanson. Frijters et al. developed CoPub, an LBD system for finding new relations between biomedical concepts, and used it to investigate relations between genes, therapeutic drugs, signaling pathways, and diseases. Lee et al. investigated relations between biological processes and side effects using a drug as a B entity. They constructed a multilevel network by combining a drug-biological process network and a drug-side effect network.
The co-occurrence-based ABC model is one of the models that provide new hypotheses. When the ABC model is applied, an intermediate entity acts as a middleman between two other entity relations. However, because biological context is not considered, the B entities extracted by the co-occurrence-based ABC model are sometimes not useful or relevant. To overcome this limitation, this study defined biological context, proposed a method to extract context from the literature, and then applied the biological contexts to the ABC model. In this study, the biological contexts are defined as cell, drug, disease, and organism: places where interactions take place in living organisms, or conditions that interfere with or promote such interactions. Using biological contexts, we propose the context-based ABC model, which provides more relevant B entities than the co-occurrence-based ABC model.
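One way to picture how biological context can constrain path construction is the sketch below. It is an illustration under stated assumptions rather than the model proposed in this study: it assumes each extracted relation carries a set of (context type, value) pairs, and it assumes a path is kept only when the A–B and B–C relations share at least one context. The `Relation` class, the function name, and all entity and context values are hypothetical.

```python
from dataclasses import dataclass

# The four context types defined in the text.
CONTEXT_TYPES = ("cell", "drug", "disease", "organism")


@dataclass(frozen=True)
class Relation:
    source: str
    target: str
    contexts: frozenset  # (context_type, value) pairs; hypothetical layout


def context_based_paths(ab_relations, bc_relations):
    """Keep an A-B-C path only if both relations share a biological context.

    The matching rule (at least one identical context pair) is an
    assumption made for this sketch.
    """
    paths = []
    for ab in ab_relations:
        for bc in bc_relations:
            if ab.target != bc.source:
                continue  # the relations do not meet at the same B entity
            common = ab.contexts & bc.contexts
            if common:
                paths.append((ab.source, ab.target, bc.target, sorted(common)))
    return paths


# Hypothetical toy relations with placeholder entities and contexts.
ab_relations = [
    Relation("A", "B1", frozenset({("disease", "D1")})),
    Relation("A", "B2", frozenset({("cell", "X1")})),
]
bc_relations = [
    Relation("B1", "C", frozenset({("disease", "D1"), ("organism", "O1")})),
    Relation("B2", "C", frozenset({("drug", "R1")})),
]

filtered = context_based_paths(ab_relations, bc_relations)
print(filtered)
```

In this toy case a purely co-occurrence-based match would return both B1 and B2, whereas the context check keeps only B1, whose two relations were observed under the same disease context.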