Research Article: Predicting biomedical relationships using the knowledge and graph embedding cascade model

Date Published: June 13, 2019

Publisher: Public Library of Science

Author(s): Xiaomin Liang, Daifeng Li, Min Song, Andrew Madden, Ying Ding, Yi Bu, Jiajie Peng.


Advances in machine learning and deep learning methods, together with the increasing availability of large-scale pharmacological, genomic, and chemical datasets, have created opportunities for identifying potentially useful relationships within biochemical networks. Knowledge embedding models have proved valuable in detecting knowledge-based correlations among entities, but little effort has been made to apply them to networks of biochemical entities. This is because such networks tend to be unbalanced and sparse, and knowledge embedding models do not work well on them. However, the shortcomings of knowledge embedding models can be partly compensated for if they are used in association with graph embedding. In this paper, we combine knowledge embedding and graph embedding to represent biochemical entities and their relations as dense, low-dimensional vectors. We build a cascade learning framework that incorporates semantic features from the knowledge embedding model and graph features from the graph embedding model to score the probability of a link. The proposed method performs noticeably better than the models with which it is compared. It predicted links and entities with an accuracy of 93%, and its hits@10 score shows an average absolute improvement of 8.6% over the original knowledge embedding model and of 1.1% to 9.7% over other knowledge and graph embedding algorithms. In addition, we designed a meta-path algorithm to detect path relations in the biomedical network. Case studies further verify the value of the proposed model in finding potential relationships between diseases, drugs, genes, treatments, etc. Among the findings of the proposed model is the suggestion that VDR (vitamin D receptor) may be linked to prostate cancer. This is backed by evidence from medical databases and published research, supporting the suggestion that our proposed model could be of value to biomedical researchers.
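The hits@10 metric quoted in the abstract can be illustrated with a short sketch. It assumes the usual link-prediction definition: for each test triple, the correct entity is ranked among all candidates, and hits@k is the fraction of test triples whose correct entity appears in the top k. The function names `rank_of_true` and `hits_at_k` are hypothetical and are not taken from the paper.

```python
def rank_of_true(scores, true_index):
    """Return the 1-based rank of the true candidate.

    Assumes a higher score means a more plausible candidate.
    Ties are resolved in favour of the true candidate.
    """
    true_score = scores[true_index]
    return 1 + sum(1 for s in scores if s > true_score)


def hits_at_k(ranks, k=10):
    """Fraction of test cases whose true candidate is ranked within the top k."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Illustrative ranks only, not results from the paper.
example_ranks = [1, 3, 12, 50]
example_hits = hits_at_k(example_ranks, k=10)
```

Under this definition, an "absolute improvement" of 8.6% means that the hits@10 value itself rises by 0.086, for example from 0.50 to 0.586, rather than by 8.6% of its previous value.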

Partial Text

Biochemistry is a cross-disciplinary field, incorporating elements of pharmacology, biology, and chemistry. The large number of disciplines associated with biochemistry makes it challenging to identify new relationships. Computational prediction is becoming a crucial and effective strategy for identifying links, given its potential to reduce the high failure risk of expensive and time-consuming laboratory experiments.

This paper has proposed a cascade model for biochemical link prediction. The novelty of the model lies in its combination of semantic features and graph features in a multi-relational biochemical data set: the semantic features derive from the translation-based knowledge embedding model, and the graph features are learned from the mainstream graph embedding model Node2vec. The translation-based knowledge embedding model is efficient and scalable, and the embedding vectors from Node2vec encode local and global topological information. The cascade learning model uses a series of functions to relocate the triplets in the feature space and achieves noticeable improvements.
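As a minimal sketch of how semantic and graph features might be combined for one candidate triple, the example below assumes a TransE-style translation score (a true triple should satisfy h + r ≈ t) and uses the cosine similarity of two node vectors as a stand-in graph feature. The paper's actual feature set and cascade functions are not given in this text, so the function names, the choice of features, and the toy vectors are all illustrative assumptions.

```python
import math


def transe_distance(h, r, t):
    """L2 distance ||h + r - t||; a smaller value means a more plausible triple."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def cosine(u, v):
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def triple_features(h_kg, r_kg, t_kg, h_graph, t_graph):
    """Build one feature vector for a (head, relation, tail) triple.

    h_kg, r_kg, t_kg: knowledge-embedding vectors (assumed TransE-style).
    h_graph, t_graph: graph-embedding vectors for head and tail
                      (e.g. produced by Node2vec; toy values used here).
    """
    return [transe_distance(h_kg, r_kg, t_kg), cosine(h_graph, t_graph)]


# Toy vectors only; real embeddings would be learned from the network.
features = triple_features(
    h_kg=[1.0, 0.0], r_kg=[0.0, 1.0], t_kg=[1.0, 1.0],
    h_graph=[1.0, 0.0], t_graph=[1.0, 0.0],
)
```

A feature vector of this kind could then be passed to a downstream classifier that scores the probability of a link, which is one plausible reading of the cascade described above.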