Date Published: February 14, 2019
Publisher: Public Library of Science
Author(s): Sai Wang, Hai-Wei Shen, Hua Chai, Yong Liang, Suzannah Rutherford.
Identifying highly correlated genes in high-dimensional data is an important problem in the study of cancer and genetic diseases. Selecting relevant biomarkers from gene expression data is challenging because the data contain important correlation structures: some genes fall into groups that share a biological function, chromosomal location or regulation. In this paper, we propose CHR-DE, a penalized accelerated failure time model that combines a non-convex regularization (local search) with differential evolution (global search) in a wrapper-embedded memetic framework. The complex harmonic regularization (CHR) can approximate the combination of ℓp (1/2 ≤ p < 1) and ℓq (1 ≤ q < 2) penalties for selecting biomarkers in groups. Differential evolution (DE) is used to globally optimize the CHR hyperparameters, which gives CHR-DE a strong capability for selecting groups of genes in high-dimensional biological data. We also develop an efficient path-seeking algorithm to optimize this penalized model. The proposed method is evaluated on synthetic data and three gene expression datasets: breast cancer, hepatocellular carcinoma and colorectal cancer. The experimental results demonstrate that CHR-DE is a more effective tool for feature selection and prediction.
Feature selection is a key step in identifying biomarkers in biological data with high dimensionality and small sample sizes. Among the various kinds of feature selection methods, regularization methods embed a penalty function in the learning procedure, so that selection and model fitting happen in a single process with a lower risk of over-fitting. The best-known penalty is the least absolute shrinkage and selection operator (Lasso, ℓ1-norm), which performs continuous shrinkage and feature selection at the same time. Other ℓ1-norm type regularization methods include smoothly clipped absolute deviation (SCAD), group lasso, and the minimax concave penalty (MCP). In addition, Xu et al. proved that when 0