A modification of the Lasso method by using the Bahadur representation for the genome-wide association study

Lev V. Utkin, Yulia A. Zhuk


This paper proposes a modification of the Lasso method, a powerful machine learning tool, for genome-wide association studies. From the machine learning point of view, the paper solves a feature selection problem in which the features are single nucleotide polymorphisms, or DNA markers, whose association with a quantitative trait is to be established. The main idea underlying the modification is to take into account correlations between DNA markers and peculiarities of the phenotype values by using the Bahadur representation of joint probabilities of binary random variables. Interactions between DNA markers, known as epistasis, are also considered within the framework of the proposed modification. Numerical experiments with real datasets illustrate the proposed modification.
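
To make the central ingredient concrete, the following is a minimal sketch of the Bahadur representation for binary variables, truncated after the pairwise correlation terms. It is an illustration only, not the modification proposed in the paper: the truncation order, the function name `bahadur_second_order`, and the example values of `p` and `rho` are all assumptions introduced here.

```python
from itertools import combinations, product
from math import sqrt


def bahadur_second_order(x, p, rho):
    """Joint probability of a binary vector x under a Bahadur expansion
    truncated after the pairwise (second-order) terms.

    x   : sequence of 0/1 values, one per marker
    p   : marginal probabilities P(X_i = 1), each strictly between 0 and 1
    rho : dict mapping index pairs (i, j), i < j, to the correlation of X_i, X_j
    """
    # Probability of x if the markers were independent.
    independent = 1.0
    for xi, pi in zip(x, p):
        independent *= pi if xi == 1 else 1.0 - pi

    # Standardized values u_i = (x_i - p_i) / sqrt(p_i * (1 - p_i)).
    u = [(xi - pi) / sqrt(pi * (1.0 - pi)) for xi, pi in zip(x, p)]

    # Pairwise correction factor 1 + sum_{i<j} rho_ij * u_i * u_j.
    correction = 1.0
    for i, j in combinations(range(len(x)), 2):
        correction += rho.get((i, j), 0.0) * u[i] * u[j]

    return independent * correction


# Hypothetical example: three markers, only the first two correlated.
p = [0.3, 0.5, 0.2]
rho = {(0, 1): 0.4}
total = sum(bahadur_second_order(x, p, rho) for x in product([0, 1], repeat=3))
```

Summing the function over all binary configurations gives one, because each correction term has zero mean under the independent model. For two variables the pairwise expansion is exact; for more variables the truncated form is only an approximation and can produce negative values for some correlation patterns.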

