Identifying Potential Biomarkers for Diseases Diagnosis Through Co-Expression Analysis: An Optimization Approach
Abstract
Because gene function is dysregulated in cancer, detecting these abnormalities can assist in diagnosis. DNA microarray technology is an important tool for research in functional genomics. It was developed to assess gene expression levels across different samples and has been used extensively in cancer research, where mutations may switch off or increase gene expression in malignant cells. Identifying clusters of co-expressed genes has emerged as a pivotal step in functional genomics, since genes with related functions often exhibit similar expression patterns across varied samples. The biologist starts by analyzing the known functions of genes within each cluster in order to infer the function of the entire cluster; this inferred function is then ascribed to all genes of unknown function within that cluster. High-dimensional clustering has proven fruitful for identifying co-expressed genes. This optimization problem, which is non-convex, has been shown to be NP-hard. DNA microarrays produce large gene expression datasets comprising millions of measurements. In practice, the more data there are to cluster and the more clusters there are to consider, the larger the number of possible partitions becomes. This makes clustering a computationally intensive and time-consuming combinatorial problem, exacerbated by the high dimensionality of gene expression datasets. Despite the availability of numerous high-dimensional clustering algorithms, there remains room for improving quality and reducing running time. Moreover, the choice of clustering algorithm depends on the specific attributes of the dataset. To that end, we propose an algorithm specifically tailored to large, high-dimensional datasets, with reduced computational complexity.
Running this algorithm several times yields a set of clusters containing genes that are grouped together across multiple runs. The centroid of each such cluster is then used to identify the optimal partition. Empirical studies show an average 48% improvement in quality and an average 60% reduction in running time compared to the approaches outlined in the related-work section.
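The abstract does not specify the algorithm itself, so the following is only a minimal sketch of the general idea of repeated runs followed by centroid-based assignment. It assumes plain k-means as a stand-in for the proposed clustering step, treats "grouped together across multiple runs" as "received the same label in every run", and uses hypothetical names throughout (`kmeans`, `stable_groups`, `consensus_partition`, and the toy expression profiles).

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean (centroid) of a non-empty list of profiles."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def assign(data, centroids):
    """Label each profile with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in data]


def kmeans(data, k, rng, iters=50):
    """Plain Lloyd-style k-means (stand-in, NOT the paper's algorithm)."""
    centroids = rng.sample(data, k)
    labels = assign(data, centroids)
    for _ in range(iters):
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = mean(members)
        new_labels = assign(data, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels


def stable_groups(data, k, runs=10, seed=0):
    """Groups of genes that shared a cluster in every one of the runs."""
    rng = random.Random(seed)
    all_labels = [kmeans(data, k, rng) for _ in range(runs)]
    groups = {}
    for i in range(len(data)):
        # Two genes have equal signatures iff they co-clustered in all runs.
        signature = tuple(labels[i] for labels in all_labels)
        groups.setdefault(signature, []).append(i)
    return sorted(groups.values(), key=len, reverse=True)


def consensus_partition(data, k, runs=10, seed=0):
    """Assign every gene to the centroid of one of the k largest stable groups."""
    groups = stable_groups(data, k, runs, seed)
    centroids = [mean([data[i] for i in g]) for g in groups[:k]]
    return assign(data, centroids)


if __name__ == "__main__":
    # Hypothetical toy profiles: two well-separated groups of genes.
    toy = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0],
           [5.0, 5.1], [5.1, 5.0], [5.0, 5.0], [5.1, 5.1]]
    print(stable_groups(toy, k=2, runs=5, seed=1))
    print(consensus_partition(toy, k=2, runs=5, seed=1))
```

Requiring agreement in every run is the strictest possible criterion; a real pipeline might instead accept genes that co-cluster in a chosen fraction of runs, which would need a pairwise co-association count rather than the signature shortcut used here.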
DOI: https://doi.org/10.31449/inf.v48i19.6207
This work is licensed under a Creative Commons Attribution 3.0 License.