A Hybrid Feature Selection Based on Fisher score and SVM-RFE for Microarray Data

Hind Hamla, Khadoudja Ghanem

Abstract


Over the last two decades, the analysis of microarray data has played a critical role in disease diagnosis and the identification of different tumors. However, microarray data are difficult to classify because of the curse of dimensionality: the number of features is huge while the number of samples is small. Dimension reduction techniques such as feature selection therefore play a vital role in eliminating non-informative features and improving cancer classification. In this paper, we propose a filter-embedded hybrid feature selection method for the gene selection problem. First, the proposed method selects the top-ranked features according to the Fisher score, providing a candidate subset for the embedded stage. Second, Support Vector Machine-Recursive Feature Elimination (SVM-RFE) is applied to the candidate subset to find the optimal subset. We assess the performance of the proposed method on ten high-dimensional microarray datasets. The results show that the proposed method improves classification accuracy, reduces the number of selected features, and decreases computational time.
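The two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the commonly used form of the Fisher score (class-size-weighted between-class scatter divided by within-class scatter, per feature) and uses scikit-learn's `RFE` wrapper around a linear-kernel `SVC` as a stand-in for SVM-RFE. The function names `fisher_score` and `fisher_svm_rfe`, the subset sizes `n_candidates` and `n_final`, the value of `C`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVC


def fisher_score(X, y):
    """Per-feature Fisher score: between-class over within-class scatter."""
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        n_c = Xc.shape[0]
        between += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        within += n_c * Xc.var(axis=0)
    # Small constant guards against division by zero for constant features.
    return between / (within + 1e-12)


def fisher_svm_rfe(X, y, n_candidates=20, n_final=5):
    """Filter stage (Fisher score) followed by embedded stage (SVM-RFE).

    n_candidates and n_final are illustrative values, not the paper's.
    """
    # Stage 1: keep the indices of the top-ranked features.
    scores = fisher_score(X, y)
    candidates = np.argsort(scores)[::-1][:n_candidates]

    # Stage 2: recursively drop the feature with the smallest squared
    # linear-SVM weight until n_final features remain.
    rfe = RFE(SVC(kernel="linear", C=1.0), n_features_to_select=n_final, step=1)
    rfe.fit(X[:, candidates], y)

    # Map the RFE mask back to column indices of the original matrix.
    selected = candidates[rfe.support_]
    return candidates, selected


# Synthetic stand-in for a microarray matrix: 40 samples, 200 "genes",
# of which only the first 5 differ between the two classes (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 200))
y = np.repeat([0, 1], 20)
X[y == 1, :5] += 4.0

candidates, selected = fisher_svm_rfe(X, y)
print("candidate subset size:", len(candidates))
print("selected features:", sorted(selected.tolist()))
```

Running the filter first means the comparatively expensive recursive elimination only ever sees `n_candidates` columns rather than every gene, which is where a reduction in computational time would come from in a pipeline of this shape.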

DOI: https://doi.org/10.31449/inf.v48i1.4759

This work is licensed under a Creative Commons Attribution 3.0 License.