Stable Feature Selection using Improved Whale Optimization Algorithm for Microarray Datasets

Bibliographic Details
Authors: Theng, Dipti; Bhoyar, Kishor K
Format: article
Publication Date: 2023
Country: Spain
Institution: Universidad de Salamanca (USAL)
Repository: GREDOS. Repositorio Institucional de la Universidad de Salamanca
OAI Identifier: oai:gredos.usal.es:10366/160189
Online Access: http://hdl.handle.net/10366/160189
Access Level: Open access
Keyword: feature selection
stability of feature selection
whale optimization algorithm
marine predator algorithm
grey wolf optimization
microarray datasets
high dimensional datasets
Description
Summary: A microarray is a collection of DNA sequences that reflect an organism's whole gene set and are organized in a grid pattern for use in genetic testing. Microarray datasets are extremely high-dimensional and have a very small sample size, posing the challenges of insufficient data and high computational complexity. Identifying the true biomarkers, that is, the most significant features (a very small subset of the complete feature set), is desired to address these issues. Doing so reduces over-fitting and time complexity and improves model generalization. Various feature selection algorithms are used for this biomarker identification. This research proposes a modification to the whale optimization algorithm (WOAm) for biomarker discovery, in which the fitness of each search agent is evaluated using the hinge loss function during the hunting-for-prey phase to determine the optimal search agent. The results of the proposed modified algorithm are also compared with those of the original whale optimization algorithm and of contemporary algorithms such as the marine predator algorithm and grey wolf optimization. All these algorithms are evaluated on six different high-dimensional microarray datasets. The proposed modification to the whale optimization algorithm was observed to significantly improve the feature selection results across all the datasets. Domain experts' trust in the resulting biomarkers and associated genes depends on the stability of the results obtained, so the stability of the chosen feature set was also evaluated during the research work. According to the findings, the proposed WOAm has superior stability compared to the other algorithms on the CNS, colon, Leukemia, and OSCC datasets.
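
The two ideas in the summary, a hinge-loss fitness for a search agent and the stability of the selected feature set, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record does not state which classifier produces the decision scores or which stability measure was used, so the linear scorer with weights `w`, the binary feature mask, and the mean pairwise Jaccard similarity below are all assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np


def hinge_loss(y_true, scores):
    """Mean hinge loss for labels in {-1, +1} and real-valued decision scores."""
    y_true = np.asarray(y_true, dtype=float)
    scores = np.asarray(scores, dtype=float)
    return float(np.mean(np.maximum(0.0, 1.0 - y_true * scores)))


def mask_fitness(mask, X, y, w):
    """Hypothetical fitness of one search agent (lower is better).

    mask : binary vector marking the features (genes) the agent selects.
    w    : weights of an ASSUMED linear scorer; the source does not say
           which model supplies the decision scores.
    """
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        # An agent that selects no feature cannot classify anything.
        return float("inf")
    scores = np.asarray(X)[:, mask] @ np.asarray(w)[mask]
    return hinge_loss(y, scores)


def jaccard_stability(subsets):
    """Mean pairwise Jaccard similarity of feature subsets from repeated runs.

    ASSUMED stability measure; values near 1 mean the runs agree closely.
    """
    sets = [set(s) for s in subsets]
    if len(sets) < 2:
        raise ValueError("need at least two feature subsets")
    sims = []
    for i, a in enumerate(sets):
        for b in sets[i + 1:]:
            union = a | b
            # Two empty subsets are treated as identical.
            sims.append(len(a & b) / len(union) if union else 1.0)
    return sum(sims) / len(sims)


if __name__ == "__main__":
    X = np.array([[1.0, 0.0], [0.0, 1.0]])   # two samples, two features
    y = np.array([1, -1])                    # class labels in {-1, +1}
    w = np.array([1.0, -1.0])                # assumed scorer weights
    print(mask_fitness([1, 1], X, y, w))     # both features selected
    print(mask_fitness([1, 0], X, y, w))     # only the first feature
    print(jaccard_stability([[1, 2, 3], [2, 3, 4]]))
```

In a whale-optimization loop, a function like `mask_fitness` would be called once per search agent and the agent with the lowest loss kept as the current best; `jaccard_stability` would be applied afterwards to the feature subsets returned by repeated runs.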