Polewko-Klim, A., Grablis, P., Rudnicki, W. (2024). EnsembleFS: an R Toolkit and a Web-Based Tool for a Filter Ensemble Feature Selection of Molecular Omics Data.
In: Franco, L., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2024. ICCS 2024. Lecture Notes in Computer Science, vol 14835. Springer, Cham. https://doi.org/10.1007/978-3-031-63772-8_7
The development of more complex biomarker selection protocols based on the machine learning (ML) approach, with additional processing of information from biological databases (DB), is important for the accelerated development of molecular diagnostics and therapy. In this study, we present EnsembleFS, a user-friendly R toolkit (R package and Shiny web application) for heterogeneous ensemble feature selection (EFS) of molecular omics data that also supports users in the analysis and interpretation of the most relevant biomarkers. EnsembleFS is based on five feature filters (FF), namely, the U-test, minimum redundancy maximum relevance (MRMR), Monte Carlo feature selection (MCFS), and multidimensional feature selection (MDFS) in 1D and 2D versions. It uses supervised ML methods to evaluate the quality of the set of selected features and retrieves the biological characteristics of biomarkers online from nine DB, such as Gene Ontology, WikiPathways, and Human Protein Atlas. Its functional modules for identifying potential candidate biomarkers and for evaluating, comparing, analyzing, and visualizing model results make EnsembleFS a useful tool for feature selection, random forest (RF) binary classification, and comprehensive biomarker analysis.
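
As a rough illustration of the filter-ensemble idea described in the abstract, the following Python sketch scores each feature with two independent filters and keeps the features that both filters rank highly. It is not the EnsembleFS API, and it makes several assumptions: the abstract does not say how EnsembleFS combines the outputs of its filters, so the top-k voting rule is only one plausible choice; `mean_gap_score` is a toy stand-in for a second filter, not MRMR, MCFS, or MDFS; and all function names, feature names, and data values are hypothetical.

```python
from collections import Counter


def u_statistic_score(values, labels):
    """Mann-Whitney U for class 1 vs class 0, rescaled so larger means more separation."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    # U counts pairs where the class-1 value exceeds the class-0 value (ties count 0.5).
    u = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    # U / (n1 * n2) is 0.5 when the classes are indistinguishable on this feature.
    return abs(u / (len(pos) * len(neg)) - 0.5)


def mean_gap_score(values, labels):
    """Toy second filter (assumption): class-mean gap divided by the overall range."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    spread = max(values) - min(values)
    if spread == 0:
        return 0.0
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg)) / spread


def top_k(columns, labels, score_fn, k):
    """Names of the k features ranked highest by a single filter."""
    ranked = sorted(columns, key=lambda name: score_fn(columns[name], labels), reverse=True)
    return ranked[:k]


def ensemble_select(columns, labels, filters, k, min_votes):
    """Keep features that appear in the top-k list of at least min_votes filters."""
    votes = Counter()
    for score_fn in filters:
        votes.update(top_k(columns, labels, score_fn, k))
    return sorted(name for name, count in votes.items() if count >= min_votes)


# Hypothetical toy data: 6 samples, binary labels, three features.
labels = [0, 0, 0, 1, 1, 1]
columns = {
    "gene_a": [1.0, 1.2, 0.9, 3.1, 2.9, 3.3],  # separates the two classes
    "gene_b": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # uninformative
    "gene_c": [0.5, 0.7, 0.6, 1.5, 1.4, 1.6],  # separates the two classes
}
selected = ensemble_select(
    columns, labels, [u_statistic_score, mean_gap_score], k=2, min_votes=2
)
print(selected)
```

Requiring agreement between filters (`min_votes=2`) favors features whose relevance does not depend on one scoring method; lowering `min_votes` to 1 would instead take the union of the per-filter lists.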