An Elastic Net Approach to Logistic Regression for Genetic Selection in High-Dimensional Brain Cancer Data
DOI:
https://doi.org/10.24086/cuesj.v9n1y2025.pp14-23
Keywords:
Brain Cancer, Elastic net, Regularization Techniques, Gene Selection, Multinomial Logistic Model, High-Dimensional Data
Abstract
This study addresses the difficulties that the heterogeneous nature of brain tumor variants creates for the treatment of brain cancer. Its objective was to identify essential genes present in multiple types of brain cancer using high-dimensional gene expression data from the Curated Microarray Database (CuMiDa). The dataset comprised 130 samples from 4 subtypes of brain cancer and 16384 gene expression variables. Because the number of variables far exceeds the number of samples, Multinomial Logistic Regression (MLR) with an elastic net penalty was used to cope with the curse of dimensionality. Model performance was evaluated using accuracy, the Kappa statistic, the Area Under the Curve (AUC), and the F1-score. The elastic net handled the large number of variables effectively: it narrowed the set of genes for further analysis and highlighted subtype-specific expression signatures. The model achieved high precision and AUC values, indicating a good overall ability to distinguish all subtypes, with some AUC values close to a perfect score. Parameter estimation was supported by cross-validation and other statistical validation techniques for predictive models, all carried out in R. These findings suggest that MLR with elastic net regularization is the best model for evaluating large-scale gene expression data from brain cancers. There is ample evidence that the selected genes play a role in the disease and can serve as targets for therapy, which makes this study a good starting point for further investigation of their biological roles. The model should also be applied to other datasets of a quite different nature to test its validity. This, in turn, may point to improved diagnostic, prognostic, and therapeutic options for brain tumors.
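Three of the evaluation measures named in the abstract can be illustrated with a minimal sketch. The confusion matrix below is invented purely for illustration and is not a result from the study, and the function names are hypothetical. The abstract states that the analysis was done in R, so this Python version is only a stand-in. AUC is left out because it requires predicted class probabilities rather than a confusion matrix.

```python
from typing import List

Matrix = List[List[int]]  # rows = true class, columns = predicted class


def accuracy(cm: Matrix) -> float:
    """Share of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohen_kappa(cm: Matrix) -> float:
    """Agreement between true and predicted labels, corrected for chance."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    p_observed = sum(cm[i][i] for i in range(k)) / n
    row_totals = [sum(cm[i]) for i in range(k)]
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    p_expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


def macro_f1(cm: Matrix) -> float:
    """Unweighted mean of the per-class F1 scores."""
    k = len(cm)
    scores = []
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(k)) - tp  # predicted c, truly another class
        fn = sum(cm[c]) - tp                       # truly c, predicted another class
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / k


# Hypothetical 4-class confusion matrix (illustrative counts only).
example_cm: Matrix = [
    [9, 1, 0, 0],
    [1, 8, 1, 0],
    [0, 0, 10, 0],
    [0, 1, 0, 9],
]

if __name__ == "__main__":
    print(f"accuracy = {accuracy(example_cm):.3f}")
    print(f"kappa    = {cohen_kappa(example_cm):.3f}")
    print(f"macro F1 = {macro_f1(example_cm):.3f}")
```

Macro averaging weights every subtype equally regardless of its sample count. That is one reasonable choice for a multi-class problem, but the abstract does not say which averaging scheme the authors used.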
License
Copyright (c) 2025 Nozad H. Mahmood, Dler H. Kadir

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Accepted 2024-12-20
Published 2025-01-20



