ISSN: 2582-9793

Machine Learning and Statistical Approaches for Predicting Breast Cancer Recurrence and Metastasis: A Systematic Review

Review Article (Published on: 19-Feb-2026)
DOI: https://doi.org/10.54364/AAIML.2026.61279

Charanpreet Kaur and Rosy Madaan

Adv. Artif. Intell. Mach. Learn., 6 (1):5023-5048

1. Charanpreet Kaur: Manav Rachna International Institute Of Research And Studies, Faridabad, Haryana

2. Rosy Madaan: Manav Rachna International Institute Of Research And Studies, Faridabad, Haryana

Article History: Received on: 18-Nov-25, Accepted on: 12-Feb-26, Published on: 19-Feb-26

Corresponding Author: Charanpreet Kaur

Email: charanpreet27@gmail.com

Citation: Charanpreet Kaur and Rosy Madaan. Machine Learning and Statistical Approaches for Predicting Breast Cancer Recurrence and Metastasis: A Systematic Review. Advances in Artificial Intelligence and Machine Learning. 2026;6(1):279. https://dx.doi.org/10.54364/AAIML.2026.61279


Abstract

Metastasis and recurrence of breast cancer are among the most important factors affecting patients' quality of life and long-term survival globally. Machine learning (ML) and statistical models have been used extensively in healthcare to predict recurrence and metastatic risk, facilitated by improved data accessibility through public cancer repositories and population-based registries. Despite considerable methodological advances, the evidence on the validity, comparability, and clinical utility of these prediction models remains fragmented and has not been comprehensively synthesized. This systematic review critically evaluates the machine learning and statistical methodologies used to predict breast cancer recurrence and metastasis from secondary data, focusing on data sources, modeling techniques, outcome definitions, validation strategies, and reported clinical utility.
A comprehensive literature search of PubMed, Scopus, Web of Science, and IEEE Xplore was conducted in accordance with the PRISMA guidelines. Secondary datasets such as TCGA, METABRIC, SEER, GEO, and other cancer registries were used in the reviewed studies to develop and test approaches for predicting breast cancer recurrence and metastasis. Data extraction covered study design, cohort characteristics, predictor variables, modeling strategies, performance measures, and validation techniques.
The included studies employed a wide range of statistical models, such as Cox proportional hazards models and logistic regression, and diverse ML techniques, such as random forests, support vector machines, gradient boosting, and deep learning architectures. Most models combined genetic or transcriptomic features with clinicopathological factors, whereas only a few explored multimodal or image-based methods. Predictive performance varied substantially across studies, with frequent reliance on internal validation and limited use of external or prospective validation. Reported outcomes for recurrence and metastasis were heterogeneous, particularly in registry-based studies, which complicated cross-study comparison. Although most models showed moderate to high discriminatory ability, calibration, precision, accuracy, and clinical significance were rarely reported. Machine learning and statistical algorithms show considerable promise for predicting breast cancer recurrence and metastasis from secondary data, but their application is limited by methodological shortcomings, inadequate validation, and unclear clinical relevance. To enable responsible and successful adoption of these strategies in early cancer diagnosis, future research should prioritize consistent outcome definitions, transparent reporting, rigorous external validation, and integration of clinical decision-making considerations.
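
The gap between discrimination and calibration noted above can be illustrated with a minimal sketch. It is not taken from any reviewed study: the labels and predicted risks are hypothetical toy values, and the helper names (`auc`, `brier_score`, `mean_predicted_risk`) are chosen here for illustration only. The two hypothetical models rank patients identically, so their AUC is the same, yet one systematically underestimates recurrence risk.

```python
def auc(y_true, y_prob):
    """Probability that a random positive case is ranked above a random
    negative case; ties count as half a win."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def mean_predicted_risk(y_prob):
    """Average predicted risk, to compare against the observed event rate."""
    return sum(y_prob) / len(y_prob)


# Hypothetical toy cohort: 1 = recurrence observed, 0 = no recurrence.
y_true = [0, 0, 1, 0, 1, 0, 1, 1]

# Hypothetical model A: predicted risks on a plausible scale.
model_a = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

# Hypothetical model B: same ranking as A, but every risk is halved,
# so discrimination is unchanged while calibration degrades.
model_b = [p * 0.5 for p in model_a]

observed_rate = sum(y_true) / len(y_true)

for name, probs in [("A", model_a), ("B", model_b)]:
    print(
        f"model {name}: AUC={auc(y_true, probs):.4f}, "
        f"Brier={brier_score(y_true, probs):.4f}, "
        f"mean predicted={mean_predicted_risk(probs):.3f}, "
        f"observed rate={observed_rate:.3f}"
    )
```

Because AUC depends only on the ordering of predictions, a study that reports AUC alone cannot distinguish these two models; a calibration-sensitive measure such as the Brier score, or a comparison of mean predicted risk with the observed event rate, is needed to reveal the difference.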
