Charanpreet Kaur and Rosy Madaan
Adv. Artif. Intell. Mach. Learn., 6 (1):5023-5048
1. Charanpreet Kaur: Manav Rachna International Institute Of Research And Studies, Faridabad, Haryana
2. Rosy Madaan: Manav Rachna International Institute Of Research And Studies, Faridabad, Haryana
DOI: 10.54364/AAIML.2026.61279
Article History: Received on: 18-Nov-25, Accepted on: 12-Feb-26, Published on: 19-Feb-26
Corresponding Author: Charanpreet Kaur
Email: charanpreet27@gmail.com
Citation: Charanpreet Kaur and Rosy Madaan. Machine Learning and Statistical Approaches for Predicting Breast Cancer Recurrence and Metastasis: A Systematic Review. Advances in Artificial Intelligence and Machine Learning. 2026;6(1):279. https://dx.doi.org/10.54364/AAIML.2026.61279
Metastasis and recurrence of breast cancer are major factors affecting
patients' quality of life and long-term survival worldwide. Machine learning
(ML) and statistical models have been used extensively in healthcare to predict
recurrence and metastatic risk, an effort facilitated by improved access to
data from public cancer repositories and population-based registries. Despite
considerable methodological advances, the validity, comparability, and clinical
utility of these prediction models have not been comprehensively synthesized,
and the evidence remains fragmented. This systematic review critically
evaluates the ML and statistical methodologies used to predict breast cancer
recurrence and metastasis from secondary data, focusing on data sources,
modeling techniques, outcome definitions, validation strategies, and reported
clinical utility.
A comprehensive literature search of PubMed, Scopus, Web of Science, and IEEE
Xplore was conducted in accordance with the PRISMA guidelines. The included
studies used secondary datasets such as TCGA, METABRIC, SEER, GEO, and other
cancer registries to develop and test methods for predicting the recurrence and
spread of breast cancer. Extracted data covered study design, cohort
characteristics, predictor variables, modeling strategies, performance
measures, and validation techniques.
The included studies applied a wide range of statistical models, such as Cox
proportional hazards models and logistic regression, and diverse ML techniques,
including random forests, support vector machines, gradient boosting, and deep
learning architectures. Most models combined genetic or transcriptomic features
with clinicopathological factors, whereas only a few explored multimodal or
image-based approaches. Predictive performance varied substantially across
studies, with frequent reliance on internal validation and limited use of
external or prospective validation. Reported recurrence and metastasis outcomes
were heterogeneous, particularly in registry-based studies, which complicated
cross-study comparison. Although most models showed moderate to high
discrimination, calibration, precision, accuracy, and clinical significance
were rarely reported.
Machine learning and statistical algorithms show considerable promise for
predicting breast cancer recurrence and metastasis from secondary data, but
their application is limited by methodological flaws, inadequate validation,
and unclear clinical relevance. To enable responsible and successful adoption
of these strategies in early cancer diagnosis, future research should
prioritize consistent outcome definitions, transparent reporting, rigorous
external validation, and integration with clinical decision-making.
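The abstract separates discrimination, which most reviewed models reported, from calibration, which few did. The short sketch below makes that distinction concrete. It assumes right-censored follow-up data (observed time plus an event indicator) and a model's risk scores. Harrell's C-index measures discrimination, that is, how well the model ranks patients by risk. Comparing mean predicted risk with the observed event rate within risk groups gives a crude calibration check. The function names `harrell_c_index` and `calibration_by_group` and the toy data are hypothetical illustrations, not methods or data from the reviewed studies. The C-index sketch skips pairs with tied follow-up times, and the calibration check treats the outcome as binary at a fixed horizon and ignores censoring.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index (discrimination) for right-censored data.

    A pair is comparable when the subject with the shorter observed time had
    the event (e.g. recurrence). It is concordant when that subject also has
    the higher predicted risk; tied risk scores count as half-concordant.
    Pairs with tied observed times are skipped (a simplification).
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        # a = subject with the shorter observed time, b = the other subject
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # earlier subject censored: true event order is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def calibration_by_group(predicted_probs, outcomes, n_groups=2):
    """Mean predicted risk vs observed event rate in equal-size risk groups.

    Treats the outcome as binary at a fixed horizon and ignores censoring.
    """
    if not 1 <= n_groups <= len(outcomes):
        raise ValueError("n_groups must be between 1 and the number of subjects")
    ranked = sorted(zip(predicted_probs, outcomes))  # sort by predicted risk
    n = len(ranked)
    rows = []
    for g in range(n_groups):
        chunk = ranked[g * n // n_groups:(g + 1) * n // n_groups]
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_pred, observed))
    return rows


# Hypothetical toy cohort (not data from the reviewed studies).
times = [5, 10, 15, 20]          # follow-up time, e.g. months
events = [1, 1, 0, 1]            # 1 = recurrence observed, 0 = censored
risk = [0.9, 0.3, 0.4, 0.2]      # model risk scores
print(harrell_c_index(times, events, risk))  # 4 of 5 comparable pairs concordant

probs = [0.8, 0.6, 0.3, 0.1]     # predicted recurrence probabilities
outcomes = [1, 1, 0, 0]          # observed recurrence by the horizon
print(calibration_by_group(probs, outcomes, n_groups=2))
```

In the first toy example, the risk scores rank patients fairly well (C = 0.8). In the second, the predicted probabilities still diverge from the observed rates in each group: roughly 0.2 predicted vs 0.0 observed in the low-risk group, and 0.7 vs 1.0 in the high-risk group. This illustrates why a model's discrimination does not establish its calibration. Established implementations, such as survival-analysis libraries, also handle tied times and censoring-aware calibration more carefully than this sketch.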