ISSN: 2582-9793

A Health Information Technology Framework: Interpretable Voting Ensembles for Thalassemia Screening Using Clinical and Hematological Biomarkers

Original Research (Published On: 02-Jun-2026)
DOI: https://doi.org/10.54364/AAIML.2026.63310

Ayad Hameed Mousa

Adv. Artif. Intell. Mach. Learn., XX (XX):-

1. Ayad Hameed Mousa: University of Kerbala

Article History: Received on: 04-Feb-26, Accepted on: 26-May-26, Published on: 02-Jun-26

Corresponding Author: Ayad Hameed Mousa

Email: ayad.h@uokerbala.edu.iq

Citation: Ayad Hameed Mousa, et al. A Health Information Technology Framework: Interpretable Voting Ensembles for Thalassemia Screening Using Clinical and Hematological Biomarkers. Advances in Artificial Intelligence and Machine Learning. 2026. (Ahead of Print) https://dx.doi.org/10.54364/AAIML.2026.63310


Abstract

    

Thalassemia is a severe hereditary blood disorder for which early and accurate detection significantly affects patient treatment. Machine learning tools have the potential to make diagnostic screening more efficient; however, their lack of transparency, usually referred to as the "black box" problem, makes many clinicians reluctant to use them in practice. We developed a practical health IT framework centered on interpretable voting ensembles for thalassemia screening using standard clinical and hematological markers. We used a real-world dataset with 25 health attributes, including hemoglobin levels, red blood cell indices, and ferritin. We trained three machine learning models: XGBoost, Random Forest, and Logistic Regression. The Synthetic Minority Over-sampling Technique (SMOTE) was used to address class imbalance in the data. A soft voting ensemble of these models outperformed the individual models, achieving an accuracy of 98.37% and an F1-score of 96.99%. To open the black box and build confidence in the models' decisions, SHAP (SHapley Additive exPlanations) was used to explain each prediction in terms understandable to clinicians. These interpretations were then evaluated and endorsed by the participating clinicians, which supports their compatibility with real-world diagnostic reasoning. The most significant predictors identified were MCV, MCH, and patient age. The ensemble required more computational time to build than the individual models.
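As a minimal illustration of the soft voting step described in the abstract, the sketch below averages per-model class probabilities and selects the class with the highest mean. It is not the authors' implementation: the function name `soft_vote`, the equal weights, and the probability values are all hypothetical and chosen only to show the mechanism.

```python
def soft_vote(prob_rows, weights=None):
    """Weighted average of per-model class probabilities.

    prob_rows: one probability vector per model, all of the same length.
    Returns (predicted_class_index, averaged_probabilities).
    """
    if weights is None:
        weights = [1.0] * len(prob_rows)  # assumption: equal model weights
    total = sum(weights)
    n_classes = len(prob_rows[0])
    averaged = [
        sum(w * probs[c] for w, probs in zip(weights, prob_rows)) / total
        for c in range(n_classes)
    ]
    # Predicted class is the index with the largest averaged probability.
    label = max(range(n_classes), key=averaged.__getitem__)
    return label, averaged


# Hypothetical outputs [P(negative), P(positive)] for a single patient.
# These numbers are illustrative only and do not come from the study.
model_probs = {
    "xgboost": [0.10, 0.90],
    "random_forest": [0.55, 0.45],
    "logistic_regression": [0.60, 0.40],
}

label, averaged = soft_vote(list(model_probs.values()))
print(label, averaged)
```

In this made-up case two of the three models individually favor the negative class, so a majority (hard) vote would return negative, while the averaged probabilities favor the positive class because one model is highly confident. That sensitivity to confidence is the general motivation for soft voting over hard voting.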
