Ayad Hameed Mousa
Adv. Artif. Intell. Mach. Learn., XX (XX):-
1. Ayad Hameed Mousa: University of Kerbala
DOI: 10.54364/AAIML.2026.63310
Article History: Received on: 04-Feb-26, Accepted on: 26-May-26, Published on: 02-Jun-26
Corresponding Author: Ayad Hameed Mousa
Email: ayad.h@uokerbala.edu.iq
Citation: Ayad Hameed Mousa, et al. A Health Information Technology Framework: Interpretable Voting Ensembles for Thalassemia Screening Using Clinical and Hematological Biomarkers. Advances in Artificial Intelligence and Machine Learning. 2026. (Ahead of Print) https://dx.doi.org/10.54364/AAIML.2026.63310
Thalassemia is a severe hereditary blood disorder for which early and
accurate detection strongly affects a patient's treatment. Machine
learning tools could make diagnostic screening more efficient;
however, their lack of transparency, often described as the "black
box" problem, makes many clinicians reluctant to use them in practice.
We developed a practical health IT framework centered on interpretable voting
ensembles for thalassemia screening using standard clinical and hematological
markers. We used a real-world dataset with 25 health attributes,
including hemoglobin levels, red blood cell indices, and ferritin. We
then trained three machine learning models (XGBoost, Random Forest, and
Logistic Regression) and applied the Synthetic Minority Over-sampling Technique
(SMOTE) to address class imbalance in the data. A soft voting ensemble of these
models outperformed the individual models, achieving an accuracy of 98.37% and an
F1-score of 96.99%. To open the black box and build confidence in the
models' decisions, we used SHAP (SHapley Additive exPlanations) to explain
each prediction in terms clinicians can understand. The participating
clinicians then evaluated and endorsed these explanations, supporting their
consistency with real-world diagnostic reasoning. The most significant predictors
were mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH), and
patient age. The main trade-off was that the ensemble required more
computational time to build than the individual models.
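The SMOTE step mentioned in the abstract balances the classes by creating synthetic minority-class samples on the line segment between a minority sample and one of its k nearest minority-class neighbours. The sketch below is a minimal pure-Python illustration of that idea (the random-base variant, where each synthetic point starts from a randomly chosen minority sample), not the study's own pipeline, which this section does not describe. The function name `smote`, the default `k=5` (a common default), and the two-feature toy vectors are assumptions for illustration only. In practice the `SMOTE` class from the imbalanced-learn library is the usual choice, and oversampling is normally applied to the training split only.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Hypothetical helper: create n_synthetic points by interpolating
    between minority-class samples and their nearest minority neighbours.

    minority: list of feature vectors (lists of floats), all from the minority class.
    k: number of nearest minority neighbours considered for each base sample.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)  # seeded so results are reproducible
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by Euclidean distance to x and keep k.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        nb = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1)
        # The new point lies on the segment joining x and the chosen neighbour.
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


# Made-up minority-class vectors [MCV (fL), MCH (pg)], not data from the study.
toy_minority = [[62.0, 19.5], [65.0, 20.1], [60.5, 18.9]]
for point in smote(toy_minority, 4, k=2):
    print(point)
```

Because every synthetic point is a convex combination of two real minority samples, SMOTE fills in the minority region rather than duplicating rows, which is why it tends to overfit less than plain random oversampling.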
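In soft voting, the ensemble averages the class probabilities predicted by each base model (XGBoost, Random Forest, and Logistic Regression in this study) and picks the class with the highest average, instead of counting each model's hard label. The sketch below shows that combination rule for a single patient, assuming each model outputs a probability vector `[P(normal), P(thalassemia)]`; the function name `soft_vote`, the optional `weights`, and the probability values are hypothetical. In scikit-learn the same rule is provided by `VotingClassifier(voting="soft")`; whether the study weighted its models is not stated in this section.

```python
def soft_vote(prob_lists, weights=None):
    """Hypothetical helper: combine per-model class-probability vectors
    by (weighted) averaging and return (predicted_class, averaged_probs).

    prob_lists: one list of per-class probabilities per model, for one patient.
    weights: optional per-model weights; equal weights if omitted.
    """
    n_models = len(prob_lists)
    weights = weights or [1.0] * n_models
    total = sum(weights)
    n_classes = len(prob_lists[0])
    averaged = [
        sum(w * probs[c] for w, probs in zip(weights, prob_lists)) / total
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=averaged.__getitem__)
    return predicted, averaged


# Made-up outputs from three models for one patient: [P(normal), P(thalassemia)].
# Per-model argmax is 1, 0, 0, so a hard (majority) vote would pick class 0.
outputs = [[0.10, 0.90], [0.55, 0.45], [0.52, 0.48]]
label, probs = soft_vote(outputs)
print(label, probs)  # soft vote picks class 1
```

The toy case shows the design choice: one confident model can outweigh two barely decided ones, which a hard majority vote cannot capture.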
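SHAP explains a prediction with Shapley values: each feature receives its average marginal contribution over all coalitions of the other features, and the values sum to the difference between the prediction and a baseline prediction. The brute-force sketch below computes exact Shapley values for one prediction under the simplifying assumption that an "absent" feature is replaced by a single baseline (reference) value. The three-feature linear `toy_risk_score`, its weights, and the patient and baseline vectors are invented for illustration and are not the study's model. The `shap` library uses faster algorithms (for example `shap.TreeExplainer` for tree ensembles such as XGBoost and Random Forest) and averages over a background dataset; which explainer the study used is not stated in this section.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one prediction (exponential in len(x)).

    Features outside a coalition take their baseline value.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


def toy_risk_score(z):
    """Hypothetical linear score over [MCV, MCH, age]; weights are made up."""
    return 3.0 - 0.02 * z[0] - 0.05 * z[1] + 0.01 * z[2]


patient = [62.0, 19.0, 8.0]     # made-up MCV (fL), MCH (pg), age (years)
reference = [88.0, 29.0, 30.0]  # made-up baseline patient
phi = shapley_values(toy_risk_score, patient, reference)
print(phi)
# Efficiency check: contributions add up to f(patient) - f(reference).
print(sum(phi), toy_risk_score(patient) - toy_risk_score(reference))
```

For a linear model each Shapley value reduces to `w_i * (x_i - baseline_i)`, which makes the toy easy to verify by hand; for nonlinear models such as the ensemble, interaction effects are split among the features involved.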