Husnain Saleem
Adv. Artif. Intell. Mach. Learn., 5 (2):3883-3899
1. Husnain Saleem: Gomal Research Institute of Computing (GRIC), Faculty of Computing, Gomal University, Dera Ismail Khan, KPK, Pakistan
DOI: 10.54364/AAIML.2025.52220
Article History: Received on: 07-Apr-25, Accepted on: 14-Jun-25, Published on: 21-Jun-25
Corresponding Author: Husnain Saleem
Email: jilani.husnain@yahoo.com
Citation: Husnain Saleem, et al. Performance Assessment of ML and DL Models in Detecting Hate Speech from Mixed English–Roman Urdu Text with Small-Scale Datasets. Advances in Artificial Intelligence and Machine Learning. 2025;5(2):220.
This research
evaluates hate speech detection on a small-scale mixed English and Roman
Urdu dataset by comparing machine learning and deep learning models.
For the traditional models, the data underwent text cleaning,
tokenization, and TF-IDF vectorization, while the
deep learning models relied on trainable embeddings. Experiments comparing
Naive Bayes, Logistic Regression, Linear SVM, Random Forest, LSTM, BiLSTM, and
CNN showed that Logistic Regression achieved the highest F1 score,
0.8073. CNN was the strongest of the deep learning models, with
an F1 score of 0.7786. The research demonstrates that traditional models
perform well on small-scale datasets; however, deep learning has emerged as an
effective tool for processing code-mixed text. Future research
should examine pre-trained embeddings, larger datasets, and more advanced models to
improve hate speech detection in mixed-language text.
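
Since the comparison above rests on F1 scores, the following minimal sketch shows how an F1 score is derived from precision and recall for a binary hate/non-hate labeling. It is illustrative only: the function name `f1_score_binary` and the toy labels are hypothetical, and the abstract does not state which F1 averaging scheme (binary, macro, or weighted) the reported values use.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Compute the binary F1 score for the class given by `positive`."""
    pairs = list(zip(y_true, y_pred))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: predicted positive but actually negative.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: predicted negative but actually positive.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: F1 is conventionally taken as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels (1 = hate speech, 0 = not hate speech); not data from the study.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
score = f1_score_binary(y_true, y_pred)
print(f"F1 = {score:.4f}")
```

Because F1 is the harmonic mean of precision and recall, a model scores well only when both are reasonably high, which makes it a more informative summary than accuracy when the classes are imbalanced.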