ISSN: 2582-9793

Performance Assessment of ML and DL Models in Detecting Hate Speech from Mixed English–Roman Urdu Text with Small-Scale Datasets

Original Research (Published On: 21-Jun-2025)
DOI: https://doi.org/10.54364/AAIML.2025.52220

Husnain Saleem

Adv. Artif. Intell. Mach. Learn., 5 (2):3883-3899

1. Husnain Saleem: Gomal Research Institute of Computing (GRIC), Faculty of Computing, Gomal University, Dera Ismail Khan, KPK, Pakistan


Article History: Received on: 07-Apr-25, Accepted on: 14-Jun-25, Published on: 21-Jun-25

Corresponding Author: Husnain Saleem

Email: jilani.husnain@yahoo.com

Citation: Husnain Saleem, et al. Performance Assessment of ML and DL Models in Detecting Hate Speech from Mixed English–Roman Urdu Text with Small-Scale Datasets. Advances in Artificial Intelligence and Machine Learning. 2025;5(2):220.


Abstract

    

This research evaluates hate speech detection on a small dataset of code-mixed English and Roman Urdu text, comparing machine learning and deep learning models. For the traditional models, the text was cleaned, tokenized, and converted to TF-IDF vectors, whereas the deep learning models relied on trainable embeddings. In experiments with Naive Bayes, Logistic Regression, Linear SVM, Random Forest, LSTM, BiLSTM, and CNN, Logistic Regression achieved the highest F1 score, 0.8073. CNN was the most effective deep learning model, with an F1 score of 0.7786. The results show that traditional models perform well on small-scale datasets; deep learning has nonetheless emerged as an effective tool for processing code-mixed text. Future research should examine pre-trained embeddings, larger datasets, and more advanced models to improve hate speech detection in mixed-language text.
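The preprocessing and scoring steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the cleaning rules, the smoothed IDF formula, the function names, and the choice of a binary F1 on the positive class are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import re
from collections import Counter


def clean_and_tokenize(text):
    """Lowercase, drop URLs and @mentions, keep letters only, split on whitespace.

    These cleaning rules are an assumption; the abstract only says the text was cleaned.
    """
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return text.split()


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of tokenized documents.

    Assumed weighting: raw term count times smoothed IDF,
    idf(t) = ln((1 + n) / (1 + df(t))) + 1, with no length normalization.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[t] * idf[t] for t in vocab])
    return vocab, vectors


def f1_score(y_true, y_pred, positive=1):
    """Binary F1 on the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up, neutral code-mixed sentences used only to exercise the functions.
samples = ["Tum BOHAT achay ho!! http://example.com @user123", "You are very acha"]
tokens = [clean_and_tokenize(s) for s in samples]
vocab, vectors = tfidf_vectors(tokens)
```

The resulting vectors would then be passed to a classifier such as Logistic Regression, while the deep learning models would instead map token indices to trainable embeddings.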
