Osama Deeb, Assef Jafar and Oumayma Al Dakkak
Adv. Artif. Intell. Mach. Learn., 5 (3):4074-4093
1. Osama Deeb: Higher Institute for Applied Sciences and Technology
2. Assef Jafar: Higher Institute for Applied Sciences and Technology
3. Oumayma Al Dakkak: Higher Institute for Applied Sciences and Technology
DOI: 10.54364/AAIML.2025.53229
Article History: Received on: 04-May-25, Accepted on: 24-Jul-25, Published on: 31-Jul-25
Corresponding Author: Osama Deeb
Email: osama.deeb@hiast.edu.sy
Citation: Osama Deeb, Assef Jafar, Oumayma Al Dakkak. A Deep Learning Framework for Arabic Continuous Speech Keyword Spotting in Low-Resource Settings Using Isolated-Word Keyword Spotting and Posterior Probability Functions. Advances in Artificial Intelligence and Machine Learning. 2025; 5(3):229.
Continuous Speech Keyword Spotting (CSKWS) is a harder problem than isolated-word Keyword Spotting (KWS): it must discover the occurrences of predefined keywords within continuous speech streams. In this paper, we address the prevalent issue of data scarcity in CSKWS for low-resource languages by introducing our Posterior Probability Function approach (PPF-CSKWS). Using an unsupervised method, this approach leverages a few-shot KWS system to derive posterior probability estimates of keyword occurrences as a discrete-time function. In contrast to the data-intensive training procedures typically associated with CSKWS system development, this method requires only 15 isolated audio samples per keyword, significantly reducing the data bottleneck. The generated posterior probability functions carry temporal information, which supports both keyword identification and localization. This property allows the CSKWS problem to be reformulated as a detection task, to which evaluation metrics such as mean Average Precision (mAP) and Maximum Term Weighted Value (MTWV) apply. To evaluate the proposed method, a dedicated Arabic speech corpus was constructed. In our experiments, the method achieved mAP = 0.613 and MTWV = 0.641. This performance meets or exceeds that of established supervised techniques requiring significantly larger quantities of labelled training data, demonstrating the potential of PPF-CSKWS in data-scarce scenarios.
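As a rough illustration of the idea of a posterior probability function, the sketch below scores fixed-length windows of a stream with an isolated-word classifier and then thresholds the resulting discrete-time function to obtain localized detections. The sliding-window scheme, the function names (posterior_function, detect, toy_classifier), and the window, hop, and threshold values are all assumptions made for illustration; the abstract does not specify how the function is computed, and the toy classifier merely stands in for a few-shot KWS model.

```python
from typing import Callable, List, Sequence, Tuple


def posterior_function(
    signal: Sequence[float],
    classify: Callable[[Sequence[float]], float],
    window: int,
    hop: int,
) -> List[float]:
    """Score each window of the stream; entry t covers samples [t*hop, t*hop + window)."""
    return [
        classify(signal[start:start + window])
        for start in range(0, len(signal) - window + 1, hop)
    ]


def detect(
    posteriors: Sequence[float], threshold: float, window: int, hop: int
) -> List[Tuple[int, int, float]]:
    """Merge consecutive above-threshold frames into (start, end, peak score) detections.

    start and end are sample indices, with end exclusive.
    """
    detections = []
    run_start = None
    peak = 0.0
    for t, p in enumerate(posteriors):
        if p >= threshold:
            if run_start is None:
                run_start, peak = t, p
            else:
                peak = max(peak, p)
        elif run_start is not None:
            detections.append((run_start * hop, (t - 1) * hop + window, peak))
            run_start = None
    if run_start is not None:
        # The run extends to the last frame of the stream.
        detections.append((run_start * hop, (len(posteriors) - 1) * hop + window, peak))
    return detections


def toy_classifier(segment: Sequence[float]) -> float:
    """Hypothetical stand-in for a few-shot KWS model: fraction of 'keyword' samples."""
    return sum(segment) / len(segment)


# Toy stream: the "keyword" occupies samples 8..11.
signal = [0] * 8 + [1] * 4 + [0] * 8
ppf = posterior_function(signal, toy_classifier, window=4, hop=2)
hits = detect(ppf, threshold=0.75, window=4, hop=2)
print(ppf)
print(hits)
```

Because each detection carries a time span and a score, a list like hits can be compared against reference keyword locations, which is what makes detection-style metrics such as mAP and MTWV applicable.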