ISSN: 2582-9793

A Deep Learning Framework for Arabic Continuous Speech Keyword Spotting in Low-Resource Settings Using Isolated-Word Keyword Spotting and Posterior Probability Functions

Original Research (Published on: 31-Jul-2025)
DOI: https://doi.org/10.54364/AAIML.2025.53229

Osama Deeb, Assef Jafar and Oumayma Al Dakkak

Adv. Artif. Intell. Mach. Learn., 5 (3):4074-4093

1. Osama Deeb: Higher Institute for Applied Sciences and Technology

2. Assef Jafar: Higher Institute for Applied Sciences and Technology

3. Oumayma Al Dakkak: Higher Institute for Applied Sciences and Technology



Article History: Received on: 04-May-25, Accepted on: 24-Jul-25, Published on: 31-Jul-25

Corresponding Author: Osama Deeb

Email: osama.deeb@hiast.edu.sy

Citation: Osama Deeb, Assef Jafar, Oumayma Al Dakkak. A Deep Learning Framework for Arabic Continuous Speech Keyword Spotting in Low-Resource Settings Using Isolated-Word Keyword Spotting and Posterior Probability Functions. Advances in Artificial Intelligence and Machine Learning. 2025; 5(3):229.


Abstract

    

Continuous Speech Keyword Spotting (CSKWS) is a more challenging problem than isolated-word Keyword Spotting (KWS): it aims to find the occurrences of predefined keywords within continuous speech streams. In this paper, we address the prevalent issue of data scarcity in CSKWS for low-resource languages by introducing a Posterior Probability Function approach (PPF-CSKWS). Using an unsupervised method, this approach leverages a few-shot KWS system to derive posterior probability estimates of keyword occurrences as a discrete-time function. In contrast to the data-intensive training procedures typically associated with CSKWS system development, this method requires only 15 isolated audio samples per keyword, significantly reducing the data bottleneck. The generated posterior probability functions carry temporal information that supports both keyword identification and localization. This allows the CSKWS problem to be reformulated as a detection task, to which evaluation metrics such as mean Average Precision (mAP) and Maximum Term Weighted Value (MTWV) apply. To evaluate the proposed method, a dedicated Arabic speech corpus was constructed. In experiments, the method achieved mAP = 0.613 and MTWV = 0.641. This performance meets or exceeds that of established supervised techniques that require significantly larger quantities of labelled training data, demonstrating the potential of PPF-CSKWS in data-scarce scenarios.
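
The idea of a posterior probability function over time can be illustrated with a minimal sketch. The abstract does not specify how the function is computed, so everything below is an assumption made for illustration: a fixed-length sliding window, a hypothetical `score_window` callable standing in for the few-shot isolated-word KWS model, and simple thresholded peak picking for localization. It is not the authors' implementation.

```python
import numpy as np

def posterior_function(signal, score_window, win_len, hop):
    """Slide a fixed-length window over `signal` and score each position.

    `score_window` is a hypothetical stand-in for an isolated-word KWS
    model: it maps one window to a keyword posterior in [0, 1].
    """
    starts = range(0, len(signal) - win_len + 1, hop)
    return np.array([score_window(signal[s:s + win_len]) for s in starts])

def detect_keywords(posterior, threshold, win_len, hop):
    """Return (start_sample, end_sample, score) for each local peak
    of the posterior function that reaches `threshold`."""
    detections = []
    for i, p in enumerate(posterior):
        left = posterior[i - 1] if i > 0 else -np.inf
        right = posterior[i + 1] if i + 1 < len(posterior) else -np.inf
        if p >= threshold and p > left and p >= right:
            detections.append((i * hop, i * hop + win_len, float(p)))
    return detections

def toy_scorer(window):
    """Toy scorer (assumption): mean absolute amplitude, not a real model."""
    return float(np.mean(np.abs(window)))

# Synthetic "utterance": silence with one burst standing in for a keyword.
signal = np.zeros(1600)
signal[600:1000] = 1.0

ppf = posterior_function(signal, toy_scorer, win_len=400, hop=100)
hits = detect_keywords(ppf, threshold=0.9, win_len=400, hop=100)
print(hits)
```

Because each detection carries a time span and a score, the output has the shape that detection metrics such as mAP and MTWV expect: a ranked list of candidate occurrences that can be matched against reference keyword locations.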
