Geetanjali Tyagi, Parneeta Dhaliwal, Goldie Gabrani and Atul Mishra
Adv. Artif. Intell. Mach. Learn., XX (XX):-
1. Geetanjali Tyagi: Department of Computer Science and Technology, Manav Rachna University, Faridabad, Haryana, 121004, India
2. Parneeta Dhaliwal: Department of Computer Science and Technology, Manav Rachna University, Faridabad, Haryana, 121004, India
3. Goldie Gabrani: Department of Computer Science and Technology, Jaypee Institute of Information Technology, Noida, Uttar Pradesh, 201309, India
4. Atul Mishra: School of Engineering and Technology, BML Munjal University, Haryana, 122413, India.
DOI: 10.54364/AAIML.2026.63304
Article History: Received on: 22-Feb-26, Accepted on: 08-May-26, Published on: 15-May-26
Corresponding Author: Geetanjali Tyagi
Email: codingfundads@gmail.com
Citation: Geetanjali Tyagi, et al. Comparative Analysis of Traditional and Deep Learning Approaches for E-Commerce Product Recommendations: A Study on Amazon Dataset. Advances in Artificial Intelligence and Machine Learning. 2026. (Ahead of Print) https://dx.doi.org/10.54364/AAIML.2026.63304
The need for accurate and efficient product recommendation systems on e-commerce websites continues to grow. While many methods address this problem, little of the literature reports experiments that test the efficacy of these methods in a standardized way. This paper provides a thorough examination of six recommendation algorithms, spanning similarity-based and deep learning techniques. The experiments use the Amazon M2 Multilingual Shopping Session Dataset, which contains 3.6 million sessions in six languages covering a total of 1.5 million products. The analysis focuses on the UK region, comprising 1.18 million sessions and 494,409 products. For computational efficiency while preserving representativeness, 20,000 of the 494,409 UK products (4%) were selected as a representative subset, with product attributes such as title, brand, price, description, and category. The dataset was divided into training (80%, 16,000 products) and testing (20%, 4,000 products) sets using stratified sampling. Ground-truth recommendations were curated by domain experts through a systematic process combining KNN-based reference recommendations with cross-method validation, enabling thorough comparisons on accuracy metrics (precision, recall, F1-score, NDCG) and computational cost (training time and inference speed). Three conventional algorithms (Cosine Similarity, K-Nearest Neighbors (KNN), and Jaccard Similarity) were compared with three deep learning models (Autoencoder, Bernoulli Restricted Boltzmann Machine (RBM), and Autoencoder with an attention mechanism). Results show that the Autoencoder achieves the highest F1-score (0.909) with a precision of 0.930, while the Autoencoder with attention attains the highest precision (0.950) but lower recall (0.800). KNN achieves the highest recall (0.870), with an F1-score of 0.806 and NDCG of 0.850.
Cosine Similarity achieves the best ranking quality with NDCG of 0.890 and a balanced F1-score of 0.768.
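To make the evaluation pipeline described above concrete, the sketch below shows a minimal similarity-based top-k recommender and the four accuracy metrics (precision, recall, F1-score, NDCG). This is an illustrative reconstruction, not the authors' code: the item vectors, function names, and toy data are assumptions for demonstration.

```python
import numpy as np

def top_k_cosine(item_vectors, query_idx, k=5):
    """Recommend the k items most similar to the query item by cosine similarity."""
    norms = np.linalg.norm(item_vectors, axis=1, keepdims=True)
    unit = item_vectors / np.clip(norms, 1e-12, None)   # L2-normalize rows
    sims = unit @ unit[query_idx]                       # cosine similarity to query
    sims[query_idx] = -np.inf                           # exclude the query item itself
    return list(np.argsort(sims)[::-1][:k])             # indices of the top-k items

def precision_recall_f1(recommended, relevant):
    """Set-based precision, recall, and F1 over a recommendation list."""
    rec, rel = set(recommended), set(relevant)
    hits = len(rec & rel)
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(rel) if rel else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def ndcg_at_k(recommended, relevant, k):
    """NDCG@k with binary relevance: DCG of the ranked list over the ideal DCG."""
    relevant = set(relevant)
    dcg = sum(1.0 / np.log2(i + 2) for i, r in enumerate(recommended[:k]) if r in relevant)
    idcg = sum(1.0 / np.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / idcg if idcg else 0.0

# Toy example: 4 items embedded in a 3-dimensional feature space (hypothetical data).
vecs = np.array([[1.0, 0.0, 0.0],
                 [0.9, 0.1, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])
recs = top_k_cosine(vecs, query_idx=0, k=2)  # item 1 is most similar to item 0
```

In the paper's setup the same metric functions would be applied to each of the six methods' top-k lists against the expert-curated ground truth, which is what allows a like-for-like comparison of F1 and NDCG across traditional and deep models.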