ISSN: 2582-9793

Sentiment Analysis: A Systematic Case Study with Yelp Scores

Original Research (Published On: 02-Aug-2023)
DOI: https://doi.org/10.54364/AAIML.2023.1174

Wenping Wang, Jin Han, Chen Liang, Tong Chen, Chengze Fan and Jingxian Huang

Adv. Artif. Intell. Mach. Learn., 3 (3):1259-1273

1. Wenping Wang: Individual Researcher

2. Jin Han: Amazon Inc, 410 Terry Ave N, Seattle 98109, WA, USA

3. Chen Liang: Google Inc, 1600 Amphitheatre Parkway, USA

4. Tong Chen: Google Inc, 1600 Amphitheatre Parkway, USA

5. Chengze Fan: Meta Platforms, 1 Hacker Way, USA

6. Jingxian Huang: Meta Platforms, 1 Hacker Way, USA

Article History: Received on: 26-May-23, Accepted on: 25-Jul-23, Published on: 02-Aug-23

Corresponding Author: Wenping Wang

Email: wenpingw@alumni.cmu.edu

Citation: Wenping Wang, et al. Sentiment Analysis: A Systematic Case Study with Yelp Scores. Advances in Artificial Intelligence and Machine Learning. 2023;3 (3):74.


Abstract

    

Sentiment analysis is a classic, well-defined task in machine learning and natural language processing. Both fields have progressed considerably over the years. Commercial applications, however, are heavily constrained by cost, throughput, and latency, so we ask how much accuracy complex, high-latency models actually gain over simple, low-latency models that can be deployed on embedded devices and in high-throughput scenarios. In this article, we use the Yelp Review dataset as a test bench. We predict Yelp overall ratings from user review text and other related features, experimenting with a range of existing machine learning algorithms, from simple logistic regression to deep models based on BERT embeddings. We also combine these models into a single ensemble predictor to see whether the combination achieves better performance. Among all the models, a simple TF-IDF baseline with an MLP ensemble reaches higher accuracy than pure MLP models. This suggests that, with a proper vectorizer and data processing, production systems may be able to prioritize throughput and latency by using small models instead of relying on heavy, multi-layer MLPs.
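
To make the two lightweight ingredients named in the abstract concrete, the following is a minimal sketch of TF-IDF vectorization and of combining model outputs by soft voting (averaging class probabilities). It is not the authors' pipeline: the function names `tfidf_vectors` and `soft_vote`, the smoothed IDF formula, the toy reviews, and the per-model probabilities are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocab, vectors): L2-normalized TF-IDF vectors for tokenized docs."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    # Smoothed IDF (an assumed variant): ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def soft_vote(prob_lists):
    """Average class probabilities across models; return (best_class_index, averaged)."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]
    best = max(range(n_classes), key=lambda c: avg[c])
    return best, avg


# Toy reviews, already tokenized (hypothetical data).
reviews = [["great", "food"], ["bad", "food"]]
vocab, vectors = tfidf_vectors(reviews)

# Hypothetical probabilities over star ratings 1..5 from two models for one review.
logreg_probs = [0.05, 0.05, 0.10, 0.50, 0.30]
mlp_probs = [0.05, 0.05, 0.10, 0.20, 0.60]
best_index, avg_probs = soft_vote([logreg_probs, mlp_probs])
predicted_stars = best_index + 1
```

In this toy case the word shared by both reviews ("food") receives a lower weight than the distinguishing word ("great"), which is the property that lets a linear model or a small MLP work on TF-IDF features. Soft voting is only one possible ensembling scheme; the abstract does not specify which one was used.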
