ISSN :2582-9793

Sliding-BERT: Striding Towards Conversational Machine Comprehension in Long Context

Original Research (Published On: 19-Aug-2023)
DOI : https://doi.org/10.54364/AAIML.2023.1178

Wenping Wang, Longxiang Zhang, Keyi Yu, Jingxian Huang, Qi Lyu, Haoru Xue and Congrui Hetang

Adv. Artif. Intell. Mach. Learn., 3 (3):1325-1339

1. Wenping Wang: Individual Researcher

2. Longxiang Zhang: Monroe College, 434 Main St, New Rochelle, NY 10801, USA

3. Keyi Yu: Google Inc, 1600 Amphitheatre Parkway, USA

4. Jingxian Huang: Meta Platforms, 1 Hacker Way, USA

5. Qi Lyu: Michigan State University, 426 Auditorium Road, East Lansing, MI 48824, USA

6. Haoru Xue: University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA

7. Congrui Hetang: Google Inc, 1600 Amphitheatre Parkway, USA



Article History: Received on: 01-Jun-23, Accepted on: 12-Aug-23, Published on: 19-Aug-23

Corresponding Author: Wenping Wang

Email: wenpingw@alumni.cmu.edu

Citation: Wenping Wang, et al. Sliding-BERT: Striding Towards Conversational Machine Comprehension in Long Context. Advances in Artificial Intelligence and Machine Learning. 2023;3(3):1325-1339.


Abstract


Pre-trained contextual embeddings like BERT have shown substantial improvements across a wide range of natural language processing tasks.


We propose Sliding-BERT, which incorporates BERT into the state-of-the-art conversational machine comprehension (MC) model FlowQA and surpasses its standing performance on the QuAC challenge. We design a striding filter to overcome the sequence-length limit of the BERT model in long conversation contexts, and we apply several aggregation methods to reconcile the incompatible tokenization of the BERT and FlowQA models. Given the length of the conversation context, we use gradient accumulation to simulate large-batch training without extra memory cost. We also find that pretraining Sliding-BERT on the CoQA dataset improves its performance on QuAC. A detailed analysis of model performance across question types, question lengths, and other QA metrics shows that Sliding-BERT exceeds the FlowQA model by a significant margin in F1, HEQ-Q, and HEQ-D scores.
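The page does not reproduce the paper's code, but the striding idea can be illustrated with a short PyTorch sketch: the long context is split into overlapping fixed-size windows, each window is encoded independently, and embeddings from overlapping positions are averaged. The function name sliding_encode and the defaults max_len=512 and stride=256 are illustrative assumptions, and averaging is one plausible way to merge overlaps, not necessarily the authors' exact scheme.

import torch

def sliding_encode(encoder, token_ids, max_len=512, stride=256):
    """Encode a sequence longer than the encoder's window limit by
    striding overlapping windows across it and averaging the
    embeddings wherever windows overlap (illustrative sketch).

    encoder: callable mapping a (1, L) LongTensor of token ids to a
             (1, L, H) FloatTensor of contextual embeddings.
    """
    n = token_ids.size(0)
    hidden, counts = None, None
    for start in range(0, n, stride):
        end = min(start + max_len, n)
        out = encoder(token_ids[start:end].unsqueeze(0)).squeeze(0)  # (w, H)
        if hidden is None:
            hidden = out.new_zeros(n, out.size(-1))
            counts = out.new_zeros(n, 1)
        hidden[start:end] += out
        counts[start:end] += 1          # how many windows covered each token
        if end == n:                    # final window reached the end
            break
    return hidden / counts              # (n, H): mean over overlapping windows

# Usage with a stand-in encoder (an Embedding layer playing the role of BERT):
emb = torch.nn.Embedding(30522, 768)
vecs = sliding_encode(lambda ids: emb(ids), torch.randint(0, 30522, (1500,)))
print(vecs.shape)                       # torch.Size([1500, 768])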
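The tokenization mismatch arises because BERT splits words into wordpieces while FlowQA operates on whole words, so the wordpiece vectors must be pooled back to one vector per word. The sketch below shows mean pooling, one of several possible aggregation methods (taking the first subword or max pooling are common alternatives); pool_subwords and its arguments are hypothetical names, not the paper's API.

import torch

def pool_subwords(subword_vecs, word_ids, num_words):
    """Mean-pool wordpiece vectors into one vector per original word.

    subword_vecs: (S, H) encoder outputs for S wordpieces.
    word_ids:     (S,) index of the source word for each wordpiece.
    num_words:    number of words in the original tokenization.
    """
    pooled = subword_vecs.new_zeros(num_words, subword_vecs.size(-1))
    counts = subword_vecs.new_zeros(num_words, 1)
    pooled.index_add_(0, word_ids, subword_vecs)          # sum per word
    ones = torch.ones_like(word_ids, dtype=subword_vecs.dtype).unsqueeze(1)
    counts.index_add_(0, word_ids, ones)                  # subwords per word
    return pooled / counts                                # mean per word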
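Gradient accumulation itself is a standard technique: gradients from several small forward/backward passes are summed before a single optimizer step, emulating a larger batch without holding its activations in memory at once. A minimal sketch follows; accum_steps=8 and the cross-entropy loss are assumptions for illustration, as the abstract does not state the paper's accumulation schedule or loss.

import torch
import torch.nn.functional as F

def train_epoch(model, loader, optimizer, accum_steps=8):
    """One epoch with gradient accumulation: step the optimizer only
    every accum_steps micro-batches, so the effective batch size is
    accum_steps * loader.batch_size with no extra activation memory."""
    model.train()
    optimizer.zero_grad()
    for step, (x, y) in enumerate(loader, start=1):
        loss = F.cross_entropy(model(x), y)
        (loss / accum_steps).backward()   # scale so the summed grads average
        if step % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()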
