Jaekeol Choi
Adv. Artif. Intell. Mach. Learn., 4 (3):2687-2702
Jaekeol Choi: Hankuk University of Foreign Studies
DOI: https://dx.doi.org/10.54364/AAIML.2024.43156
Article History: Received on: 13-Jul-24, Accepted on: 21-Sep-24, Published on: 28-Sep-24
Corresponding Author: Jaekeol Choi
Email: jaekeol.choi@hufs.ac.kr
Citation: Jaekeol Choi (2024). Binary or Graded, Few-Shot or Zero-Shot: Prompt Design for GPTs in Relevance Evaluation. Adv. Artif. Intell. Mach. Learn., 4 (3):2687-2702.
Evaluating the relevance between a query and a passage is a pivotal task in Information Retrieval (IR). Such relevance evaluations can assist in ranking as well as in creating datasets for training and testing. Recent advances in Large Language Models (LLMs) such as GPT-4 have improved performance across many natural language processing tasks. In the IR domain specifically, many studies have applied LLMs to relevance judgment and reported notable improvements. However, the efficacy of LLMs depends considerably on prompt design, yet research on prompts tailored specifically for relevance evaluation remains scarce. Prompts proposed for this task can be categorized along two dimensions: how they distinguish relevance (binary or graded) and whether they rely on in-context examples (few-shot or zero-shot). In this study, we experimentally investigate these two dimensions to determine which configurations are most advantageous for relevance evaluation. Our findings, based on the GPT-4 model, demonstrate that graded prompts in a zero-shot format are the most effective.
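For illustration, a zero-shot graded prompt of the kind categorized above might be issued to GPT-4 as in the following minimal sketch. The prompt wording, the 0-2 grading scale, and the use of the OpenAI chat completions client are assumptions for illustration, not the exact setup reported in the paper.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def judge_relevance(query: str, passage: str) -> str:
    """Ask GPT-4 for a graded relevance label, zero-shot (no in-context examples).

    The label scale below is a hypothetical example of a graded scheme;
    a binary prompt would instead ask for relevant / not relevant.
    """
    prompt = (
        "Judge the relevance of the passage to the query on a graded scale:\n"
        "0 = irrelevant, 1 = partially relevant, 2 = highly relevant.\n"
        "Answer with the number only.\n\n"
        f"Query: {query}\n"
        f"Passage: {passage}\n"
        "Relevance:"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output for reproducible judgments
    )
    return response.choices[0].message.content.strip()
```

A few-shot variant of the same prompt would prepend labeled query-passage examples before the test pair; the zero-shot form shown here omits them.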