ISSN :2582-9793

Design and Evaluation Methods for LLM-Based Explainable AI (XAI)-Based Human-AI Collaboration Systems

Original Research (Published On: 16-Sep-2025)
DOI : https://doi.org/10.54364/AAIML.2025.53240

Cheonsu Jeong

Adv. Artif. Intell. Mach. Learn., 5 (3):4308-4341

1. Cheonsu Jeong: SAMSUNG SDS



Article History: Received on: 08-Jul-25, Accepted on: 09-Sep-25, Published on: 16-Sep-25

Corresponding Author: Cheonsu Jeong

Email: paripal@korea.ac.kr

Citation: Cheonsu Jeong. Design and Evaluation Methods for LLM-Based Explainable AI (XAI)-Based Human-AI Collaboration Systems. Advances in Artificial Intelligence and Machine Learning. 2025;5(3):240.


Abstract


This study re-examines the role of Explainable AI (XAI) in human-AI collaborative environments and proposes a design and evaluation framework for a human-AI collaboration system that integrates Large Language Models (LLMs) with state-of-the-art AI agent technology. The proposed architecture, consisting of an AI model, an explanation generation module, and a human-AI interface, enhances the adaptability and reliability of explanations. A key contribution of this research is an LLM-XAI collaborative architecture that combines personalized, adaptive explanations with a feedback-driven improvement mechanism. Notably, the system presents a novel explanation paradigm, distinguished from conventional XAI methods by its use of Chain-of-Thought reasoning traces, natural-language explanations, and a multi-stage verification mechanism provided by Deep Research and LLM-based agents. The system defines core quality metrics (explainability, transparency, reliability, interactivity, and adaptability) and develops a multi-dimensional evaluation framework that assesses these metrics with both quantitative and qualitative data. A feedback loop enables continuous learning and improvement through user feedback while the AI's decision-making process is explained transparently. This study also presents quantitative and qualitative evaluation metrics and user-research methodologies to validate the system's effectiveness, which is expected to contribute to achieving trust-based human-AI collaboration.
Furthermore, to demonstrate its practical applicability, a pilot implementation in a medical diagnosis support scenario is presented, offering a model in which humans and AI collaborate complementarily, thereby promoting the ethical use and social acceptance of AI systems.
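The three-component architecture outlined in the abstract (AI model, explanation generation module, human-AI interface connected by a feedback loop) can be sketched in miniature as follows. This is a minimal illustrative sketch, not the paper's implementation: all class names, method signatures, and the stubbed reasoning steps are hypothetical assumptions introduced here for clarity.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the LLM-XAI collaboration loop described in the
# abstract. None of these names come from the paper; the real system would
# back AIModel with an LLM and agent tooling.

@dataclass
class Explanation:
    answer: str
    reasoning_trace: list[str]                 # Chain-of-Thought style steps
    feedback_scores: list[int] = field(default_factory=list)

class AIModel:
    """Stand-in for the underlying LLM: returns an answer plus reasoning steps."""
    def predict(self, query: str) -> tuple[str, list[str]]:
        steps = [f"interpret query: {query}", "retrieve evidence", "draw conclusion"]
        return "candidate diagnosis: condition X", steps

class ExplanationModule:
    """Turns raw model output into a natural-language explanation object."""
    def generate(self, answer: str, steps: list[str]) -> Explanation:
        return Explanation(answer=answer, reasoning_trace=steps)

class HumanAIInterface:
    """Presents the explanation and routes user ratings back into the loop."""
    def present(self, exp: Explanation) -> str:
        return f"{exp.answer}\nWhy: " + " -> ".join(exp.reasoning_trace)

    def collect_feedback(self, exp: Explanation, rating: int) -> None:
        exp.feedback_scores.append(rating)     # closes the feedback loop

# One pass through the loop, e.g. in a medical diagnosis support scenario
model, explainer, ui = AIModel(), ExplanationModule(), HumanAIInterface()
answer, steps = model.predict("patient presents with symptom Y")
explanation = explainer.generate(answer, steps)
print(ui.present(explanation))
ui.collect_feedback(explanation, rating=4)
```

In a full system, the collected feedback scores would feed the paper's quality metrics (explainability, transparency, reliability, interactivity, adaptability) and drive the continuous improvement cycle.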
