Cheonsu Jeong
Adv. Artif. Intell. Mach. Learn., 5(3):4308-4341
1. Cheonsu Jeong: SAMSUNG SDS
DOI: 10.54364/AAIML.2025.53240
Article History: Received on: 08-Jul-25, Accepted on: 09-Sep-25, Published on: 16-Sep-25
Corresponding Author: Cheonsu Jeong
Email: paripal@korea.ac.kr
Citation: Cheonsu Jeong. Design and Evaluation Methods for LLM-Based Explainable AI (XAI)-Based Human-AI Collaboration Systems. Advances in Artificial Intelligence and Machine Learning. 2025;5(3):240.
This study re-examines the role of Explainable AI (XAI) in human-AI collaborative environments and proposes a design and evaluation framework for a human-AI collaboration system that integrates Large Language Models (LLMs) with state-of-the-art AI agent technology. The proposed system, composed of an AI model, an explanation generation module, and a human-AI interface, enhances the adaptability and reliability of explanations. A key contribution of this research is an LLM-XAI collaborative architecture that combines personalized, adaptive explanations with a feedback-driven improvement mechanism. Notably, the system departs from conventional XAI methods by presenting a novel explanation paradigm built on Chain-of-Thought reasoning traces, natural language explanations, and a multi-stage verification mechanism provided by Deep Research and LLM-based agents.
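A minimal sketch of this three-part pipeline is given below. It is an illustration under assumed interfaces: the `LLM` and `VerifierAgent` protocols and their methods are hypothetical placeholders, not an API defined in the paper.

```python
from dataclasses import dataclass
from typing import Protocol

class LLM(Protocol):
    # Hypothetical interface: returns Chain-of-Thought steps, last one = answer.
    def reason(self, query: str) -> list[str]: ...

class VerifierAgent(Protocol):
    # Hypothetical interface: one stage of the multi-stage verification.
    def check(self, query: str, steps: list[str]) -> bool: ...

@dataclass
class Explanation:
    answer: str           # the model's decision
    reasoning: list[str]  # natural-language Chain-of-Thought trace
    verified: bool        # outcome of the multi-stage verification

def explain(query: str, llm: LLM, verifiers: list[VerifierAgent]) -> Explanation:
    """Generate a CoT-backed explanation, then verify it agent by agent."""
    steps = llm.reason(query)             # stage 1: produce the reasoning trace
    verified = all(v.check(query, steps)  # stages 2..n: every verifier agent
                   for v in verifiers)    # must approve the trace
    return Explanation(answer=steps[-1], reasoning=steps[:-1], verified=verified)
```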
The system defines core quality metrics such as explainability, transparency, reliability, interactivity, and adaptability, and develops a multi-dimensional evaluation framework that assesses them with both quantitative and qualitative data.
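For instance, a quantitative aggregation over these five metrics could be as simple as a weighted average. The weights and the 0-to-1 rating scale below are illustrative assumptions, not values prescribed by the study:

```python
# Illustrative multi-dimensional quality score: each metric is rated on
# [0, 1] (e.g., from user surveys or automated checks) and combined with
# weights a deployment would tune; both scale and weights are assumptions.
WEIGHTS = {
    "explainability": 0.25,
    "transparency":   0.20,
    "reliability":    0.25,
    "interactivity":  0.15,
    "adaptability":   0.15,
}

def quality_score(ratings: dict[str, float]) -> float:
    """Weighted average over the five core quality metrics."""
    return sum(WEIGHTS[m] * ratings[m] for m in WEIGHTS)

print(quality_score({
    "explainability": 0.9, "transparency": 0.8, "reliability": 0.85,
    "interactivity": 0.7, "adaptability": 0.75,
}))  # -> 0.815
```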
The system is structured with a feedback loop that transparently explains the AI's decision-making process while enabling continuous learning and improvement: explanation quality is scored with quantitative metrics, and user feedback drives ongoing refinement.
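One way such a loop could be realized is sketched below. The `FeedbackLoop` class, its storage format, and the adaptation rule (raise the detail level when recent ratings are low) are assumptions made for illustration, not the paper's algorithm:

```python
class FeedbackLoop:
    """Collects user ratings of explanations and adapts generation settings."""

    def __init__(self) -> None:
        self.history: list[tuple[str, int]] = []  # (explanation_id, rating)
        self.detail_level = 1   # knob passed to the explanation module

    def record(self, explanation_id: str, rating: int) -> None:
        """Store a 1-5 user rating for a delivered explanation."""
        self.history.append((explanation_id, rating))

    def adapt(self) -> None:
        """Raise the detail level if recent explanations rated poorly."""
        recent = [r for _, r in self.history[-20:]]
        if recent and sum(recent) / len(recent) < 3:
            self.detail_level = min(self.detail_level + 1, 3)
```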
To validate the system's effectiveness, this study also presents quantitative and qualitative evaluation metrics and user research methodologies, which are expected to contribute to achieving trust-based human-AI collaboration. Furthermore, to demonstrate practical applicability, a pilot implementation in a medical diagnosis support scenario is presented, offering a model in which humans and AI collaborate complementarily, thereby playing a crucial role in promoting the ethical use and social acceptance of AI systems.
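To make the pilot scenario concrete, the sketches above could be wired together roughly as follows. The case description, the `my_llm` and `my_agents` objects, and the rating are invented for illustration:

```python
# Hypothetical usage in a diagnosis support setting: the clinician submits
# a case, receives a verified explanation, and rates it, closing the loop.
loop = FeedbackLoop()
explanation = explain(
    "58-year-old patient, persistent cough, abnormal chest X-ray",
    llm=my_llm,            # assumed LLM wrapper (see pipeline sketch above)
    verifiers=my_agents,   # assumed verification agents
)
if explanation.verified:
    print(explanation.answer)
    for step in explanation.reasoning:
        print(" -", step)
loop.record("case-001", rating=4)  # clinician's rating of the explanation
loop.adapt()
```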