ISSN: 2582-9793

Going Beyond a Basic Attention Head toward an Understanding of Transformer-based Generative AI

Original Research (Published On: 14-Dec-2025)
DOI : https://doi.org/10.54364/AAIML.2025.54259

Neil Johnson, Nicholas Restrepo, Frank Huo and Dylan Restrepo

Adv. Artif. Intell. Mach. Learn., 5 (4):4675-4691

1. Neil Johnson: George Washington University

2. Nicholas Restrepo: George Washington University

3. Frank Huo: George Washington University

4. Dylan Restrepo: Cornell University


Article History: Received on: 24-Aug-25, Accepted on: 07-Dec-25, Published on: 14-Dec-25

Corresponding Author: Neil Johnson

Email: neilfjohnson@me.com

Citation: Nicholas J. Restrepo, et al. Going Beyond a Basic Attention Head toward an Understanding of Transformer-based Generative AI. Advances in Artificial Intelligence and Machine Learning. 2025; 5(4): 4675-4691. https://doi.org/10.54364/AAIML.2025.54259


Abstract


This paper goes beyond the basic Attention head analysis introduced in our accompanying paper, as part of our long-term goal to establish a bottom-up understanding of ChatGPT-like generative AI. We provide evidence suggesting that the output tipping behavior we reported for the basic Attention head can persist despite some of the complications of real LLMs. Specifically, we consider here (1) a richer vocabulary, (2) non-identity matrices learned during pre-training, (3) non-zero temperature during next-token selection, and (4) more than one layer of Attention heads. We then offer preliminary evidence that the insights gained from this bottom-up approach can help improve performance in real-world generative AI systems.
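For orientation, the sketch below is a minimal, self-contained illustration (in Python/NumPy) of two of the complications named above for a single Attention head: the projection matrices W_q, W_k, W_v stand in for non-identity matrices learned during pre-training, and a temperature T rescales the logits before next-token sampling. All dimensions, values, and names here are illustrative assumptions for exposition, not the authors' actual model or parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (assumed values, not from the paper): a 5-token vocabulary,
# embedding dimension 4, and a 3-token context.
vocab = ["A", "B", "C", "D", "E"]
d = 4
E = rng.normal(size=(len(vocab), d))   # token embeddings
context = [0, 2, 1]                    # token indices in the prompt
X = E[context]                         # (3, d) context embeddings

# (2) Non-identity projection matrices, as would be learned in pre-training.
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = X @ W_q, X @ W_k, X @ W_v

# Single Attention head: scaled dot-product attention with a row-wise softmax.
scores = Q @ K.T / np.sqrt(d)
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
out = weights @ V                      # (3, d) attended representations

# (3) Non-zero temperature during next-token selection: divide the logits
# by T before the softmax; T -> 0 recovers greedy (argmax) selection.
logits = out[-1] @ E.T                 # score each vocabulary token
T = 0.8                                # assumed temperature
probs = np.exp(logits / T) / np.exp(logits / T).sum()
next_token = vocab[rng.choice(len(vocab), p=probs)]
print(f"p = {np.round(probs, 3)}, sampled next token: {next_token}")
```

Stacking several such heads, with each layer's output feeding the next layer's queries, keys, and values, gives the multi-layer setting the abstract refers to in point (4).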

