Neil Johnson, Dylan Restrepo, Frank Huo, and Nicholas Restrepo
Adv. Artif. Intell. Mach. Learn., 5 (4):4518-4531
1. Neil Johnson: George Washington University
2. Dylan Restrepo: Cornell University
3. Frank Huo: George Washington University
4. Nicholas Restrepo: George Washington University
DOI: 10.54364/AAIML.2025.54251
Article History: Received on: 24-Aug-25, Accepted on: 04-Nov-25, Published on: 11-Nov-25
Corresponding Author: Neil Johnson
Email: neilfjohnson@me.com
Citation: Neil F. Johnson, et al. Basic Attention Head as a Building Block toward Understanding Transformer-based Generative AI. Advances in Artificial Intelligence and Machine Learning. 2025;5(4):251. https://dx.doi.org/10.54364/AAIML.2025.54251
The question of why ChatGPT-like generative AI works so well -- and when it won't -- has no clear answer yet. The field of mechanistic interpretability has identified mesoscale circuits and head roles in simpler GPT versions. However, there is a lack of bottom-up insight starting from the microscale of the GPT's 'atom': the Attention head. This paper starts to fill this gap by focusing on the dynamical behavior of a very basic Attention head. Operating alone, it exhibits a tipping point n* in its output -- and we analyze a mathematical formula that predicts n* as a function of the user's prompt and the training embeddings. Though this purposely over-simplified example is obviously far from any commercial generative AI system, our results show that it nevertheless yields output content dynamics that mimic some large-scale LLM behaviors. We comment on the potential usefulness of our findings in the real-world domain of insurance, as well as the domains of health and judicial systems.
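For readers unfamiliar with the 'atom' the abstract refers to, the sketch below implements a standard single scaled dot-product Attention head in NumPy. It is a generic illustration of the building block, not the paper's specific simplified construction, and it does not reproduce the paper's tipping-point formula for n*; the weight matrices and dimensions here are arbitrary toy choices.

```python
import numpy as np

def attention_head(X, W_Q, W_K, W_V):
    """Standard scaled dot-product attention for a single head.

    X: (n, d) matrix of token embeddings (the prompt/context).
    W_Q, W_K, W_V: (d, d_k) projection matrices for queries, keys, values.
    Returns the (n, d_k) head output.
    """
    Q = X @ W_Q                              # queries
    K = X @ W_K                              # keys
    V = X @ W_V                              # values
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)          # (n, n) attention scores
    # Causal mask: each position attends only to itself and earlier tokens.
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    # Row-wise softmax (numerically stabilized) gives the attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Toy usage: a prompt of n tokens with d-dimensional embeddings.
rng = np.random.default_rng(0)
n, d = 8, 16
X = rng.normal(size=(n, d))
W_Q, W_K, W_V = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
out = attention_head(X, W_Q, W_K, W_V)
print(out.shape)  # (8, 16)
```

In the paper's setting, the quantity of interest is how the content of this head's output evolves as generation proceeds and the context grows; the formula for the tipping point n* is developed in the paper itself from the prompt and the training embeddings.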