Transformers as the Foundation of Large Language Models: A Comprehensive Review

Authors

  • Muhammad Rayan Shaikh, Karachi Institute of Economics and Technology (KIET)
  • Nida Shahryar, Indus University Karachi
  • Khalid Mahboob, Institute of Business Management Karachi (IoBM)
  • Qurrat-ul-Ain Naiyar, Institute of Business Management Karachi (IoBM)
  • Muhammad Talha, Karachi Institute of Economics and Technology (KIET)

Keywords:

Transformer Architecture, Natural Language Processing (NLP), Sequence-to-Sequence, Large Language Models (LLMs), BERT, GPT, Foundation Models

Abstract

The Transformer architecture ushered in a new era for NLP, displacing traditional RNNs, LSTMs, and Seq2Seq models. Its central innovation was the combination of self-attention and multi-head attention, which, together with positional encodings, allows models to learn dependencies across spans of any length. This made training large-scale Language Models (LLMs) fast and efficient: such models scale to massive datasets while still capturing long-term dependencies. This paper, "Transformers as the Foundation of Large Language Models: A Comprehensive Review", critically reviews the path taken by LLMs from BERT to GPT-4 and beyond, including the improved reasoning, arithmetic, and instruction following attributed to architectural scaling. The review then discusses current concerns regarding efficiency, bias, interpretability, and domain specialization, and argues that settling these issues will largely dictate the fate of Transformer-based improvements. Through this work the authors aim to provide a thorough understanding of how Transformers enabled LLMs and continue to direct the development of contemporary AI research.
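The self-attention and positional-encoding mechanism described in the abstract can be sketched in a few lines of NumPy. This is an illustrative toy (single head, no learned projections, no masking), not code from the reviewed paper: it shows only why attention lets every token attend to every other regardless of distance.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V.

    Because every query position is scored against every key position,
    dependencies are captured regardless of distance in the sequence.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # (seq, seq) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over keys
    return weights @ V                                    # weighted sum of values

def sinusoidal_positional_encoding(seq_len, d_model):
    """Fixed sinusoidal encodings that inject token order into the inputs."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

# Toy example: 4 tokens, model width 8; self-attention sets Q = K = V = x.
x = np.random.default_rng(0).normal(size=(4, 8))
x = x + sinusoidal_positional_encoding(4, 8)
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```

Multi-head attention, as the abstract notes, runs several such attention computations in parallel over learned projections of the input and concatenates the results.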


Published

2025-11-08

How to Cite

Muhammad Rayan Shaikh, Shahryar, N., Mahboob, K., Qurrat-ul-Ain Naiyar, & Muhammad Talha. (2025). Transformers as the Foundation of Large Language Models: A Comprehensive Review. International Journal of Innovations in Science & Technology, 7(4), 2705–2717. Retrieved from https://journal.50sea.com/index.php/IJIST/article/view/1632
