A Computational Analysis of AI Models in Recognizing Formal Languages

Saadat Hussain; Arapna Bai; Pooja Kumari; Karan Kumar; Aliza Salman; Khalid Rasheed; Syed Samar Yazdani

doi:10.33411/IJIST/ojs1775

Authors

Saadat Hussain Department of Computer Science, Shaheed Zulfiqar Ali Bhutto Institute of Science and Technology (SZABIST)
Arapna Bai Department of Computer Science, Shaheed Zulfiqar Ali Bhutto Institute of Science and Technology (SZABIST)
Pooja Kumari Department of Computer Science, Shaheed Zulfiqar Ali Bhutto Institute of Science and Technology (SZABIST)
Karan Kumar Department of Computer Science, Shaheed Zulfiqar Ali Bhutto Institute of Science and Technology (SZABIST)
Aliza Salman Department of Computer Science, Shaheed Zulfiqar Ali Bhutto Institute of Science and Technology (SZABIST)
Khalid Rasheed Department of Computer Science, Shaheed Zulfiqar Ali Bhutto Institute of Science and Technology (SZABIST)
Syed Samar Yazdani Department of Computer Science, Shaheed Zulfiqar Ali Bhutto Institute of Science and Technology (SZABIST)

DOI:

https://doi.org/10.33411/IJIST/ojs1775

Keywords:

Large Language Models, Formal Languages, Automata Theory, Transformer Architecture, Hallucination, Deterministic Finite Automata, Chomsky Hierarchy, State Machine Conversion

Abstract

Recent progress in Large Language Models (LLMs) has exhibited exceptional efficacy in natural language generation and downstream tasks. Nonetheless, research has consistently demonstrated that these models encounter difficulties with formal grammatical systems, even at the basic levels of the Chomsky Hierarchy. This work empirically investigates the constraints of modern Large Language Models (LLMs) by focusing on specific operations in regular languages, such as NFA-DFA conversion, DFA minimization, and Mealy-Moore state machine transformations. We extensively tested Google Gemini 2.5 Flash by running 50 controlled tests and comparing its outputs to validated results from a deterministic automata theory tool (control baseline: 100% accuracy across all transition tables). Our findings indicate persistent inaccuracies in transition table formulation, incomplete minimization, and recurrent erroneous state assignments. The model had a 0% success rate on complex Mealy–Moore conversions and showed significant problems when recursive ε-closure computation was required. These patterns substantiate theoretical assertions concerning the computational constraints of attention-based architectures in managing deterministic state transitions.

References

N. G. Satwik Bhattamishra, Kabir Ahuja, “On the Ability and Limitations of Transformers to Recognize Formal Languages,” EMNLP 2020 - 2020 Conf. Empir. Methods Nat. Lang. Process. Proc. Conf., 2020, [Online]. Available: https://arxiv.org/abs/2009.11264

William Merrill, “Sequential Neural Networks as Automata,” arXiv:1906.01615, 2019, [Online]. Available: https://arxiv.org/abs/1906.01615

Michael Hahn, “Theoretical Limitations of Self-Attention in Neural Sequence Models,” Trans. Assoc. Comput. Linguist., 2019, [Online]. Available: https://arxiv.org/abs/1906.06755

Lena Strobl, William Merrill, Gail Weiss, David Chiang, Dana Angluin, “What Formal Languages Can Transformers Express? A Survey,” Trans. Assoc. Comput. Linguist., 2024, [Online]. Available: https://arxiv.org/abs/2311.00208

Emily Ross, Yuval Kansal, Jake Renzella, Alexandra Vassar, Andrew Taylor, “Supervised Fine-Tuning LLMs to Behave as Pedagogical Agents in Programming Education,” arXiv:2502.20527, 2025, [Online]. Available: https://arxiv.org/abs/2502.20527

Yuhuai Wu, Albert Q. Jiang, Wenda Li, Markus N. Rabe, Charles Staats, Mateja Jamnik, Christian Szegedy, “Autoformalization with Large Language Models,” arXiv:2205.12615, 2022, [Online]. Available: https://arxiv.org/abs/2205.12615

Minghai Lu, Benjamin Delaware, “Proof Automation with Large Language Models,” Proc. - 2024 39th ACM/IEEE Int. Conf. Autom. Softw. Eng. ASE 2024, 2024, [Online]. Available: https://dl.acm.org/doi/10.1145/3691620.3695521

L. G. Jin Jiang, Jianing Wang, Yuchen Yan, Yang Liu, Jianhua Zhu, Mengdi Zhang, Xunliang Cai, “Do Large Language Models Excel in Complex Logical Reasoning with Formal Language?,” arXiv:2505.16998, 2025, [Online]. Available: https://arxiv.org/abs/2505.16998

Y. C. Bill Yuchen Lin, Ronan Le Bras, Kyle Richardson, Ashish Sabharwal, Radha Poovendran, Peter Clark, “ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning,” Proc. Mach. Learn. Res., 2025, [Online]. Available: https://arxiv.org/abs/2502.01100

P. Hooimeijer and M. Veanes, “An Evaluation of Automata Algorithms for String Analysis,” Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 6538 LNCS, pp. 248–262, 2011, doi: 10.1007/978-3-642-18275-4_18.

M. J.-D. Vikas Raunak, Arul Menezes, “The Curious Case of Hallucinations in Neural Machine Translation,” NAACL-HLT 2021 - 2021 Conf. North Am. Chapter Assoc. Comput. Linguist. Hum. Lang. Technol. Proc. Conf., 2021, [Online]. Available: https://aclanthology.org/2021.naacl-main.92/

Grégoire Delétang, Anian Ruoss, Jordi Grau-Moya, Tim Genewein, Li Kevin Wenliang, Elliot Catt, Chris Cundy, Marcus Hutter, Shane Legg, Joel Veness, Pedro A. Ortega, “Neural Networks and the Chomsky Hierarchy,” arXiv:2207.02098, 2023, [Online]. Available: https://arxiv.org/abs/2207.02098

Yiding Hao, Dana Angluin, Robert Frank, “Formal Language Recognition by Hard Attention Transformers: Perspectives from Circuit Complexity,” arXiv:2204.06618, 2022, [Online]. Available: https://arxiv.org/abs/2204.06618

Debela Gemechu, Ramon Ruiz-Dolz, Henrike Beyer, Chris Reed, “Natural Language Reasoning in Large Language Models: Analysis and Evaluation,” ACL Anthol., 2025, [Online]. Available: https://aclanthology.org/2025.findings-acl.192/

Jinxin Liu, Shulin Cao, Jiaxin Shi, Tingjian Zhang, Lunyiu Nie, Linmei Hu, Lei Hou, Juanzi Li, “How Proficient Are Large Language Models in Formal Languages? An In-Depth Insight for Knowledge Base Question Answering,” arXiv:2401.05777, 2024, [Online]. Available: https://arxiv.org/abs/2401.05777

E. Y. William Merrill, Gail Weiss, Yoav Goldberg, Roy Schwartz, Noah A. Smith, “A Formal Hierarchy of RNN Architectures,” Proc. Annu. Meet. Assoc. Comput. Linguist., 2020, [Online]. Available: https://aclanthology.org/2020.acl-main.43/

Nadav Borenstein, Anej Svete, Robin Chan, Josef Valvoda, Franz Nowak, Isabelle Augenstein, Eleanor Chodroff, Ryan Cotterell, “What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages,” arXiv:2406.04289, 2025, [Online]. Available: https://arxiv.org/abs/2406.04289

Josef Valvoda, Naomi Saphra, Jonathan Rawski, Adina Williams, Ryan Cotterell, “Benchmarking Compositionality with Formal Languages,” ACL Anthol., 2022, [Online]. Available: https://aclanthology.org/2022.coling-1.525/

Ta-Chung Chi, Ting-Han Fan, Alexander Rudnicky, Peter Ramadge, “Transformer Working Memory Enables Regular Language Reasoning And Natural Language Length Extrapolation,” ACL Anthol., 2023, [Online]. Available: https://aclanthology.org/2023.findings-emnlp.397/

Michael Hahn, Mark Rofin, “Why are Sensitive Functions Hard for Transformers?,” arXiv:2402.09963, 2024, [Online]. Available: https://arxiv.org/abs/2402.09963

William Merrill, Ashish Sabharwal, “The Expressive Power of Transformers with Chain of Thought,” arXiv:2310.07923, 2024, [Online]. Available: https://arxiv.org/abs/2310.07923

Zhiyong Han, Fortunato Battaglia, “Beyond Text Generation: Assessing Large Language Models’ Ability to Reason Logically and Follow Strict Rules,” AI, vol. 6, no. 1, p. 12, 2025, [Online]. Available: https://www.mdpi.com/2673-2688/6/1/12

Sebastian Farquhar, Jannik Kossen, Lorenz Kuhn & Yarin Gal, “Detecting hallucinations in large language models using semantic entropy,” Nature, vol. 630, 2024, [Online]. Available: https://www.nature.com/articles/s41586-024-07421-0

S. S. Debargha Ganguly, Vikash Singh, “Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks,” arXiv:2505.20047, 2025, [Online]. Available: https://arxiv.org/abs/2505.20047

N. T. Mohamed Amine Ferrag, “Reasoning Beyond Limits: Advances and Open Problems for LLMs,” ICT Express, 2025, [Online]. Available: https://arxiv.org/abs/2503.22732

G. K. Alexandra Butoi, “Training Neural Networks as Recognizers of Formal Languages,” 13th Int. Conf. Learn. Represent. ICLR, 2024, [Online]. Available: https://arxiv.org/abs/2411.07107

A. W. L. Pascal Bergsträßer, Ryan Cotterell, “Transformers are Inherently Succinct,” arXiv:2510.19315, 2025, [Online]. Available: https://arxiv.org/abs/2510.19315

Shlok Shelat, Jay Raval, Souvik Roy, Manas Gaur, “Beyond Memorization: Testing LLM Reasoning on Unseen Theory of Computation Tasks,” arXiv:2601.13392, 2026, [Online]. Available: https://arxiv.org/abs/2601.13392

S. Sasaki, D. M. Lopez, and T. T. Johnson, “Neurosymbolic Finite and Pushdown Automata: Improved Multimodal Reasoning versus Vision Language Models (VLMs),” NeuS, 2025.

Kechi Zhang, Ge Li, Jia Li, Huangzhao Zhang, Yihong Dong, Jia Li, Jingjing Xu, Zhi Jin, “Recursive Transformer: Boosting Reasoning Ability with State Stack,” Open Rev., 2026, [Online]. Available: https://openreview.net/forum?id=2bbDg587uh