Artificial Intelligence Meets Endocrinology: A Machine Learning-Based Approach to Thyroid Disease Diagnosis Using Feature Selection Methods

Aftab Ahamd Khan; Bakhtiar Khan; Muhammad Arif; Waseel ud Din; Wahab Khan; Yasir Tayyab Khayyam; Ashraf Ullah; Kalim Ullah

Authors

Aftab Ahamd Khan Department of Computer Science, University of Science & Technology Bannu, Pakistan
Bakhtiar Khan Department of Computer Science, University of Science & Technology Bannu, Pakistan
Muhammad Arif Department of Computer Science, University of Science & Technology Bannu, Pakistan
Waseel ud Din Department of Computer Science, Birmingham City University, United Kingdom
Wahab Khan Department of Computer Science, University of Science & Technology Bannu, Pakistan
Yasir Tayyab Khayyam Gomal Research Institute of Computing (GRIC), Faculty of Computing, Gomal University, D.I. Khan, K.P.K, Pakistan
Ashraf Ullah Department of Computer Science, University of Science & Technology Bannu, Pakistan
Kalim Ullah Department of Electrical Engineering, University of Science & Technology Bannu, Pakistan

Keywords:

Thyroid Disease, Hormones, Feature Selection, Linear Discriminant Analysis, Chi-Square, Recursive Feature Elimination, Machine Learning Based Diagnostic System

Abstract

Thyroid disease (TD) arises when the thyroid gland either grows abnormally or does not generate enough thyroid hormones, and it can cause serious health consequences. Early and accurate identification of thyroid disease is important for improved clinical intervention and disease management. This study aims to improve thyroid disease classification in a machine learning-based diagnostic system by combining machine learning models with a range of feature selection strategies. The preprocessed dataset used in the experiments was taken from the Machine Learning Repository at the University of California, Irvine (UCI). We employ two popular feature selection techniques, Chi-Square and Recursive Feature Elimination (RFE), and one dimensionality reduction technique, Linear Discriminant Analysis (LDA), to choose the best features from the dataset. The selected features were then used to train and test three models: Multi-Layer Perceptron (MLP), Gradient Boost (GB), and Recurrent Neural Network (RNN). Model performance was assessed with four evaluation metrics: accuracy, precision, recall, and F1-score. The experimental results show that Gradient Boost (GB) outperformed the other models and yielded an accuracy of 99%, indicating its ability to classify thyroid disease accurately. The proposed work supports the creation of an intelligent decision-support system for medical diagnostics by offering an understandable and reliable framework for thyroid disease detection.
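To make two of the steps named in the abstract concrete, the following is a minimal sketch of chi-square feature scoring and of the four evaluation metrics. It is not the authors' implementation. The function names `chi_square_score` and `binary_metrics`, the toy feature vectors, and the binary label encoding are all assumptions made for illustration. The score shown is the Pearson chi-square statistic on a feature-by-class contingency table, which may differ from the variant used in the study.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic between a categorical feature and the class label.

    A larger value suggests a stronger association with the class.
    (Hypothetical helper; not taken from the paper.)
    """
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    score = 0.0
    for f_value, f_count in feature_totals.items():
        for c_value, c_count in label_totals.items():
            # Expected cell count if feature and class were independent.
            expected = f_count * c_count / n
            score += (observed[(f_value, c_value)] - expected) ** 2 / expected
    return score


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1-score for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy data (assumed): 1 = thyroid disease, 0 = healthy.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
informative = [1, 1, 1, 1, 0, 0, 0, 0]    # tracks the label exactly
uninformative = [1, 0, 1, 0, 1, 0, 1, 0]  # independent of the label

print(chi_square_score(informative, labels))
print(chi_square_score(uninformative, labels))
print(binary_metrics(labels, [1, 1, 1, 0, 0, 0, 0, 1]))
```

In this toy setting the feature that tracks the label receives a high score and the independent feature scores zero, so a chi-square filter would keep the former and discard the latter before any classifier is trained.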
