Cluster Analysis of COVID-19 Through Genome Sequences Using Python Bioinformatics Library

Authors

  • Maryam Ghauri Department of Software Engineering, Mehran University of Engineering & Technology Jamshoro, Pakistan.
  • Naeem A. Mahoto Department of Software Engineering, Mehran University of Engineering & Technology Jamshoro, Pakistan.
  • Sania Bhatti Department of Software Engineering, Mehran University of Engineering & Technology Jamshoro, Pakistan.
  • Aqsa Umar Department of Software Engineering, Mehran University of Engineering & Technology Jamshoro, Pakistan.

Keywords:

COVID-19, genome sequences, nucleotide records, biomedical, Python

Abstract

Introduction and Importance of Study:

During the COVID-19 pandemic, mortality rates varied across different regions of the world. To better understand the virus's behavior, it is important to examine the nucleotide records of the COVID-19 genomic sequence in depth.

Novelty Statement:

This study analyzes clusters of highly affected countries, using a Python-based bioinformatics library, to identify codons shared by countries that were affected by the virus in a similar way.

Material and Method:

Nucleotide records were extracted from the NCBI database in FASTA format. A Python bioinformatics library was then used to form clusters for each country with the K-means clustering technique.
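
The steps above could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it uses only the Python standard library, whereas the study relied on a bioinformatics library (in Biopython, `Bio.SeqIO.parse(handle, "fasta")` would play the role of `read_fasta`). The function names, the use of relative codon frequencies as features, the reading frame starting at position 0, and the toy sequences are all assumptions made for the example.

```python
import io
import random
from collections import Counter


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def codon_counts(sequence):
    """Count non-overlapping triplets, reading from position 0 (an assumption)."""
    usable = len(sequence) - len(sequence) % 3
    return Counter(sequence[i:i + 3] for i in range(0, usable, 3))


def codon_vectors(records):
    """Convert records to relative codon frequencies over a shared codon list."""
    counts = [codon_counts(seq) for _, seq in records]
    codons = sorted(set().union(*counts))
    vectors = []
    for count in counts:
        total = sum(count.values()) or 1
        vectors.append([count[codon] / total for codon in codons])
    return codons, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iterations=100, seed=0):
    """Plain Lloyd-style K-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: attach each vector to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy input standing in for NCBI nucleotide records (not real data).
TOY_FASTA = """>sample_1
CTGCTGCTGATG
>sample_2
CTGCTGCTGCTG
>sample_3
CAGCAGTAGCAG
>sample_4
CAGCAGCAGCAG
"""

records = list(read_fasta(io.StringIO(TOY_FASTA)))
codons, vectors = codon_vectors(records)
labels, _ = kmeans(vectors, k=2)
for (header, _seq), label in zip(records, labels):
    print(header, "-> cluster", label)
```

With real data, the records for each country would replace the toy sequences, and the number of clusters `k` would need to be chosen for the dataset; the value 2 here is arbitrary.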

Result and Discussion:

The study focuses on similarities between the codons of amino acids in countries that were affected in a similar way during COVID-19. For instance, China and the EU have a lower mortality rate and have the amino acids Leucine, Methionine, Isoleucine, and Valine in common. Pakistan and India have an average death rate and have Leucine, Isoleucine, Valine, and Threonine in common. Brazil and the US have a higher mortality rate and share codons for Leucine and Glutamine as well as Amber, which is the name of the UAG stop codon rather than an amino acid.
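
To make the codon-to-amino-acid comparison concrete, the following sketch maps codons to residue names with the standard genetic code and intersects the results across countries. It is an illustration only: the helper names, the country labels, and the codon lists are hypothetical placeholders and are not taken from the study's data.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa for triplet, aa in zip(product(BASES, repeat=3), AMINO)
}

# Conventional names of the three stop codons.
STOP_NAMES = {"TAG": "Amber", "TAA": "Ochre", "TGA": "Opal"}

AMINO_NAMES = {
    "A": "Alanine", "R": "Arginine", "N": "Asparagine", "D": "Aspartic acid",
    "C": "Cysteine", "Q": "Glutamine", "E": "Glutamic acid", "G": "Glycine",
    "H": "Histidine", "I": "Isoleucine", "L": "Leucine", "K": "Lysine",
    "M": "Methionine", "F": "Phenylalanine", "P": "Proline", "S": "Serine",
    "T": "Threonine", "W": "Tryptophan", "Y": "Tyrosine", "V": "Valine",
}


def residue_name(codon):
    """Return the amino acid (or stop codon) name for a DNA or RNA codon."""
    codon = codon.upper().replace("U", "T")
    if codon in STOP_NAMES:
        return STOP_NAMES[codon]
    return AMINO_NAMES[CODON_TABLE[codon]]


def shared_residues(codons_by_country):
    """Names encoded by at least one listed codon in every country."""
    name_sets = [
        {residue_name(codon) for codon in codons}
        for codons in codons_by_country.values()
    ]
    return set.intersection(*name_sets) if name_sets else set()


# Hypothetical placeholder codon lists, NOT results from the study.
example = {
    "country_a": ["CTG", "CAG", "TAG", "GTT"],
    "country_b": ["TTA", "CAA", "UAG", "ACC"],
}
print(sorted(shared_residues(example)))
```

Different codons can encode the same amino acid (for example, CTG and TTA both encode Leucine), so two countries can share an amino acid without sharing the exact codon.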

Concluding Remarks:

The study shows that countries affected by COVID-19 in a similar way share some common amino acids and their respective codons.

Author Biographies

Naeem A. Mahoto, Department of Software Engineering, Mehran University of Engineering & Technology Jamshoro, Pakistan.

Naeem Ahmed Mahoto received a Master's degree in Computer Engineering from Mehran University of Engineering and Technology, Pakistan, and a Ph.D. in Information and System Engineering from Politecnico di Torino, Italy, in 2013. He is currently Associate Professor and Chairman of the Department of Software Engineering, Mehran UET, Pakistan. His research interests focus on data science and bioinformatics. His research activities also cover concept extraction, data visualization, hybrid classification, and educational data mining. He is a member of the IEEE Computer Society and the Pakistan Engineering Council (PEC).

Sania Bhatti, Department of Software Engineering, Mehran University of Engineering & Technology Jamshoro, Pakistan.

Dr. Sania Bhatti works in the Department of Software Engineering, Mehran UET, Jamshoro, Sindh, Pakistan. She received her PhD from the University of Leeds, United Kingdom, in 2010 under a Faculty Development Program scholarship. Her research interests include modelling and simulation, communication networks, and software engineering. She has published around twenty journal papers and various international conference papers.

Published

2024-03-14

How to Cite

Ghauri, M., Mahoto, N. A., Bhatti, S., & Umar, A. (2024). Cluster Analysis of COVID-19 Through Genome Sequences Using Python Bioinformatics Library. International Journal of Innovations in Science & Technology, 6(1), 266–275. Retrieved from https://journal.50sea.com/index.php/IJIST/article/view/688

Section

Articles