Reference Hub5
An Approach to DNA Sequence Classification Through Machine Learning: DNA Sequencing, K Mer Counting, Thresholding, Sequence Analysis

An Approach to DNA Sequence Classification Through Machine Learning: DNA Sequencing, K Mer Counting, Thresholding, Sequence Analysis

Sapna Juneja, Annu Dhankhar, Abhinav Juneja, Shivani Bali
Copyright: © 2022 |Volume: 11 |Issue: 2 |Pages: 15
ISSN: 2160-9551|EISSN: 2160-956X|EISBN13: 9781683182580|DOI: 10.4018/IJRQEH.299963
Cite Article Cite Article

MLA

Juneja, Sapna, et al. "An Approach to DNA Sequence Classification Through Machine Learning: DNA Sequencing, K Mer Counting, Thresholding, Sequence Analysis." IJRQEH vol.11, no.2 2022: pp.1-15. http://doi.org/10.4018/IJRQEH.299963

APA

Juneja, S., Dhankhar, A., Juneja, A., & Bali, S. (2022). An Approach to DNA Sequence Classification Through Machine Learning: DNA Sequencing, K Mer Counting, Thresholding, Sequence Analysis. International Journal of Reliable and Quality E-Healthcare (IJRQEH), 11(2), 1-15. http://doi.org/10.4018/IJRQEH.299963

Chicago

Juneja, Sapna, et al. "An Approach to DNA Sequence Classification Through Machine Learning: DNA Sequencing, K Mer Counting, Thresholding, Sequence Analysis," International Journal of Reliable and Quality E-Healthcare (IJRQEH) 11, no.2: 1-15. http://doi.org/10.4018/IJRQEH.299963

Export Reference

Mendeley
Favorite Full-Issue Download

Abstract

Machine learning (ML) has been instrumental in optimal decision making through relevant historical data, including the domain of bioinformatics. In bioinformatics classification of natural genes and the genes that are infected by disease called invalid gene is a very complex task. In order to find the applicability of a fresh protein through genomic research, DNA sequences need to be classified. The current work identifies classes of DNA sequence using machine learning algorithm. These classes are basically dependent on the sequence of nucleotides. With a fractional mutation in sequence, there is a corresponding change in the class. Each numeric instance representing a class is linked to a gene family including G protein coupled receptors, tyrosine kinase, synthase, etc. In this paper, the authors applied the classification algorithm on three types of datasets to identify which gene class they belong to. They converted sequences into substrings with a defined length. That ‘k value' defines the length of substring which is one of the ways to analyze the sequence.