What do humans and fruit flies and yeast have in common? A, T, G, and C — the four chemical components of DNA: adenine, thymine, guanine, and cytosine.
However, the way you spell human, or yeast, or soybean, or pig using four letters depends on the billions of ways you combine the letters in strings of instructions millions of units in length. No wonder it's called DNA "code."
Algorithms are being used to crack the code and discover which combinations of genes issue which instructions. An algorithm is a step-by-step procedure for solving a problem, whether balancing your checkbook or looking for repeating sequences in plant or animal genetic code.
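To make "step-by-step procedure" concrete, here is a minimal sketch in Python of one way to look for repeating sequences. It is an illustration only, not the method any particular research group uses. The function name `find_repeats`, the parameter `k` (the length of the sub-sequence to look for), and the sample string are all assumptions made up for this example.

```python
from collections import defaultdict


def find_repeats(dna, k):
    """Return {sub-sequence: [start positions]} for every length-k
    sub-sequence that occurs more than once in dna."""
    positions = defaultdict(list)
    # Step 1: slide a window of width k along the sequence,
    # recording where each sub-sequence starts.
    for i in range(len(dna) - k + 1):
        positions[dna[i:i + k]].append(i)
    # Step 2: keep only the sub-sequences seen at least twice.
    return {sub: starts for sub, starts in positions.items() if len(starts) > 1}


# Hypothetical toy sequence; real chromosomes are millions of bases long.
repeats = find_repeats("ATGCGATGCA", 4)
print(repeats)  # {'ATGC': [0, 5]}
```

This simple version keeps every sub-sequence in memory, so it suits short examples; scanning a whole chromosome would likely call for more economical data structures.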
How are algorithms used in bioinformatics? DNA relays its instructions for creating an organism through the order, or sequence, of the four chemicals, called bases, along a sugar-phosphate backbone. A gene is a specific instruction, spelled out as a set of bases, and is only one part of a chromosome, which can contain 150 million bases.
Thus, to locate genes it is necessary to know what you are looking for and then scan a sequence that is more than a million bases long looking for sub-sequences that are repeated. Much like the spell-check function on your computer, a very clever algorithm will find, within the chromosome, sequences that match the target and sequences that are close, since it is possible to have an error in the code you are trying to match.
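The spell-check idea can be sketched in a few lines of Python. This is a deliberately simple illustration, assuming that "close" means "differs in at most a few letters at the same positions" (the Hamming distance); real tools also have to cope with inserted or deleted bases and use far faster methods. The names `hamming`, `approximate_matches`, and `max_mismatches`, along with the sample strings, are hypothetical.

```python
def hamming(a, b):
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def approximate_matches(chromosome, target, max_mismatches=1):
    """Return (start position, mismatch count) for every window of the
    chromosome that matches the target exactly or nearly."""
    hits = []
    m = len(target)
    for i in range(len(chromosome) - m + 1):
        distance = hamming(chromosome[i:i + m], target)
        if distance <= max_mismatches:
            hits.append((i, distance))
    return hits


# Hypothetical toy data: one exact match and one match with a single error.
print(approximate_matches("TTGACCATGATCAT", "GATC", max_mismatches=1))
# [(2, 1), (8, 0)]
```

Here position 8 holds an exact copy of the target, while position 2 holds GACC, which is one letter away from GATC. Raising `max_mismatches` makes the search more forgiving, at the cost of more false alarms.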
– Layne T. Watson
Professor of Computer Science and Mathematics
