What is bioinformatics?
Imagine that you are trying to translate a piece of text written in a foreign alphabet, in a language that has not been used for 4,000 years, about a subject that is unknown — and there are no spaces between words, nor punctuation.
Many pieces of epigraphy, such as hieroglyphics, pose this problem. Thomas Young, a British physicist, and Jean-François Champollion, a French Egyptologist, both worked to decipher the Rosetta Stone, and Michael Ventris untangled the syllabic script of the Mycenaeans. Biochemistry offers a new challenge for scientists as both code breakers and code makers.
The building blocks of enzymatic and functional structures in living organisms are proteins, created by linking amino acids into polypeptides. The genetic code stored in the DNA of every cell provides the instructions for each individual amino acid. That code is formed by stringing together four different base chemicals, abbreviated A, G, C, and T. Nature uses the four bases three at a time to create 64 possible combinations (4³). Some of these three-letter “words” mean the same amino acid; thus, this language has words for 20 common amino acids. Polypeptide words built from these amino-acid letters can be a few letters long, or thousands of letters long.
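The count of 64 three-letter combinations can be checked by listing them. The short Python sketch below is only an illustration; the four glycine codons it shows come from the standard genetic code and are not taken from the article itself.

```python
from itertools import product

BASES = "AGCT"  # the four DNA bases named in the text

# Every ordered triple of bases is one three-letter "word" (a codon).
codons = ["".join(triple) for triple in product(BASES, repeat=3)]
print(len(codons))  # 64

# Illustration of several words meaning the same amino acid:
# in the standard genetic code, every codon starting with GG encodes glycine.
glycine_codons = sorted(c for c in codons if c.startswith("GG"))
print(glycine_codons)  # ['GGA', 'GGC', 'GGG', 'GGT']
```

With 64 words available and only 20 common amino acids to name, this kind of redundancy is unavoidable.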
Suppose nature wanted a simple nonapeptide (nine letters) to cause the smooth muscle of the uterus to contract; we’ll call it oxytocin (Pitocin). How many nine-letter words are there? About 20⁹, or roughly 512 billion! How does nature encode what it wants? How is that code translated? How does the word function?
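The number of possible nine-letter words is simple to verify: with 20 amino-acid letters available at each of nine positions, the total is 20 raised to the ninth power. A minimal Python check, with variable names chosen here purely for illustration:

```python
AMINO_ACIDS = 20     # common amino acids, per the text
PEPTIDE_LENGTH = 9   # letters in a nine-letter peptide such as oxytocin

# Each position can hold any of the 20 letters independently.
n_sequences = AMINO_ACIDS ** PEPTIDE_LENGTH
print(f"{n_sequences:,}")  # 512,000,000,000
```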
Biochemists are interested in the gene structures that cause this protein to be expressed (the coding), how the hormone works (where it acts, how it activates the muscle), and whether there are other proteins that might be better. Researchers have even developed techniques to synthesize thousands of compounds using robotics (combinatorial chemistry) and high-throughput-screening methods to test synthetic peptides against target functions or diseases.
This is the realm of bioinformatics.
To realize its immensity, compare (1) our own language, with 26 different letters, and our brains, which can recognize perhaps a million words that commonly use five to 15 letters; and (2) the bio-realm, which employs 20 protein letters in words that may involve 5,000 to 10,000 “letters.”
And it gets more complex. The proteins are sometimes modified by attachment of carbohydrates that may be viewed as accent marks, like è, é, ê, ë. Commonly about seven accent marks are used, and up to 20 percent of the amino acid letters may have accent marks.
The push is on to create large libraries of data that contain base sequences, amino-acid sequences, and their glycosylation accent marks for plants, micro-organisms, and animals. Then scientists search the data for concordance, the way that language experts identify authors of texts, such as the Gospels or poems from the Shakespearean time. They data-mine to find similar structures in the bio-texts of plants and animals. The goals are to reveal novel aspects of basic biology, create new and better drugs inexpensively, begin to understand how natural chemicals function, predict what man-made structures might be better, and find how genetically linked diseases start and how they may be cured.
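Searching for concordance can be pictured, in its simplest form, as comparing two bio-texts letter by letter. The Python sketch below is a toy under stated assumptions: the function name is hypothetical, and the comparison with vasopressin, a related nine-letter hormone, is an example added here rather than one from the article. Real database searches use alignment methods that handle gaps and unequal lengths, which this does not.

```python
def count_matches(seq_a: str, seq_b: str) -> int:
    """Count positions at which two equal-length sequences carry the same letter."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch only compares sequences of equal length")
    return sum(a == b for a, b in zip(seq_a, seq_b))

# One-letter amino-acid codes for two nine-letter hormones.
OXYTOCIN = "CYIQNCPLG"
VASOPRESSIN = "CYFQNCPRG"  # illustrative comparison, not from the article

matches = count_matches(OXYTOCIN, VASOPRESSIN)
print(f"{matches} of {len(OXYTOCIN)} positions match")  # 7 of 9 positions match
```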
This is all bioinformatics: a playing field akin to code making and breaking, but on a scale that can only be addressed by the computer power available today. Two evolutionary fields, biotechnology and information technology, came together synergistically in this decade to revolutionize the way we live and how we can usefully change our destiny. The result is bioinformatics.
— Written by Raymond Dessy, Emeritus Professor of Chemistry