Volume 5, Number 4, pp. 121-128
The theoretical basis of universal identification systems for bacteria and viruses
S. Chumakov 1, C. Belapurkar 1, C. Putonti 1, T.-B. Li 1, B.M. Pettitt 1,2,3, G.E. Fox 3,4, R.C. Willson 3,4 and Yu. Fofanov 1,2
1 Department of Computer Science, University of Houston, Houston, TX 77204, USA
2 Department of Biology and Biochemistry, University of Houston, Houston, TX 77204, USA
3 Department of Chemistry, University of Houston, Houston, TX 77204, USA
4 Department of Chemical Engineering, University of Houston, Houston, TX 77204, USA
It is shown that the presence/absence pattern of 1000 random oligomers of length 12–13 in a bacterial genome is sufficiently characteristic to readily and unambiguously distinguish any known bacterial genome from any other. Even genomes of extremely closely related organisms, such as strains of the same species, can be thus distinguished. One evident way to implement this approach in a practical assay is with hybridization arrays. It is envisioned that a single universal array can be readily designed that would allow identification of any bacterium that appears in a database of known patterns. We performed in silico experiments to test this idea. Calculations utilizing 105 publicly available, completely sequenced microbial genomes allowed us to determine appropriate values of the test oligonucleotide length, n, and the number of probe sequences. Randomly chosen n
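The presence/absence fingerprint described above can be illustrated with a short Python sketch. This is not the authors' pipeline, and every function name here is hypothetical. Several assumptions are made for illustration only: a probe is scored "present" if it occurs exactly on either strand of the genome (a stand-in for hybridization), patterns are compared by Hamming distance with nearest-neighbour lookup (a simple choice the abstract does not specify), and `expected_presence` treats genome windows as independent uniform random n-mers. The demo uses toy sizes (n = 8, 20 kb genomes) so it runs quickly.

```python
import math
import random

# Complement table for uppercase DNA (A/C/G/T only; other symbols pass through).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def random_probes(n: int, count: int, seed: int = 0) -> list[str]:
    """Hypothetical helper: draw `count` uniformly random n-mers over {A, C, G, T}."""
    rng = random.Random(seed)
    return ["".join(rng.choice("ACGT") for _ in range(n)) for _ in range(count)]


def presence_pattern(genome: str, probes: list[str]) -> list[int]:
    """Assumption: 1 if the probe occurs exactly on either strand, else 0."""
    rc = reverse_complement(genome)
    return [1 if (p in genome or p in rc) else 0 for p in probes]


def hamming(a: list[int], b: list[int]) -> int:
    """Number of probes whose presence/absence calls differ."""
    if len(a) != len(b):
        raise ValueError("patterns must use the same probe set")
    return sum(x != y for x, y in zip(a, b))


def identify(pattern: list[int], database: dict[str, list[int]]) -> str:
    """Name of the database entry whose pattern is closest (nearest neighbour)."""
    return min(database, key=lambda name: hamming(pattern, database[name]))


def expected_presence(genome_length: int, n: int) -> float:
    """Approximate P(a given random n-mer occurs at least once on either strand).

    Treats the ~2*G windows as independent uniform draws:
    1 - (1 - 4**-n) ** (2*G), computed stably via log1p/expm1.
    """
    return -math.expm1(2 * genome_length * math.log1p(-(4.0 ** -n)))


if __name__ == "__main__":
    rng = random.Random(1)
    genome_a = "".join(rng.choice("ACGT") for _ in range(20_000))
    genome_b = "".join(rng.choice("ACGT") for _ in range(20_000))
    probes = random_probes(n=8, count=200)
    db = {"A": presence_pattern(genome_a, probes),
          "B": presence_pattern(genome_b, probes)}
    # A few point substitutions stand in for a closely related strain of A.
    mutant = list(genome_a)
    for pos in rng.sample(range(len(mutant)), 5):
        mutant[pos] = "A" if mutant[pos] != "A" else "C"
    query = presence_pattern("".join(mutant), probes)
    print(identify(query, db), expected_presence(5_000_000, 12))
```

Under the independence approximation, a 5 Mbp genome contains a given random 12-mer with probability of about 0.45, so each probe is neither almost always present nor almost always absent and carries close to one bit of information. This offers one intuition for why lengths around 12–13 suit bacterial genomes, although the values in the paper come from its own calculations on real genomes.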
Keywords: