How can we embed information in molecules such that they form a specific shape? What kind of computation is possible given a set of molecules and interactions?
In order to answer such questions we will need to adapt concepts such as programming languages or compilers to physical substrates that are very different from the electronic devices we are familiar with.
The availability of such information raises many interesting questions. Answering them involves many statistical challenges, for example, extracting complex and biologically meaningful relationships from noisy, high-dimensional, sparsely sampled data.
Our goal is to address the challenges by developing effective machine learning approaches that can translate sophisticated biological processes into robust statistical models; can incorporate prior knowledge from multiple sources of genomic data; and can learn such models from data efficiently.
Computational Molecular Biology Overview
Molecular biology has become an information science with close ties to computer science.
Large databases and sophisticated algorithms have become essential tools for biologists seeking to understand complex biological systems, determine the functions of nucleotide and protein sequences, or reconstruct the course of evolution. These tasks require the development of new and efficient algorithms. The University of Washington is a major center for the acquisition and interpretation of genetic information.
Such tasks require molecular systems that operate autonomously in complex environments, sensing and responding to molecular events. To enable this "molecular programming revolution" we will have to develop the right computational models that allow us to describe and specify molecular behaviors.
A natural application of this idea arises in the study of gene regulation. One of the challenges currently facing biologists is to understand the varied and complex mechanisms that regulate how, when, where, and at what rate genes express their products. We have developed algorithms that search for statistically overrepresented motifs in such a collection of regulatory regions, these motifs being good candidate binding sites.
An orthogonal approach deduces binding sites by considering orthologous regulatory regions of a single gene from multiple species. Because functional sites are under selective pressure, they tend to evolve more slowly than the surrounding nonfunctional sequence. This means that unusually well conserved sites among a set of orthologous regulatory regions are excellent candidate binding sites. Given orthologous input sequences and the evolutionary tree relating them, we have developed a practical phylogenetic footprinting algorithm that identifies the best conserved sites.
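To make the idea of searching for statistically overrepresented motifs in a collection of regulatory regions more concrete, here is a minimal Python sketch. It is not the algorithm referred to above. It is a simple stand-in that counts how many input sequences contain each k-mer and compares that count with what a background model would predict. The function names (`kmer_presence`, `overrepresented_kmers`), the independent-base background model, the z-score ranking, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_presence(seqs, k):
    """For each k-mer, count how many sequences contain it at least once."""
    counts = Counter()
    for s in seqs:
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts


def overrepresented_kmers(seqs, k):
    """Rank k-mers by a z-score of observed vs. expected sequence coverage."""
    # Assumed background: independent bases with frequencies taken from the input.
    letters = Counter("".join(seqs))
    total = sum(letters.values())
    freq = {b: n / total for b, n in letters.items()}

    ranked = []
    for kmer, observed in kmer_presence(seqs, k).items():
        p_site = 1.0
        for b in kmer:
            p_site *= freq[b]
        # Probability that each sequence contains the k-mer at least once
        # (approximation: treats overlapping positions as independent).
        hit = [1 - (1 - p_site) ** (len(s) - k + 1) for s in seqs if len(s) >= k]
        expected = sum(hit)
        variance = sum(q * (1 - q) for q in hit)
        if variance > 0:
            z = (observed - expected) / sqrt(variance)
            ranked.append((kmer, observed, z))
    return sorted(ranked, key=lambda r: r[2], reverse=True)


# Hypothetical toy regulatory regions, each carrying the planted site TGACTCA.
regions = [
    "ACGTTGACTCAGGCTA",
    "TTGACTCACCGATAGC",
    "GGCATATGACTCATCG",
    "CATGACTCAGTTACGA",
]
best_kmer, n_seqs, _ = overrepresented_kmers(regions, 7)[0]
print(best_kmer, n_seqs)
```

On this toy input the planted 7-mer is the only one present in all four sequences, so it ranks first. Real motif finders typically allow mismatches, use richer background models, and correct for the number of candidates tested. None of that is attempted here.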