Information processing by cells and biologists

    Roger Brent

    The core agenda of post-WWII molecular biology has been defined as the molecular understanding of how genetic information is transmitted and read out (see, for example, Stent 1968). By the 1950s, the analogy between the tape in a Turing machine and the linear sequence of nucleotides in DNA was apparent to both computer scientists and biologists.

    In the early 21st century, it may be that molecular biology can fruitfully return to these roots by recasting part of its agenda in terms of the need to understand how biological information is processed. In a somewhat more modern formulation, cells can be thought of as machines that process and make decisions on three kinds of information: 1) information stored in the genome, 2) information about intracellular events (for example, from checkpoint mechanisms), and 3) information external to the cell.

    In many cases the machinery that cells use to make decisions is reasonably well understood at a qualitative level. However, in no case do we possess a corresponding quantitative understanding. Reflecting this, we are not very capable of predicting the outcomes of perturbations to the genome, the internal workings of the cell, or its external environment.

    One path to understanding the behavior of these ensembles of components clearly lies in construction of mechanism-based quantitative models representing cellular processes. Building such models requires solution of numerous computational and experimental biological challenges. I will detail some of these, and progr.
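    As a minimal sketch of what a mechanism-based quantitative model can look like, the Python below integrates a two-variable model of gene expression (mRNA made at a constant rate and degraded, protein translated from mRNA and degraded) with a forward-Euler step. The model, the function names and every rate constant are illustrative assumptions, not taken from the work described here; the point is only that such a model makes a numerical prediction for a perturbation, here halving the transcription rate.

```python
def simulate(k_tx, k_tl, d_m, d_p, t_end=200.0, dt=0.01):
    """Forward-Euler integration of a toy gene expression model.

    dm/dt = k_tx - d_m * m      (transcription minus mRNA decay)
    dp/dt = k_tl * m - d_p * p  (translation minus protein decay)
    All rate constants are hypothetical, in arbitrary units.
    """
    m, p = 0.0, 0.0
    for _ in range(int(t_end / dt)):
        dm = k_tx - d_m * m
        dp = k_tl * m - d_p * p
        m += dm * dt
        p += dp * dt
    return m, p


def steady_state(k_tx, k_tl, d_m, d_p):
    """Analytical steady state of the same model, used as a check."""
    m = k_tx / d_m
    return m, k_tl * m / d_p


if __name__ == "__main__":
    wild_type = simulate(k_tx=2.0, k_tl=5.0, d_m=0.5, d_p=0.1)
    perturbed = simulate(k_tx=1.0, k_tl=5.0, d_m=0.5, d_p=0.1)
    print("wild type (mRNA, protein):", wild_type)
    print("half transcription rate:  ", perturbed)
```

    In this toy case the simulated endpoint can be checked against the analytical steady state; for realistic networks no such closed form exists, which is part of what makes the computational and experimental challenges substantial.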

    Another path may involve computation on the qualitative biological knowledge that now exists. Expert biologists reason on this qualitative information to make statements about the consequences of perturbations, but expert systems that do the same, in the main, do not exist. Here, although the need is clear, the relative opacity (to me) of much of the seemingly relevant computer science literature has made it more difficult to figure out first steps.
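    One very simple form such computation could take, offered only as an illustrative sketch, is sign propagation over a graph of qualitative statements ("A activates B", "B represses C"). The Python below assumes a hypothetical three-step pathway; the node names and the function `predict` are invented for illustration and do not describe an existing system. A node reached with conflicting signs is reported as 0, meaning the qualitative knowledge alone cannot decide the outcome.

```python
from collections import deque


def predict(edges, source, direction):
    """Propagate a qualitative perturbation through a signed influence graph.

    edges:     iterable of (regulator, target, sign), sign +1 (activates)
               or -1 (represses).
    source:    the perturbed node.
    direction: +1 (increase) or -1 (decrease).
    Returns {node: +1, -1, or 0}, where 0 means paths disagree.
    """
    targets = {}
    for src, dst, sign in edges:
        targets.setdefault(src, []).append((dst, sign))

    # Breadth-first search over (node, change) states; the state space is
    # finite, so this also terminates on graphs with feedback loops.
    seen = {(source, direction)}
    queue = deque(seen)
    while queue:
        node, change = queue.popleft()
        for dst, sign in targets.get(node, []):
            state = (dst, change * sign)
            if state not in seen:
                seen.add(state)
                queue.append(state)

    result = {}
    for node, change in seen:
        # A node seen with both signs is qualitatively ambiguous.
        result[node] = change if node not in result else 0
    return result


if __name__ == "__main__":
    # Hypothetical pathway, for illustration only.
    pathway = [
        ("signal", "kinase", +1),
        ("kinase", "repressor", -1),
        ("repressor", "gene", -1),
    ]
    print(predict(pathway, "signal", -1))
```

    The ambiguous case is the instructive one: wherever two routes to the same target disagree in sign, qualitative reasoning stops and quantitative information is needed.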

    Finally, note that information theory (Shannon 1948) has its roots in the 20th-century need to understand transmission of electrical signals through channels. It is not immediately clear that the representations of biological processes used by biologists map well to concepts that come from this theory. To give only one example, one is hard pressed to define or find, inside a cell that is processing signals from the outside, either the signal or the "bits" (Tukey, 1946) that might make it up. There may thus be an opportunity here for new theory to guide thinking and further experiment.
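    For concreteness, the sketch below shows what "bits" would mean if one could idealize an external signal and a cellular response as discrete random variables with a known joint distribution, which is exactly the step that is hard to justify for a real cell. It computes the Shannon mutual information in bits. The function name and the example distributions are hypothetical and chosen only to show the two limiting cases.

```python
from math import log2


def mutual_information_bits(joint):
    """Mutual information I(S;R) in bits.

    joint: dict mapping (signal, response) -> probability, summing to 1.
    I(S;R) = sum over (s, r) of p(s,r) * log2( p(s,r) / (p(s) * p(r)) ).
    """
    p_s, p_r = {}, {}
    for (s, r), p in joint.items():
        p_s[s] = p_s.get(s, 0.0) + p
        p_r[r] = p_r.get(r, 0.0) + p
    return sum(
        p * log2(p / (p_s[s] * p_r[r]))
        for (s, r), p in joint.items()
        if p > 0.0  # zero-probability terms contribute nothing
    )


if __name__ == "__main__":
    # Idealized cases: the response mirrors the signal exactly,
    # or the response is independent of the signal.
    faithful = {("absent", "off"): 0.5, ("present", "on"): 0.5}
    blind = {
        ("absent", "off"): 0.25, ("absent", "on"): 0.25,
        ("present", "off"): 0.25, ("present", "on"): 0.25,
    }
    print(mutual_information_bits(faithful))
    print(mutual_information_bits(blind))
```

    The calculation itself is straightforward; the open question raised above is whether the signal, the response and their joint distribution can be meaningfully defined inside a living cell.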

    Brent, R. (2000) Genomic biology. Cell 100, 169-183.

    Endy, D. and Brent, R. (2001) Modeling cellular behavior. Nature (supplement), in press.

    Shannon, C. E. (1948) A mathematical theory of communication. Bell System Technical Journal 27, 379-423, 623-656.

    Stent, G. (1968) That was the molecular biology that was. Science 160, 390-394.

    Tukey, J. W. (1946) Referenced at www.maa.org.

    -> RECOMB 2001