Hunger for new technologies, metrics, and spatiotemporal models in functional genomics.

    George Church

    Functional genomics, as a field, is applying genomic self-improvement protocols (becoming cost-effective, comprehensive, precise, accurate, and useful) to the kinetics of complex cellular systems. Radical surgery in functional biology aims to mimic the success of structural biology along all five of those axes. In structural studies, technologies of recombinant DNA and automation have brought costs down exponentially (100-fold in ten years). That, combined with definitions of completeness, pushes the second axis (to better than 99.99%). Those two then conspire to reduce the random errors of the third axis by the beautiful brute force of repetition. Reducing systematic errors (the fourth axis, accuracy) requires more finesse. Models allow integration of wildly different experimental methods (e.g. models based on the genetic code plus phylogeny provide quite independent checks of models based on DNA electrophoretic images). Model interchange specifications and metrics for model comparison mutually reinforce one another and provide one path along the fifth axis, that of utility, via killer applications such as homology searches. This combination of modeling and searching provides serendipity and "functional hypothesis generation" in abundance. It instantly connects processes and organisms that were previously studied separately. Statistical assessment of agreement between experiment and calculation can improve the types of model parameters as well as the parameter values. What are the analogous metrics and models for functional genomics? How can we estimate possible lower limits to costs? How do we define completion and accuracy? Finally, how do we create and assess searches (not just on data but on models) and the utility of applications in general? How do these feed back to experimental design and feed forward to bioengineering?

    The functional genomics measures now thought to be prime candidates for automation, miniaturization, and multiplexing include electrophoresis, molecular microarrays, mass spectrometry, and microscopy. Microscopy is well suited to non-destructive time series and to measures of spatial effects and of the stochastic kinetics of systems containing one or a few copies of any critical molecule. The other methods currently offer richer signatures for multiplexing (measuring many molecules from the same source at once). Such extensive multiplexing can reduce errors due to misalignment of the (unmultiplexed) measures in space and/or time. These misalignments are dramatic in, but by no means limited to, unplanned (meta) comparisons between literature values. In the spirit of eliminating systematic errors, we see a major role for models in integrating as disparate a set of measures as possible. The dynamic and spatial biomodels of yore, thought by some to be doomed by lack of data, will soon promote fresh study in the glaring light of overdetermination, i.e. more datapoints than adjustable parameters, with feedback to the experiments justifying even more data for even more accuracy.
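
    As a minimal sketch of what overdetermination buys, the Python below fits a two-parameter first-order decay model, x(t) = x0 * exp(-k * t), to more than two time points by least squares on log-transformed levels. The model, the function name fit_first_order_decay, and the synthetic numbers in the usage note are illustrative assumptions, not the models or data of this abstract. The surplus datapoints (n - 2 degrees of freedom) are what allow a residual variance, and hence a check on the model, to be computed at all.

```python
import math


def fit_first_order_decay(times, levels):
    """Least-squares fit of x(t) = x0 * exp(-k * t) on log-transformed levels.

    Hypothetical helper for illustration. Returns (x0, k, residual_variance),
    where the residual variance is in log units and uses n - 2 degrees of
    freedom (n datapoints minus 2 adjustable parameters).
    """
    n = len(times)
    if n != len(levels):
        raise ValueError("times and levels must have the same length")
    if n <= 2:
        raise ValueError("need more datapoints than the 2 adjustable parameters")

    ys = [math.log(v) for v in levels]  # linearize: log x = log x0 - k * t
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))

    slope = sxy / sxx
    intercept = y_mean - slope * t_mean

    # Residuals exist only because the system is overdetermined.
    residuals = [y - (intercept + slope * t) for t, y in zip(times, ys)]
    residual_variance = sum(r * r for r in residuals) / (n - 2)

    return math.exp(intercept), -slope, residual_variance
```

    For example, calling fit_first_order_decay on six synthetic time points generated from x0 = 100 and k = 0.3 should recover those two values with a residual variance near zero; with real measurements, a residual variance well above the expected measurement noise would suggest that the model type, not just its parameter values, needs revisiting.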

    We illustrate the above themes with time series of stress responses in wild-type and mutant human erythrocytes, E. coli, and yeast. We assess measures of up to 19 metabolites, 400 proteins, and over 7000 RNAs. These measures touch most of the 34 critical metabolites in erythrocytes but only a tiny fraction of the more than 1200 in E. coli. So far they quantitate fewer than 10% of the proteins per experiment (and even these often have unknown covalent structure). For the RNAs (assayed with a dense set of oligonucleotides) we see a rich, probably comprehensive set, including many unpredicted transcripts. So what are the next steps? Spatial effects seen for DNA motifs at scales of a few bp, hundreds of bp, and thousands of bp (for three separate reasons) can be found by automatable methods. Time series of molecular concentration data can be aligned by discrete and/or interpolative dynamic programming. Components of regulatory networks evident in time series can be assessed by these independent models. The components of decay as well as steady-state levels have been modeled for complete RNA sets. These time series benefit from the sharp, specific transitions that can be achieved through conditional mutants and drugs (chemical biology in general). Overarching questions remain: how will we systematize (automate) kinetic modeling and its applications to a point analogous to structural data modeling, all the while connecting with issues of global quality of life?
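
    The discrete form of time-series alignment by dynamic programming can be sketched as classic dynamic time warping. The Python below is an illustrative assumption about what such an alignment might look like, not the method used in this work: the function name dtw_align, the absolute-difference cost, and the three allowed moves are all choices made here for brevity, and the interpolative variant is not shown.

```python
import math


def dtw_align(a, b):
    """Align two concentration time series by discrete dynamic programming.

    Hypothetical helper for illustration. Returns (total_cost, path), where
    path is a list of (i, j) index pairs matching a[i] with b[j].
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both series must be non-empty")

    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # advance both series
                cost[i - 1][j],      # advance a only (b[j-1] is reused)
                cost[i][j - 1],      # advance b only (a[i-1] is reused)
            )

    # Trace back from the end to recover which time points were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path
```

    As a usage example, aligning [0, 1, 2, 3] against [0, 0, 1, 2, 3] (the same response delayed by one sample) gives a total cost of zero, with the first point of the first series matched to the first two points of the second. A quadratic-time table like this is adequate for short series; longer ones would typically call for a band constraint on how far the alignment may drift.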

    -> RECOMB 2001