The Sequence of the Human Genome

    Mark Adams

    A consensus sequence of the euchromatic portion of the human genome has been generated by the whole-genome shotgun sequencing method that was developed while sequencing the genomes of Haemophilus influenzae and Drosophila melanogaster. The 2.9 billion bp sequence was generated over nine months from 27,271,853 high-quality sequence reads (~5X coverage of the genome) taken from both ends of plasmid clones made from the DNA of five individuals: three females and two males of African-American, Asian-Chinese, Hispanic, and Caucasian ethnicity. The physical coverage of the genome by cloned DNA spanned by paired end sequences exceeds 37X. Two assembly methods, a whole-genome assembly and a regional hybrid assembly, were used, combining BAC data from GenBank with Celera data. Over 90% of the genome is in scaffold assemblies of 500,000 bp or greater, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence reveals 26,178 protein-encoding genes for which there is strong corroborating evidence and an additional 12,000 computationally derived genes with mouse homologues or other weak supporting evidence. Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, tissue-specific developmental regulation, and the hemostasis and immune systems. DNA sequence comparisons among the five individuals provided the locations of 2.6 million single nucleotide polymorphisms (SNPs). The haploid genomes of a randomly drawn pair of humans differ at a rate of one per 1,250 bp on average, but there is marked heterogeneity in the level of polymorphism across the genome. Only 0.75% of the SNPs lead to possibly dysfunctional proteins.
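    The headline figures in this abstract can be cross-checked with back-of-the-envelope arithmetic. The Python sketch that follows is illustrative only. The abstract does not state a read length, so the sketch solves the standard coverage relation c = N * L / G for an implied mean read length, assuming uniform read length and that "~5X" refers to the 2.9 billion bp sequence. The unsequenced-fraction estimate uses the Lander-Waterman/Poisson idealization, which is an assumption and not a method stated in the abstract. The pairwise-difference count is a naive extrapolation from the quoted average of one difference per 1,250 bp, not a reported result. All function names are hypothetical.

```python
import math

# Figures quoted in the abstract.
GENOME_BP = 2.9e9                # assembled sequence length, bp
READS = 27_271_853               # high-quality sequence reads
SEQ_COVERAGE = 5.0               # approximate sequence coverage ("~5X")
SNP_SPACING_BP = 1_250           # mean spacing of differences between two haploid genomes
SNPS_FOUND = 2_600_000           # SNPs located among the five individuals
DYSFUNCTIONAL_FRACTION = 0.0075  # 0.75% of SNPs


def implied_mean_read_length(genome_bp: float, reads: int, coverage: float) -> float:
    """Hypothetical helper: solve c = N * L / G for L, assuming all reads have equal length."""
    return coverage * genome_bp / reads


def poisson_uncovered_fraction(coverage: float) -> float:
    """Assumed Lander-Waterman/Poisson idealization: P(a base is hit by no read) = exp(-c)."""
    return math.exp(-coverage)


def expected_pairwise_differences(genome_bp: float, spacing_bp: float) -> float:
    """Naive count of differing sites between two haploid genomes at a uniform rate."""
    return genome_bp / spacing_bp


if __name__ == "__main__":
    length = implied_mean_read_length(GENOME_BP, READS, SEQ_COVERAGE)
    print(f"implied mean read length: {length:.0f} bp")
    uncovered = poisson_uncovered_fraction(SEQ_COVERAGE)
    print(f"fraction of bases unsequenced at 5X (Poisson model): {uncovered:.2%}")
    diffs = expected_pairwise_differences(GENOME_BP, SNP_SPACING_BP)
    print(f"expected differing sites, one pair of haploid genomes: {diffs:,.0f}")
    print(f"SNPs flagged as possibly dysfunctional: {SNPS_FOUND * DYSFUNCTIONAL_FRACTION:,.0f}")
```

    Because the level of polymorphism is markedly heterogeneous across the genome, the uniform-rate extrapolation is only an order-of-magnitude check. Likewise, real shotgun coverage is less uniform than the Poisson model assumes, so the unsequenced fraction it gives is a lower-bound idealization rather than a measured gap size.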

    Invited keynote abstract, RECOMB 2001