| | |
8:00 - 9:00 | Registration | |
9:00 - 9:15 | Welcome | |
| | |
| RECOMB-CG keynote talk | |
9:15 - 10:15 | Genome evolution in the Anthropocene
The Anthropocene is broadly defined as the geologic epoch in which Earth’s ecosystems have been deeply impacted by humans. This impact is especially evident in the phenotypes of plants and animals that were domesticated in the last 10,000 years, so much so that it is sometimes difficult to identify their corresponding wild ancestors. Strong selection for traits of interest to humans has left deep marks in the genomes of these species, which has allowed the identification of mutations associated with particular phenotypes. Furthermore, DNA extracted from archaeological samples opens a window into the past and can reveal early farming and cultural preferences. I will present examples of the traces of human activity in the genomes of both domesticated species and wild species of commercial interest, and suggest how this information can be used in conservation efforts.
| Rute da Fonseca |
| | |
| Session 1: Gene evolution | chair: Luay Nakhleh |
10:15 - 10:40 | Classifying the Post-Duplication Fate of Paralogous Genes
Gene duplication is one of the main drivers of evolution. It is well known that copies arising from duplication can undergo multiple evolutionary fates, but little is known about their relative frequencies or about how environmental conditions affect them. In this paper we provide a general framework to characterize the fates of duplicated genes and formally differentiate between them. To test our framework, we simulate the evolution of populations using aevol, an in silico experimental evolution platform. When classifying the resulting duplications, we observe several patterns that, in addition to confirming previous studies, exhibit new tendencies that may open up new avenues for a better understanding of the role of duplications.
| Reza Kalhor, Guillaume Beslon, Manuel Lafond and Celine Scornavacca |
10:40 - 11:05 | Inferring clusters of orthologous and paralogous transcripts
The alternative processing of eukaryote genes allows the production of multiple distinct transcripts from a single gene, thereby contributing to transcriptome diversity. Recent studies suggest that more than 90% of human genes are affected, and that the transcripts resulting from alternative processing are highly conserved between orthologous genes.
In this paper, we present a model to define orthology and paralogy relationships at the transcriptome level, and an algorithm to infer clusters of orthologous and paralogous transcripts. Gene-level homology relationships are used to define different types of homology relationships between transcripts, and a Reciprocal Best Hits approach is used as a basis to infer clusters of isoorthologous and recent paralogous transcripts.
The method was applied to transcripts of gene families from the Ensembl-Compara database. The results are consistent with the results of previous studies on the comparison of orthologous gene transcripts. The results also provide evidence that searching for conserved transcripts beyond orthologous genes is likely to be informative. The results obtained on the Ensembl-Compara gene families are available in the TranscriptDB database accessible at https://github.com/UdeS-CoBIUS/TranscriptOrthology.
| Ouedraogo Wend Yam Donald Davy and Aida Ouangraoua |
11:05 - 11:25 | Break 1 | |
| | |
| Session 2: Genome rearrangement I | chair: Aïda Ouangraoua |
11:25 - 11:50 | On the class of double distance problems
This work is about comparing two genomes S and D over the same set of gene families, such that S is singular (it has one gene per family) while D is duplicated (it has two genes per family). Given some underlying model, which can be as simple as minimizing breakpoints or as involved as finding the shortest sequence of mutations mimicked by the double-cut-and-join (DCJ) operation, the double distance of S and D is the smallest distance between D and any element of the set 2S, which contains all possible genome configurations obtained by doubling the chromosomes of S. The breakpoint double distance of S and D is an easy problem that can be greedily solved in linear time. In contrast, the DCJ double distance of S and D was proven to be NP-hard. The complexity space between these two extremes can be explored with the help of an intermediate family of problems, the sigma_k distances, defined for each k ∈ {2, 4, 6, ..., ∞} in such a way that the sigma_2 distance equals the breakpoint distance and the sigma_∞ distance equals the DCJ distance. With this class of problems it is possible to investigate the complexity of the double distance under the sigma_k distance, increasing the value of k in an attempt to identify the smallest value for which the double distance becomes NP-hard, that is, the point at which the complexity changes. In our most recent work we proved that, in the particular case in which genomes are composed exclusively of circular chromosomes, both the sigma_4 and the sigma_6 double distances can be solved in linear time. Here we present a non-trivial extension of these results to genomes including linear chromosomes.
| Marilia Braga, Leonie R. Brockmann, Katharina Klerx and Jens Stoye |
11:50 - 12:15 | The Floor is Lava - Halving Genomes with Viaducts, Piers and Pontoons
The Double Cut and Join (DCJ) model is a simple and powerful model for the analysis of large structural rearrangements. After being extended to the DCJ-indel model, capable of handling gains and losses of genetic material, research has shifted in recent years toward enabling it to handle natural genomes, for which no assumption about the distribution of markers has to be made.
Whole Genome Duplications (WGD) are events that double the content and structure of a genome. In some organisms, multiple WGD events have been observed, while loss of genetic material typically follows a WGD event. Natural genomes are therefore the ideal framework under which to study such events.
The traditional theoretical framework for studying WGD events is the Genome Halving Problem (GHP). While the GHP is solved for the DCJ model for genomes without losses, there are currently no exact algorithms utilizing the DCJ-indel model, nor algorithms that are capable of handling natural genomes.
In this work, we address this issue and present a simple and general view on the DCJ-indel model that we apply to derive an exact polynomial time and space solution for the GHP on genomes with a bounded number of genes per family.
We then generalize this solution as an integer linear program (ILP) to an exact solution for the NP-hard GHP for natural genomes.
| Leonard Bohnenkämper |
| | |
12:15 - 1:45 | Lunch Break | |
| | |
| RECOMB-Seq keynote talk | |
1:45 - 2:45 | Chromosome-scale haplotype-resolved genomics: methods and applications
Reconstructing complete phased sequences of human and non-human species is important in medicine, biosustainability, and comparative genetics for understanding the genetic basis of complex traits. Unprecedented advancements in sequencing technologies have opened up new avenues to reconstruct these phased sequences, enabling a deeper understanding of the molecular, cellular, and developmental processes underlying complex diseases and bio-based chemical production. Despite these sequencing innovations, the reference genomes of humans and of microbes such as fungi remain unphased, so annotations of novel expression and methylation results are incomplete and inaccurate, which affects the interpretation of the molecular genetics and epigenetics of diseases and of bio-based chemical production. There is a pressing need for streamlined, production-level, easy-to-use computational approaches that can reconstruct high-quality chromosome-scale phased sequences and that can be applied to human genomes and microbes at scale.
In this talk, I will first present an efficient combinatorial phasing model that leverages new long-range strand-specific technology and long reads to generate chromosome-scale phasing. Second, I will present an efficient algorithm for accurate haplotype-resolved assembly of human individuals. This method takes advantage of new long, accurate data types (PacBio HiFi) and long-range Hi-C data. For the first time, we can generate accurate chromosome-scale phased assemblies with base-level accuracy of Q50 and continuity of 25 Mb within 24 hours per sample, setting a milestone for the genomics community. Third, I will present a generalized graph-based method for phased assembly of cancer genomes that produced the first precise somatic and germline structural variant landscape, which is required for better drug therapeutics.
In summary, my work efficiently and robustly combines data from a variety of sequencing technologies to produce high-quality phased assemblies. These computational methods will enable high-quality precision medicine and facilitate new and unbiased studies of human (and non-human) haplotype variation, which are current goals of the Human Genome Reference Project, the European Reference Genome Atlas, and the Global Biofoundries Alliance.
| Shilpa Garg |
| | |
| Session 3: Genome rearrangement II | chair: Leonard Bohnenkämper |
2:45 - 3:10 | Two strikes against the phage recombination problem
The recombination problem is inspired by genome rearrangement events that occur in bacteriophage populations. Its goal is to explain how to transform one bacteriophage population into another using the minimum number of recombinations. Here we show that the general combinatorial problem is NP-complete, both when the target population contains only one genome of unbounded length and when the size of the genomes is bounded by a constant. In the first case, the existence of a minimum solution is shown to be equivalent to a 3D-matching problem, and in the second case, to a satisfiability problem. These results imply that the comparison of bacteriophage populations using recombinations will have to rely on heuristics that exploit biological constraints.
| Manuel Lafond, Anne Bergeron and Krister Swenson |
3:10 - 3:35 | Physical mapping of two nested fixed inversions in the X chromosome of the malaria mosquito Anopheles messeae
Chromosomal inversions play an important role in genome evolution, speciation and adaptation of organisms to diverse environments. Mapping and characterization of inversion breakpoints can be useful for describing mechanisms of rearrangement and for identifying genes involved in the diversification of species. Mosquito species of the Maculipennis Subgroup include dominant malaria vectors and nonvectors in Eurasia, but the breakpoint regions of inversions fixed between species have not been mapped to the genomes. Here, we use the physical genome mapping approach to identify breakpoint regions of X chromosome inversions fixed between Anopheles atroparvus and the most widely spread sibling species An. messeae. We mapped the breakpoint regions of two nested fixed inversions (~13 Mb and ~10 Mb) using fluorescence in situ hybridization of 53 gene markers with polytene chromosomes of An. messeae. The DNA probes were designed based on gene sequences of the annotated An. atroparvus genome. The two inversions resulted in five syntenic blocks, of which only two (encompassing at least 179 annotated genes in the An. atroparvus genome) changed their position and orientation in the genome. Analysis of the An. atroparvus genome revealed enrichment of DNA transposons in sequences homologous to three of the four breakpoint regions, suggesting the presence of “hot spots” for rearrangements in mosquito genomes. Our study demonstrated that the physical genome mapping approach can be successfully applied to the identification of inversion breakpoint regions in insect species with polytene chromosomes.
| Evgeniya Soboleva, Kirill Kirilenko, Valentina Fedorova, Alina Kokhanenko, Gleb Artemov and Igor Sharakhov |
3:35 - 4:00 | Break 2 | |
| | |
| Session 4: Phylogeny I | chair: Erin Molloy |
4:00 - 4:25 | Gene order phylogeny via ancestral genome reconstruction under Dollo
We present a proof of principle for a new kind of stepwise algorithm for unrooted binary gene-order phylogenies. This method incorporates a simple look-ahead inspired by Dollo's law while simultaneously reconstructing each ancestor (HTU). We first present a generic version of the algorithm, illustrating a necessary consequence of Dollo characters. In a concrete application, we use generalized oriented gene adjacencies and maximum-weight matching (MWM) to reconstruct fragments of monoploid ancestral genomes as HTUs. This is applied to three flowering plant orders, estimating phylogenies for these orders in the process. We discuss how to improve on the extensive computing times that would be necessary for this method to handle larger trees.
| Qiaoji Xu and David Sankoff |
4:25 - 4:50 | Prior Density Learning in Variational Bayesian Phylogenetic Parameters Inference
Advances in variational inference are providing promising paths for Bayesian estimation problems. These advances make variational phylogenetic inference an alternative to Markov chain Monte Carlo methods for approximating the phylogenetic posterior.
However, one of the main drawbacks of such approaches is that the prior is modelled through fixed distributions, which can bias the posterior approximation if they are distant from the current data distribution.
In this paper, we propose an approach and an implementation framework that relax the rigidity of the prior densities by learning their parameters using a gradient-based method and a neural-network-based parameterization.
We applied this approach to the estimation of branch lengths and evolutionary parameters under several Markov chain substitution models.
Simulation results show that the approach is powerful in estimating branch lengths and evolutionary model parameters. They also show that a flexible prior model can provide better results than a predefined prior model. Finally, the results highlight that using neural networks improves the initialization of the optimization of the prior density parameters.
| Amine M. Remita, Golrokh Kiani and Abdoulaye Baniré Diallo |
| | |
5:00 - 6:30 | Poster Session | |