Program Schedule


RECOMB 2025 Accepted Papers: [PT] for proceedings talk, [HT] for highlights talk.


Day 1 (April 26, 2025)


08:45 - 09:05

Opening Ceremony

09:05 - 10:05

Darwinian Evolution as a Form of Learning



Living organisms function according to protein circuits. Darwin's theory of evolution suggests that these circuits have evolved through variation guided by natural selection. However, it is currently not understood how variation mechanisms can give rise, within realistic population sizes and realistic numbers of generations, to protein circuits of the complexity found in nature. We suggest that machine learning offers the framework for investigating this question of how complex circuits can come into being via a Darwinian process without a designer. We formulate evolution as a form of learning from examples. The targets of the learning process are the protein expression functions that come close to best behavior in the specific environment. The learning process is constrained so that the feedback from experience is Darwinian. We formulate a notion of evolvability that distinguishes function classes that are evolvable with polynomially bounded resources from those that are not. The dilemma is that if the function class that describes the expression levels of proteins in terms of each other is too restrictive, then it will not support biology, while if it is too expressive, then no evolution algorithm will exist to navigate it.

10:05 - 10:20

Coffee Break

10:20 - 11:14

[PT] An adversarial scheme for integrating multi-modal data on protein function
Rami Nasser, Leah Schaffer, Trey Ideker and Roded Sharan


[PT] Learning maximally spanning representations improves protein function annotation
Jiaqi Luo and Yunan Luo


[PT] DualGOFiller: A dual-channel graph neural network with contrastive learning for enhancing function prediction in partially annotated proteins
Shaojun Wang, Hancheng Liu, Weiqi Zhai and Shanfeng Zhu

11:15 - 11:50

[PT] Decoding the Functional Interactome of Non-Model Organisms with PHILHARMONIC
Samuel Sledzieski, Charlotte Versavel, Rohit Singh, Faith Ocitti, Kapil Devkota, Lokender Kumar, Polina Shpilker, Liza Roger, Jinkyu Yang, Nastassja Lewinski, Hollie Putnam, Bonnie Berger, Judith Klein-Seetharaman and Lenore Cowen


[PT] STEAMBOAT: Attention-based multiscale delineation of cellular interactions in tissues
Shaoheng Liang, Junjie Tang, Guanghan Wang and Jian Ma

11:50 - 12:05

[HT] Inferring gene regulatory networks from single-cell multiome data using atlas-scale external data
Zhana Duren and Qiuyue Yuan

12:05 - 12:20

[HT] DD-PRiSM: a deep learning framework for decomposition and prediction of synergistic drug combinations
Iljung Jin, Songyeon Lee, Martin Schmuhalek and Hojung Nam

12:20 - 13:30

Lunch Break

13:30 - 14:30

How to deal with exogenous RNAs: From viral RNAs to mRNA therapeutics



RNAs of external origin, such as viral RNAs and therapeutic mRNAs, rely on cellular machinery for entry and translation while facing cellular barriers that restrict their functions. Thus, for developing effective antivirals and RNA therapeutics, it is important to understand how cells deal with RNAs. In this presentation, I will discuss two recent studies exploring the regulatory mechanisms of exogenous RNAs. In the first part, I will talk about our studies on viral RNA regulation. Using massively parallel reporter assays of viral genomic segments, we discovered hundreds of RNA elements that control RNA stability and translation. Investigation of their mechanisms provides new insights into the regulation of both viral and cellular RNAs. This research creates a valuable resource while highlighting the potential of viral genomes for biological discovery and therapeutic development. Second, I will present our recent work on mRNA vaccines. Through genome-wide screens of in vitro-transcribed (IVT) mRNAs encapsulated in lipid nanoparticles (LNPs), we identified key cellular factors that impact mRNA. By comparing mRNAs with and without N1-methylpseudouridine modification, we also uncovered the mechanism by which this modification enhances protein production from IVT mRNAs. Our study provides a comprehensive map of cellular pathways regulating exogenous mRNAs, offering insights for improving RNA therapeutics.

14:30 - 15:25

[PT] An Exact and Fast SAT Formulation for the DCJ Distance
Aaryan Mahesh Sarnaik, Ke Chen, Austin Diaz and Mingfu Shao


[PT] A k-mer-based maximum likelihood method for estimating distances of reads to genomes enables genome-wide phylogenetic placement
Ali Osman Berk Şapcı and Siavash Mirarab


[PT] Characterizing the solution space of migration histories of metastatic cancers with MACH2
Mrinmoy Saha Roddur, Vikram Ramavarapu, Abi Bunkum, Ariana Huebner, Roman Mineyev, Nicholas McGranahan, Simone Zaccaria and Mohammed El-Kebir

15:25 - 15:40

Coffee Break

15:40 - 16:16

[PT] Old dog, new tricks: Exact seeding strategy improves RNA design performances
Théo Boury, Leonhard Sidl, Ivo L. Hofacker, Yann Ponty and Hua-Ting Yao


[PT] Scalable and interpretable identification of minimal undesignable RNA structure motifs with rotational invariance
Tianshuo Zhou, Wei Yu Tang, Apoorv Malik, David H. Mathews and Liang Huang

16:16 - 16:31

[HT] Accurate RNA 3D structure prediction using a language model-based deep learning approach
Tao Shen, Zhihang Hu, Siqi Sun, Di Liu, Felix Wong, Jiuming Wang, Jiayang Chen, Yixuan Wang, Liang Hong, Jin Xiao, Liangzhen Zheng, Tejas Krishnamoorthi, Irwin King, Sheng Wang, Peng Yin, James J. Collins and Yu Li

16:31 - 17:43

[PT] Hierarchical spatio-temporal state-space modeling for fMRI analysis
Yuxiang Wei, Anees Abrol and Vince Calhoun


[PT] Sequence-based TCR-peptide representations using cross-epitope contrastive fine-tuning of protein language models
Chiho Im, Ryan Zhao, Scott Boyd and Anshul Kundaje


[PT] A phylogenetic approach to genomic language modeling
Carlos Albors, Jianan Canal Li, Gonzalo Benegas, Chengzhong Ye and Yun S. Song


[PT] Rag2Mol: Structure-based drug design based on Retrieval Augmented Generation
Peidong Zhang, Xingang Peng, Rong Han, Ting Chen and Jianzhu Ma

17:45 - 19:15

Poster Session I and Coffee Break


Day 2 (April 27, 2025)


09:00 - 09:05

Welcome

09:05 - 10:05

Microbiome analysis for human and planetary health

Based on computational methods and resources, often developed in our group, here (i) I first introduce a concept of federating such interoperable bioinformatics tools and resources for efficient and flexible large-scale data analysis. Applied to environmental sequencing, that is, metagenomics, which has become a major driver for uncovering microbial biodiversity and increasingly also for molecular functionality on our planet, it enables powerful microbiome analysis. (ii) I illustrate this by our work on the gut microbiome, arguably the best-studied microbial community, serving as a model for other habitats. Metagenome-wide association studies enable bioinformatics-driven modelling of the microbial communities, but modelling needs constant extension and refinement, e.g. inclusion of eukaryotes or the incorporation of “absolute” abundance. (iii) I further show how to apply the underlying analysis concepts to other habitats, like ocean and soil, to arrive at a basic understanding of microbial life on earth, e.g. of gene evolution at global scale or of fluxes of molecular functions across habitats. For validation, but also hypothesis generation, we complement public data with newly generated ones from a large continental-scale, international expedition that traversed European coastlines (TREC). To maximize the added value of the new data, they need to be integrated with historic ones of very different data types. For this we are developing and utilizing SPIRE, a searchable, planetary-scale microbiome resource, for a baseline understanding of the global microbiome structure and function as well as for microbiome-informed bioindicator and bioremediation applications towards improving planetary health.

10:05 - 10:20

Coffee Break

10:20 - 11:50

[PT] devider: long-read reconstruction of many diverse haplotypes
Jim Shaw, Christina Boucher, Yun William Yu, Noelle Noyes and Heng Li


[PT] Hyper-k-mers: efficient streaming k-mers representation
Igor Martayan, Lucas Robidou, Yoshihiro Shibuya and Antoine Limasset


[PT] Improved pangenomic classification accuracy with chain statistics
Nathaniel K. Brown, Vikram Shivakumar and Ben Langmead


[PT] Integer programming framework for pangenome-based genome inference
Ghanshyam Chandra, Md Helal Hossen, Stephan Scholz, Alexander T Dilthey, Daniel Gibney and Chirag Jain


[PT] Prokrustean Graph: A substring index for rapid k-mer size analysis
Adam Park and David Koslicki

11:50 - 12:05

[HT] SigAlign: an alignment algorithm guided by explicit similarity criteria
Kunhyung Bahk and Joohon Sung

12:05 - 12:20

[HT] Fast, sensitive detection of protein homologs using deep dense retrieval
Liang Hong, Yu Li, Zhihang Hu and Jiuming Wang

12:20 - 13:30

Lunch Break

13:30 - 14:30

Mapping the genetic and phenotypic complexity of disease with AI/ML models



Unraveling both the genetic and phenotypic complexity of human diseases is extremely challenging yet critical for understanding their biology, inheritance, trajectory, and clinical manifestations. Deciphering their genetic architecture is especially difficult for the 98% of the genome that is outside of exomes. To address this challenge, we developed deep learning-based methods that predict the transcriptional and post-transcriptional effects of noncoding variants with single-nucleotide sensitivity. These models predict epigenetic, regulatory, transcriptional, and post-transcriptional effects of variants, including in specific contexts such as specific cell types or developmental stages, and provide mechanistic, pathogenic, and clinical interpretation of such variants. The challenge of understanding the genetic basis of complex diseases is exacerbated by the complicated interplay between gene regulation and downstream processes, including gene expression and ultimately phenotypes. We bridge the interactions between epigenetics and gene expression with regulatory circuit predictions that enrich our understanding of the mechanisms of disease. Phenotypically, core features of complex human conditions can vary substantially in severity and presentation, and can coincide with an extensive and unique spectrum of associated phenotypes and co-occurring conditions for each individual. We addressed this challenge for autism by leveraging broad phenotypic data from a large cohort to identify robust, clinically relevant classes of autism and their patterns of core, associated, and co-occurring traits, which we further validate and replicate in an independent cohort. We demonstrate that phenotypic and clinical outcomes correspond to genetic and molecular programs of common, de novo, and inherited variation, and further characterize distinct pathways disrupted by the sets of mutations in each class. Remarkably, we discover that class-specific differences in the developmental timing of impacted genes align with clinical outcome differences. This provides a general approach that dissects the phenotypic complexity of human conditions, unraveling genetic programs underlying their heterogeneity and suggesting specific biological dysregulation patterns and mechanistic hypotheses.

14:30 - 15:24

[PT] ScatTR: Estimating the size of long tandem repeat expansions from short-reads
Rashid Al-Abri and Gamze Gürsoy


[PT] OMKar: optical map based automated karyotyping of genomes to identify constitutional abnormalities
Siavash Raeisi Dehkordi, Zhaoyang Jia, Joey Estabrook, Jen Hauenstein, Neil Miller, Paul Dremsek, Alex Hastie, Andy Wing Chun Pang and Vineet Bafna


[PT] Accurate detection of tandem repeats from error-prone long reads with EquiRep
Zhezheng Song, Tasfia Zahin, Xiang Li and Mingfu Shao

15:25 - 15:40

Coffee Break

15:40 - 16:34

[PT] cfDecon: Accurate and Interpretable methylation-based cell type deconvolution for cell-free DNA
Yixuan Wang, Jiayi Li, Jingqi Li, Shen Yang, Yuhan Huang, Xinyuan Liu, Yimin Fan, Irwin King, Yumei Li and Yu Li


[PT] ALPINE: an interpretable approach for decoding phenotypes from multicondition sequencing data
Wei-Hao Lee, Lechuan Li, Ruth Dannenfelser and Vicky Yao


[PT] Learning multi-cellular representations of single-cell transcriptomics data enables characterization of patient-level disease states
Tianyu Liu, Edward De Brouwer, Tony Kuo, Nathaniel Diamant, Alsu Missarova, Hanchen Wang, Minsheng Hao, Hector Corrada Bravo, Gabriele Scalia, Aviv Regev and Graham Heimberg

16:34 - 16:49

[HT] Coordinated, multicellular patterns of transcriptional variation that stratify patient cohorts are revealed by tensor decomposition
Jonathan Mitchel and Peter Kharchenko

16:49 - 17:04

[HT] Machine learning-guided co-optimization of fitness and diversity facilitates combinatorial library design in enzyme engineering
Kerr Ding, Michael Chin, Yunlong Zhao, Wei Huang, Binh Khanh Mai, Huanan Wang, Peng Liu, Yang Yang and Yunan Luo

17:05 - 17:41

[PT] Untying rates of gene gain and loss leads to a new phylogenetic approach
Yoav Dvir and Sagi Snir


[PT] The tree labeling polytope: a unified approach to ancestral reconstruction problems
Henri Schmidt and Benjamin Raphael

17:41 - 17:56

[HT] CASTER: Direct species tree inference from whole-genome alignments
Chao Zhang, Rasmus Nielsen and Siavash Mirarab

18:00 - 19:30

Poster Session II and Coffee Break


Day 3 (April 28, 2025)


09:00 - 09:05

Welcome

09:05 - 10:05

A controlled approach to computational biology via axiomatic derivations



Methods in computational biology have varied origins. Some have been derived as solutions to desired optimizations, while others have emerged as successful heuristics, whose theoretical underpinnings were understood only after widespread adoption. I will review several examples ranging from methods in phylogenetics to tools for studying gene regulation, and will argue for an axiomatic approach, whose benefits mimic those of conducting comprehensive controls in experimental biology.

10:05 - 10:20

Coffee Break

10:20 - 11:32

[PT] Antimicrobial drug recommendation from MALDI-TOF mass spectrometry with statistical guarantees using conformal prediction
Nina Corvelo Benz, Lucas Miranda, Dexiong Chen, Janko Sattler and Karsten Borgwardt


[PT] mcRigor: a statistical method to enhance the rigor of metacell partitioning in single-cell data analysis
Pan Liu and Jingyi Jessica Li


[PT] ML-MAGES: A machine learning framework for multivariate genetic association analyses with genes and effect size shrinkage
Xiran Liu, Lorin Crawford and Sohini Ramachandran


[PT] Learning Latent Trajectories in Developmental Time Series with Hidden-Markov Optimal Transport
Peter Halmos, Julian Gold, Xinhao Liu and Benjamin Raphael

11:32 - 12:20

[ST] Building the Future of Precision Health: KNIH's Bio Big Data Initiative
Hyun-Young Park (Director General, National Institute of Health of Korea)


[ST] Introducing KOBIC: Shaping the Bio-Data-Powered Society of Knowledge and Innovation
Haeyoung Jeong (Director, Korea Bioinformation Center, Korea Research Institute of Bioscience and Biotechnology)


[ST] Quantum machine learning for precision medicine: Opportunities and challenges
Daniel Kyungdeock Park (Yonsei University)

12:20 - 13:20

Lunch Break

13:20 - 14:20

Systems Metabolic Engineering of Bacteria for Natural Products

Systems metabolic engineering is a comprehensive field that combines the principles of traditional metabolic engineering with the cutting-edge techniques of systems biology, synthetic biology, and evolutionary engineering to enhance the development of bacterial cell factories. In today's lecture, I will detail the various strategies used in systems metabolic engineering to develop these bacterial cell factories. Additionally, I will discuss the specific systems metabolic engineering strategies that have been utilized for the development of strains aimed at producing a range of products, with a particular focus on natural products. These strategies not only streamline the production processes but also improve the efficiency and yield of the desired products.

14:20 - 15:15

[PT] TarDis: Achieving robust and structured disentanglement of multiple covariates
Kemal Inecik, Aleyna Kara, Antony Rose, Muzlifah Haniffa and Fabian Theis


[PT] Optimal marker genes for c-separated cell types
Bartol Borozan, Luka Borozan, Domagoj Severdija, Domagoj Matijevic and Stefan Canzar


[PT] Synthetic control removes spurious discoveries from double dipping in single-cell and spatial transcriptomics data analyses
Dongyuan Song, Siqi Chen, Christy Lee, Kexin Li, Xinzhou Ge and Jingyi Jessica Li

15:15 - 15:25

Coffee Break

15:25 - 16:37

[PT] Orientation-aware graph neural networks for protein structure representation learning
Jiahan Li, Shitong Luo, Cengyue Deng, Chaoran Cheng, Jiaqi Guan, Leonidas Guibas, Jian Peng and Jianzhu Ma


[PT] Active learning for protein structure prediction
Zexin Xue, Michael Bailey, Abhinav Gupta, Alejandro Corrochano-Navarro, Sizhen Li, Ruijiang Li, Qiu Yu, Ziv Bar-Joseph, Sven Jager and Lorenzo Kogler-Anele


[PT] Rewiring protein sequence and structure generative models to enhance protein stability prediction
Ziang Li and Yunan Luo


[PT] Learning a CoNCISE language for small molecule binding and function
Mert Erden, Kapil Devkota, Lia Varghese, Lenore Cowen and Rohit Singh

16:37 - 17:30

Break (Transfer to Gala Dinner)

17:30 - 20:00

Business Meeting/Gala Dinner


Day 4 (April 29, 2025)


09:00 - 09:05

Welcome

09:05 - 10:05

A pangenome perspective of structural variation



The discovery and resolution of genetic variation is critical to understanding disease and evolution. I will present our most recent work to sequence and assemble telomere-to-telomere diverse human and nonhuman primate (NHP) genomes using both ultra-long and high-fidelity long-read sequencing technologies. The emerging data from hundreds of T2T human & ape genomes allows us, in principle, to reconstruct the evolutionary history of every base of our genome, revealing complex patterns of mutation and rapid diversification of genes and regions previously inaccessible by short-read sequencing. I will discuss algorithmic challenges to correctly representing such variation using graph-based approaches and potential solutions based on an understanding of the dynamic mutational processes shaping these regions at the species, population, and familial level. I will also present some recent tools that have been developed to leverage both graphs and pangenomes, as well as some approaches to integrate short-read sequencing data to improve future disease association in humans.

10:05 - 10:20

Coffee Break

10:20 - 11:50

[PT] TX-Phase: Secure phasing of private genomes in a trusted execution environment
Natnatee Dokmai, Kaiyuan Zhu, Cenk Sahinalp and Hyunghoon Cho


[PT] BayesRVAT: Bayesian aggregation of multiple annotations enhances rare variant association testing
Antonio Nappi, Na Cai and Francesco Paolo Casale


[PT] ralphi: a deep reinforcement learning framework for haplotype assembly
Enzo Battistella, Anant Maheshwari, Barış Ekim, Bonnie Berger and Victoria Popic


[PT] GEM-Finder: dissecting GWAS variants via long-range interacting cis-regulatory elements with differentiation-specific genes
Gyeongsik Park, Andrew J Lee, Sunwoo Min, Seyoung Jin and Inkyung Jung


[PT] Dynamic mu-PBWT: Dynamic run-length compressed PBWT for biobank scale data
Pramesh Shakya, Ahsan Sanaullah, Degui Zhi and Shaojie Zhang

11:50 - 12:05

[HT] Private information leakage from single-cell count matrices
Conor Walker, Xiaoting Li, Manav Chakravarthy, William Lounsbery-Scaife, Yoolim A. Choi, Ritambhara Singh and Gamze Gürsoy

12:05 - 12:20

[HT] The simplicity of protein sequence-function relationships
Yeonwoo Park, Brian Metzger and Joseph Thornton

12:20 - 13:30

Lunch Break

13:30 - 14:24

[PT] Pharming: Joint clonal tree reconstruction of SNV and CNA evolution from single-cell DNA sequencing of tumors
Leah Weber, Anna Hart, Idoia Ochoa and Mohammed El-Kebir


[PT] ScisTree2: An improved method for large-scale inference of cell lineage trees and genotype calling from noisy single cell data
Haotian Zhang, Yiming Zhang, Teng Gao and Yufeng Wu


[PT] A partition function algorithm to evaluate inferred subclonal structures in single cell sequencing datasets
Farid Rashidi Mehrabadi, Erfan Sadeqi Azer, John Bridgers, Eva Pérez-Guijarro, Kerrie Marie, Howard Yang, Charli Gruen, Chih Hao Wu, Welles Robinson, Huaitian Liu, Can Kizilkale, Michael Kelly, Cari Smith, Sung Chin, Jessica Ebersole, Sandra Burkett, Aydin Buluc, Maxwell Lee, Erin Molloy, Teresa Przytycka, Glenn Merlino, Chi-Ping Day, Salem Malikic, Funda Ergun and S. Cenk Sahinalp

14:25 - 15:19

[PT] Tree reconstruction guarantees from CRISPR-Cas9 lineage tracing data using Neighbor-Joining
Sebastian Prillo, Kevin An, Wilson Wu, Ivan Kristanto, Matthew Jones, Yun Song and Nir Yosef


[PT] Inferring cell differentiation maps from lineage tracing data
Palash Sashittal, Richard Zhang, Benjamin Law, Alexander Strzalkowski, Henri Schmidt, Adriano Bolondi, Michelle Chan and Ben Raphael


[PT] Dynamic programming algorithms for fast and accurate cell lineage tree reconstruction from CRISPR-based lineage tracing data
Junyan Dai and Erin Molloy

15:20 - 15:40

Coffee Break

15:40 - 16:34

[PT] TissueMosaic enables cross-sample differential analysis of spatial transcriptomics datasets through self-supervised representation learning
Sandeep Kambhampati, Luca D'Alessio, Fedor Grab, Stephen Fleming, Fei Chen and Mehrtash Babadi


[PT] Joint imputation and deconvolution of gene expression across spatial transcriptomics platforms
Hongyu Zheng, Hirak Sarkar and Benjamin Raphael


[PT] Unified integration of spatial transcriptomics across platforms
Ellie Haber, Ajinkya Deshpande, Jian Ma and Spencer Krieger

16:35 - 17:29

[PT] Causal disentanglement of treatment effects in single-cell RNA Sequencing through counterfactual inference
Shaokun An, Jae-Won Cho, Kai Cao, Jiankang Xiong, Martin Hemberg and Lin Wan


[PT] GeneCover: A combinatorial approach for label-free marker gene selection
An Wang, Stephanie Hicks, Donald Geman and Laurent Younes


[PT] Integration and querying of multimodal single-cell data with product-of-experts VAE
Anastasia Litinetskaya, Maiia Shulman, Fabiola Curion, Artur Szalata, Alireza Omidi, Mohammad Lotfollahi and Fabian Theis

17:30 - 18:00

Awards and Closing

*Time Zone: KST