[PT] for proceedings talk, [HT] for highlights talk. Presenters' names are underlined.

RECOMB 2024 Accepted Papers

RECOMB 2024 Conference Proceedings

Day 1 (April 29, 2024)

07:30 - 08:00

Breakfast & Registration

08:00 - 08:05


08:05 - 09:35

Session Chair: Rohit Singh

[PT] Graph-matching-based learning of substitution matrices for biological structures with functional priors
Paolo Pellizzoni, Carlos Oliver and Karsten Borgwardt

[PT] DexDesign: A new OSPREY-based algorithm for designing de novo D-peptide inhibitors
Nathan Guerin, Henry Childs, Pei Zhou and Bruce Donald

[PT] PEFT-SP: Parameter-Efficient Fine-Tuning on Large Protein Language Models Improves Signal Peptide Prediction
Shuai Zeng, Duolin Wang and Dong Xu

[PT] Protein domain embeddings for fast and accurate similarity search
Benjamin Iovino, Haixu Tang and Yuzhen Ye

[PT] Contrastive Fitness Learning: Reprogramming Protein Language Models for Low-N Learning of Protein Fitness Landscape
Junming Zhao, Chao Zhang and Yunan Luo

09:35 - 09:50

Coffee Break

09:50 - 10:44

Session Chair: Xiuwei Zhang

[PT] DIISCO: A Bayesian framework for inferring dynamic intercellular interactions from sequential single-cell data
Cameron Park, Shouvik Mani, Nicolas Beltran, Katie Maurer, Satyen Gohil, Shuqiang Li, Teddy Huang, David Knowles, Catherine Wu and Elham Azizi

[PT] scMulan: a multitask generative pre-trained language model for single-cell analysis
Haiyang Bian, Yixin Chen, Xiaomin Dong, Chen Li, Minsheng Hao, Sijie Chen, Jinyi Hu, Maosong Sun, Lei Wei and Xuegong Zhang

[PT] GFETM: Genome Foundation-based Embedded Topic Model for scATAC-seq Modeling
Yimin Fan, Yu Li, Jun Ding and Yue Li

10:44 - 11:38

Session Chair: Ruochi Zhang

[PT] regLM: Designing realistic regulatory DNA with autoregressive language models
Avantika Lal, Tommaso Biancalani and Gokcen Eraslan

[PT] SEM: sized-based expectation maximization for characterizing nucleosome positions and subtypes
Jianyu Yang, Kuangyu Yen and Shaun Mahony

[PT] Improving Hi-C contact matrices using genome graphs
Yihang Shen, Lingge Yu, Yutong Qiu, Tianyu Zhang and Carl Kingsford

11:38 - 13:00

Lunch Break

Collaborating on and Outsourcing Medically Driven Questions: How to Establish Trust in the process

Given the data, computational cost and technical expertise required to train machine learning models, researchers and clinicians may outsource the task of learning and privacy-preserving collaborative analysis. We discuss adversarial settings in which outsourcing sensitive computations encounters unforeseen problems throughout the machine learning pipeline, and ways to mitigate them.

14:00 - 14:54

Session Chair: Gamze Gursoy

[PT] Secure Discovery of Genetic Relatives across Large-Scale and Distributed Genomic Datasets
Matthew Man-Hou Hong, David Froelicher, Ricky Magner, Victoria Popic, Bonnie Berger and Hyunghoon Cho

[PT] Secure federated Boolean count queries using fully-homomorphic cryptography
Alex Leighton and Yun William Yu

[PT] Privacy Preserving Epigenetic PaceMaker: Stronger Privacy & Improved Efficiency
Meir Goldenberg, Loay Mualem, Amit Shahar, Sagi Snir and Adi Akavia

14:55 - 15:10

Coffee Break

15:10 - 15:46

Session Chair: Teresa Przytycka

[PT] VICTree - a Variational Inference method for Clonal Tree reconstruction
Harald Melin*, Vittorio Zampinetti*, Andrew McPherson and Jens Lagergren

[PT] Inferring allele-specific copy number aberrations and tumor phylogeography using spatially resolved transcriptomics
Cong Ma, Metin Balaban, Clara Liu, Siqi Chen, Li Ding and Ben Raphael

15:46 - 16:01

[HT] SPLASH: A statistical, reference-free genomic algorithm unifies biological discovery
Kaitlin Chaung, Tavor Baharav, George Henderson, Ivan Zheludev, Peter Wang and Julia Salzman

16:30 - 18:30

Poster Session I and Coffee Break

Day 2 (April 30, 2024)

07:30 - 08:00

Breakfast & Registration

08:00 - 08:05


08:05 - 09:35

Session Chair: Chirag Jain

[PT] Memory-bound k-mer selection for large and evolutionarily diverse reference libraries
Ali Osman Berk Şapcı and Siavash Mirarab

[PT] Accurate Assembly of Circular RNAs with TERRACE
Tasfia Zahin, Qian Shi, Xiaofei Carl Zang and Mingfu Shao

[PT] ImputeCC enhances integrative Hi-C-based metagenomic binning through constrained random-walk-based imputation
Yuxuan Du, Wenxuan Zuo and Fengzhu Sun

[PT] Centrifuger: lossless compression of microbial genomes for efficient and accurate metagenomic sequence classification
Li Song and Ben Langmead

[PT] GraSSRep: Graph-Based Self-Supervised Learning for Repeat Detection in Metagenomic Assembly
Ali Azizpour, Advait Balaji, Todd J. Treangen and Santiago Segarra

09:35 - 09:50

Coffee Break

09:50 - 10:50

The status of the human gene catalogue: where are we now, and how do we finish it?

How many genes do we have? The Human Genome Project was launched with the promise of revealing all of our genes, the “code” that would help explain our biology. The publication of the genome in 2001 provided only a very rough answer to the question of how many genes we have, and a highly-fragmented draft genome sequence. For more than two decades, the estimated number of protein-coding genes has steadily declined, but the invention of RNA sequencing revealed a vast new world of splice variants and RNA genes. In this talk, I will review where we’ve been and where we are today, and I will describe our recent effort to use large RNA sequencing resources to create a comprehensive human gene database. I will also discuss recent breakthroughs in protein structure prediction that form the basis of a new strategy to identify which proteins are functional.

10:50 - 11:44

Session Chair: Carl Kingsford

[PT] Meta-colored compacted de Bruijn graphs
Giulio Ermanno Pibiri, Jason Fan and Robert Patro

[PT] Automated design of efficient search schemes for lossless approximate pattern matching
Luca Renders, Lore Depuydt, Sven Rahmann and Jan Fostier

[PT] Haplotype-aware Sequence-to-Graph Alignment
Ghanshyam Chandra and Chirag Jain

11:45 - 13:00

Lunch Break

13:00 - 14:00

Deep Learning for Antibiotic Discovery

In this talk, we highlight the Antibiotics-AI Project, which is a multi-disciplinary, innovative research program that is leveraging MIT's strengths in artificial intelligence, bioengineering, and the life sciences to discover and design novel classes of antibiotics. The Antibiotics-AI Project is focused on developing, integrating and implementing deep learning models and chemogenomic screening approaches: (1) to predict novel antibiotics from expansive chemical libraries with diverse properties, (2) to design de novo novel antibiotics based on learned structural and functional properties of existing and newly discovered antibiotics, and (3) to identify, using explainable deep learning models, the chemical structures and molecular mechanisms underlying the newly discovered and/or designed antibiotics. With these deep learning approaches, we are utilizing multi-scale computation to embrace and harness the complexity of biology and chemistry, so as to discover, design and develop new classes of antibiotics, up through preclinical studies. Our platform has been designed so that it can be utilized and applied in a rapid fashion to emerging and re-emerging bacterial pathogens, including multidrug-resistant (MDR) bacteria and extensively drug-resistant (XDR) bacteria.

14:00 - 14:54

Session Chair: Mona Singh

[PT] Sequential Optimal Experimental Design of Perturbation Screens Guided by Multi-modal Priors
Kexin Huang, Romain Lopez, Jan-Christian Hütter, Takamasa Kudo, Antonio Rios and Aviv Regev

[PT] FragXsiteDTI: Revealing Responsible Segments in Drug-Target Interaction with Transformer-Driven Interpretation
Ali Khodabandeh Yalabadi, Mehdi Yazdani-Jahromi, Niloofar Yousefi, Aida Tayebi, Sina Abdidizaji and Ozlem Ozmen Garibay

[PT] Discovering and overcoming the bias in neoantigen identification by unified machine learning models
Ziting Zhang, Wenxu Wu, Lei Wei and Xiaowo Wang

14:55 - 15:10

Coffee Break

15:10 - 15:46

Session Chair: Sai Zhang

[PT] Community structure and temporal dynamics of viral epistatic networks allow for early detection of emerging variants with altered phenotypes
Fatemeh Mohebbi, Alex Zelikovsky, Serghei Mangul, Gerardo Chowell-Puente and Pavel Skums

[PT] A Scalable Adaptive Quadratic Kernel Method for Interpretable Epistasis Analysis in Complex Traits
Boyang Fu, Prateek Anand, Aakarsh Anand, Sriram Sankararaman and Joel Mefford

15:46 - 16:01

[HT] Association Plots: visualizing cluster-specific associations in high-dimensional correspondence analysis biplots
Martin Vingron and Elzbieta Gralinska

17:00 - 18:30

Business Meeting


Gala dinner, Museum of Science

Day 3 (May 1, 2024)

07:30 - 08:00

Breakfast & Registration

08:00 - 08:05


08:05 - 09:35

Session Chair: Vicky Yao

[PT] Fast Approximate IsoRank for Scalable Global Alignment of Biological Networks
Kapil Devkota, Anselm Blumer, Xiaozhe Hu and Lenore Cowen

[PT] An integer programming framework for identifying stable components in asynchronous Boolean networks
Shani Jacobson and Roded Sharan

[PT] Computing robust optimal factories in metabolic reaction networks
Spencer Krieger and John Kececioglu

[PT] BONOBO: Bayesian Optimized sample-specific Networks Obtained By Omics data
Enakshi Saha, Viola Fanfani, Panagiotis Mandros, Marouen Ben Guebila, Jonas Fischer, Katherine Shutta, Kimberly Glass, Dawn DeMeo, Camila Lopes Ramos and John Quackenbush

[PT] Inferring Metabolic States via Geometric Deep Learning
Holly Steach, Siddharth Viswanath, Yixuan He, Xitong Zhang, Natalia Ivanova, Michael Perlmutter and Smita Krishnaswamy

09:35 - 09:50

Coffee Break

09:50 - 10:50

Empower the Ecosystem for Biobank-Scale Whole Genome Sequencing Analysis

Whole Genome/Exome Sequencing (WGS/WES) data and Electronic Health Records (EHRs), such as large-scale national and institutional biobanks, have emerged rapidly worldwide. In this talk, I will provide an overview of the methods and resources to empower the data science ecosystem of scalable analysis of large biobank- and population-based Whole Genome Sequencing (WGS) association studies of common and rare variants. I will discuss rare variant association tests and meta-analysis using individual-level data and WGS summary statistics and incorporating whole genome variant functional annotations. I will discuss fitting mixed models using sparse GRM to account for population structure and relatedness at scale, incorporating multi-faceted variant functional annotations including context-specific annotations to empower WGS analysis, as well as recently developed ensemble tests. I will introduce FAVOR and FAVOR-GPT, a variant functional annotation online portal and resource that provides multi-faceted functional annotations of 9 billion genome-wide variants, and FAVORAnnotator, a tool to functionally annotate any WGS/WES studies. Cloud-based platforms for these resources will be discussed. Results of large-scale population-based WGS studies and biobanks will be presented, including the Trans-Omics Precision Medicine Program (TOPMed) from the National Heart, Lung and Blood Institute, the Genome Sequencing Program (GSP) of the National Human Genome Research Institute, All of Us, and the UK Biobank. These studies have collectively sequenced about 1 million genomes.

10:50 - 12:02

Session Chair: Hoon Cho

[PT] PRS-Net: Interpretable polygenic risk scores via geometric learning
Han Li, Jianyang Zeng, Michael Snyder and Sai Zhang

[PT] MaSk-LMM: A Matrix Sketching Framework for Linear Mixed Models in Association Studies
Myson Burch, Aritra Bose, Gregory Dexter, Laxmi Parida and Petros Drineas

[PT] Disease Risk Predictions with Differentiable Mendelian Randomization
Ludwig Gräf, Daniel Sens, Liubov Shilova and Francesco Paolo Casale

[PT] Scalable summary statistics-based heritability estimation method with individual genotype level accuracy
Moonseong Jeong, Ali Pazokitoroudi and Sriram Sankararaman

12:05 - 13:30

Lunch Break

13:30 - 14:30

Controlling the release of large molecules from biomaterials: How overcoming skepticism led to new medical treatments and ways to tackle a global health challenge

Advanced drug delivery systems are having an enormous impact on human health. We start by discussing our early research on developing the first controlled release systems for macromolecules and the isolation of angiogenesis inhibitors and how these led to numerous new therapies. This early research then led to new drug delivery technologies including nanoparticles and nanotechnology that are now being studied for use treating cancer, other illnesses and in vaccine delivery (including the Covid-19 vaccine). Approaches for synthesizing new biomaterials, such as biodegradable polyanhydrides, are then examined, and examples where such materials are used in brain cancer and other diseases are discussed. Finally, by combining mammalian cells, including stem cells, with synthetic polymers, new approaches for engineering tissues are being developed that may someday help in various diseases. Examples in the areas of cartilage, skin, blood vessels and heart tissue are discussed.

14:30 - 15:42

Session Chair: Iman Hajirasouliha

[PT] CoRAL accurately resolves extrachromosomal DNA genome structures with long-read sequencing
Kaiyuan Zhu, Matthew Jones, Jens Luebeck, Xinxin Bu, Hyerim Yi, King L. Hung, Ivy Tsz-Lo Wong, Shu Zhang, Paul Mischel, Howard Chang and Vineet Bafna

[PT] Decoil: Reconstructing extrachromosomal DNA structural heterogeneity from long-read sequencing data
Madalina Giurgiu, Nadine Wittstruck, Elias Rodriguez-Fos, Rocio Chamorro-Gonzalez, Lotte Brueckner, Annabell Krienelke-Szymansky, Konstantin Helmsauer, Anne Hartebrodt, Richard P. Koche, Kerstin Haase, Knut Reinert and Anton G. Henssen

[PT] Determining Optimal Placement of Copy Number Aberration Impacted Single Nucleotide Variants in a Tumor Progression History
Chih Hao Wu, Suraj Joshi, Welles Robinson, Paul F. Robbins, Russell Schwartz, S. Cenk Sahinalp and Salem Malikic

[PT] Overcoming Observation Bias for Cancer Progression Modeling
Rudolf Schill, Maren Klever, Andreas Lösch, Y. Linda Hu, Stefan Vocht, Kevin Rupp, Lars Grasedyck, Rainer Spang and Niko Beerenwinkel

15:42 - 15:57

[HT] Cancer mutations converge on a collection of protein assemblies to predict resistance to replication stress
Xiaoyu Zhao, Akshat Singhal and Trey Ideker

16:30 - 18:30

Poster Session II and Coffee Break

Day 4 (May 2, 2024)

LOCATION: Schwarzman College of Computing (51 Vassar St), FLOOR 8

07:30 - 08:00

Breakfast & Registration

08:00 - 08:05


08:05 - 09:35

Session Chair: Maria Chikina

[PT] SpaCeNet: Spatial Cellular Networks from omics data
Stefan Schrod, Niklas Lück, Robert Lohmayer, Stefan Solbrig, Tina Wipfler, Katherine H. Shutta, Marouen Ben Guebila, Andreas Schäfer, Tim Beißbarth, Helena U. Zacharias, Peter J. Oefner, John Quackenbush and Michael Altenbuchinger

[PT] Mapping the topography of spatial gene expression with interpretable deep learning
Uthsav Chitra, Brian Arnold, Hirak Sarkar, Cong Ma, Sereno Lopez-Darwin, Kohei Sanno and Ben Raphael

[PT] Topological Velocity Inference from Spatial Transcriptomic Data Maps Cell Fate Transition in Space and Time
Yichen Gu, Jialin Liu, Chen Li and Joshua Welch

[PT] DeST-OT: Alignment of Spatiotemporal Transcriptomics Data
Peter Halmos, Xinhao Liu, Julian Gold, Feng Chen, Li Ding and Ben Raphael

[PT] CELL-E: A Text-To-Image Transformer for Protein Localization Prediction
Emaad Khwaja, Yun S. Song and Bo Huang

09:35 - 09:50

Coffee Break

09:50 - 10:26

Session Chair: Yunan Luo

[PT] Color Coding for the Fragment-Based Docking, Design and Equilibrium Statistics of Protein-Binding ssRNAs
Taher Yacoub, Roy González-Alemán, Fabrice Leclerc, Isaure Chauvot de Beauchene and Yann Ponty

[PT] Undesignable RNA Structure Identification via Rival Structure Construction and Structure Decomposition
Tianshuo Zhou, Wei Yu Tang, David Mathews and Liang Huang

10:26 - 11:32

Session Chair: William Yu

[PT] A Scalable Optimization Algorithm for Solving the Beltway and Turnpike Problems with Uncertain Measurements
Shane Elder, Quang Minh Hoang, Mohsen Ferdosi and Carl Kingsford

[PT] Efficient Analysis of Annotation Colocalization Accounting for Genomic Contexts
Askar Gafurov, Tomas Vinar, Paul Medvedev and Broňa Brejová

11:02 - 11:17

[HT] Genome-wide prediction of disease variant effects with a deep protein language model
Nadav Brandes, Grant Goldman, Charlotte H. Wang, Chun Jimmie Ye and Vasilis Ntranos

11:17 - 11:32

[HT] DeepMainmast: Integrated Protocol of Protein Structure Modeling for Cryo-EM with Deep Learning and Structure Prediction
Genki Terashi, Xiao Wang, Devashish Prasad, Tsukasa Nakamura and Daisuke Kihara

11:32 - 12:30

Lunch Break (lunch provided)

12:30 - 13:24

Session Chair: Can Alkan

[PT] Maximum Likelihood Inference of Time-scaled Cell Lineage Trees with Mixed-type Missing Data
Uyen Mai, Gillian Chu and Benjamin Raphael

[PT] TRIBAL: Tree Inference of B cell Clonal Lineages
Leah Weber, Derek Reiman, Mrinmoy Roddur, Yuanyuan Qi, Mohammed El-Kebir and Aly Khan

[PT] Optimal Tree Metric Matching Enables Phylogenomic Branch Length Reconciliation
Shayesteh Arasti, Puoya Tabaghi, Yasamin Tabatabaee and Siavash Mirarab

13:24 - 14:18

Session Chair: Anthony Gitter

[PT] Processing bias correction with DEBIAS-M improves cross-study generalization of microbiome-based prediction models
George Austin, Aya Brown Kav and Tal Korem

[PT] Semi-supervised learning while controlling the FDR with an application to tandem mass spectrometry analysis
Jack Freestone, Lukas Käll, William Stafford Noble and Uri Keich

[PT] Enhancing gene set analysis in embedding spaces: a novel best-match approach
Lechuan Li, Ruth Dannenfelser, Charlie Cruz and Vicky Yao

14:18 - 14:30

Closing session and awards

*Time Zone: EST