Mikhail Karasikov

ML Engineer, PhD

ETH Zurich

About me

Currently, I’m a Scientific Collaborator at ETH Zurich developing AI solutions for hospitals and further scaling up MetaGraph, a large-scale search engine for DNA.

Previously, I was a Machine Learning researcher and engineer at Kaiko.AI, where I trained Vision Transformers (ViTs) and built foundation models (FMs) for computational pathology and other data modalities. This work included large-scale pretraining on diverse datasets, fine-tuning and optimizing the models for various downstream tasks, and making these models robust, interpretable, and useful in clinical practice.

I completed my PhD at ETH Zurich, where I designed novel algorithms and compressed data structures for indexing petabytes of biological sequences and developed methods scalable to the entire Sequence Read Archive. These methods finally made this trove of data accessible for search by sequence at scale and demonstrated the feasibility of making all of life's code easily searchable.

Before that, I studied Math, Physics, CS, and Machine Learning at MIPT, Skoltech, and YSDA, and worked on a few problems in computational structural biology at Inria Grenoble-Rhône-Alpes.

Interests

Machine Learning
Bioinformatics
Computational Biology
Compressed Data Structures

Free time

Hiking/Camping, Skiing, Biking
Guitar, Piano

Education

Ph.D. in Computer Science, 2023
ETH Zurich, Zurich, Switzerland

M.Sc. in Math. and Computer Science, 2017
Skoltech, Moscow, Russia

M.Sc. in Applied Math. and Physics, 2017
MIPT, Moscow, Russia

PG Dip. in Computer Science, 2016
Yandex School of Data Analysis, Moscow, Russia

B.Sc. in Applied Math. and Physics, 2015
MIPT, Moscow, Russia

Featured Projects

Pathology Foundation Models

We enhance the self-supervised learning (SSL) workflow and train pathology FMs that reach state-of-the-art performance with up to 100-fold less data than is currently standard. We additionally apply high-resolution fine-tuning to further improve the FMs. With 12k TCGA whole-slide images (WSIs), we trained a model that is on par with Virchow2, and our best model, trained on 92k WSIs with high-resolution post-training, achieved the top average performance.

MetaGraph

A C++ framework for indexing very large collections of DNA/protein sequences and a tool for sequence search, alignment, and assembly. Although the target use cases of MetaGraph overlap with those of BLAST, MetaGraph mainly focuses on the scalable indexing of raw sequencing data in annotated de Bruijn graphs with up to $\sim 10^{12}$ nodes and $\sim 10^{7}$ annotation labels. It also provides an online platform, MetaGraph Online. Other contributors: Marc Zimmermann, Thomas Zhou, Oleksandr Kulkov, and the MetaGraph team.

Compressed Hybrid Bit Vector Representations

A C++ library with hybrid schemes for representing bit vectors in compressed space.
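
To give a feel for what representing a bit vector in compressed space means, here is a minimal Python sketch of one of the simplest schemes: a sparse representation that stores only the sorted positions of the set bits and answers access, rank, and select queries from them. It is an illustration only and not necessarily one of the schemes implemented in the library; the class name `SparseBitVector` and its methods are hypothetical.

```python
from bisect import bisect_right


class SparseBitVector:
    """Bit vector stored as the sorted positions of its set bits.

    Hypothetical illustration: space is proportional to the number of
    set bits rather than to the length of the vector.
    """

    def __init__(self, size, set_positions):
        self.size = size
        self.ones = sorted(set(set_positions))

    def __getitem__(self, i):
        # The bit at position i is set iff i is among the stored positions.
        j = bisect_right(self.ones, i)
        return j > 0 and self.ones[j - 1] == i

    def rank1(self, i):
        """Number of set bits in positions [0, i]."""
        return bisect_right(self.ones, i)

    def select1(self, r):
        """Position of the r-th set bit (1-based)."""
        return self.ones[r - 1]


bv = SparseBitVector(16, [3, 7, 12])
print(bv[7], bv.rank1(7), bv.select1(3))
```

A sparse layout like this pays off only when few bits are set; a hybrid scheme would combine several such representations depending on the local density of the data.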

De Bruijn Graph Visualizer

A web app visualizing de Bruijn graphs and the BOSS table (Bowe et al.). Developed to interactively illustrate the core data structure used as a k-mer index for graph representation in MetaGraph.
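
As a rough illustration of the underlying idea, a de Bruijn graph can be sketched in Python as a plain set of k-mers, with edges implied by (k-1)-character overlaps. This is neither the BOSS table nor MetaGraph code; the function names `build_kmer_set` and `successors` are hypothetical, and a hash set stands in for the succinct index.

```python
def build_kmer_set(sequences, k):
    """Collect all k-mers (graph nodes) occurring in the input sequences."""
    kmers = set()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmers.add(seq[i:i + k])
    return kmers


def successors(kmers, node, alphabet="ACGT"):
    """Return the nodes reachable from `node` by a single edge.

    An edge u -> v exists when the last k-1 characters of u equal the
    first k-1 characters of v and both k-mers are present in the set.
    """
    return [node[1:] + c for c in alphabet if node[1:] + c in kmers]


kmers = build_kmer_set(["ACGTAC", "CGTT"], k=3)
print(sorted(kmers))
print(successors(kmers, "CGT"))
```

Because the edges are never stored explicitly, the k-mer set alone defines the graph, which is what makes compact k-mer indexes such as the BOSS table possible.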

Featured Publications

Mikhail Karasikov, Harun Mustafa, Daniel Danciu, Oleksandr Kulkov, Marc Zimmermann, Christopher Barber, Gunnar Rätsch, André Kahles (2025). Efficient and accurate search in petabase-scale sequence repositories. In Nature.


Mikhail Karasikov, Joost van Doorn, Nicolas Känzig, Melis Erdal Cesur, Hugo Mark Horlings, Robert Berke, Fei Tang, Sebastian Otálora (2025). Training state-of-the-art pathology foundation models with orders of magnitude less data. In MICCAI 2025.


Mikhail Karasikov (2023). Scalable Annotated Genome Graphs for Representing Sequence Data. In ETH Zurich Research Collection.


Mikhail Karasikov, Harun Mustafa, Gunnar Rätsch, André Kahles (2021). Lossless Indexing with Counting de Bruijn Graphs. In RECOMB 2022.


Mikhail Karasikov, Guillaume Pagès, Sergei Grudinin (2018). Smooth Orientation-Dependent Scoring Function for Coarse-Grained Protein Quality Assessment. In Bioinformatics.


Mikhail Karasikov, Harun Mustafa, Amir Joudaki, Sara Javadzadeh No, Gunnar Rätsch, André Kahles (2018). Sparse Binary Relation Representations for Genome Graph Annotation. In RECOMB 2019.
