Mikhail Karasikov

ML Engineer, PhD

ETH Zurich

About me

I am currently a Scientific Collaborator at ETH Zurich, working on further scaling up MetaGraph, a large-scale sequence search engine for DNA. I am also open to new projects.

Previously, I was a Machine Learning researcher and engineer at Kaiko.AI, where I trained Vision Transformers (ViTs) and built foundation models (FMs) for computational pathology and other data modalities. This work included large-scale pretraining on diverse datasets, fine-tuning and optimizing the models for various downstream tasks, and making them robust, interpretable, and useful in clinical practice.

I completed my PhD at ETH Zurich, where I designed novel algorithms and compressed data structures for indexing petabytes of biological sequences and developed methods scalable to the entire Sequence Read Archive. These methods finally made this trove of data accessible for search by sequence at scale and demonstrated the feasibility of making all of life's code easily searchable.

Before that, I studied Math, Physics, CS, and Machine Learning at MIPT, Skoltech, and YSDA, and worked on a few problems in computational structural biology at Inria Grenoble-Rhône-Alpes.

Interests
  • Machine Learning
  • Bioinformatics
  • Computational Biology
  • Compressed Data Structures
Education
  • Ph.D. in Computer Science, 2023

    ETH Zurich, Zurich, Switzerland

  • M.Sc. in Math. and Computer Science, 2017

    Skoltech, Moscow, Russia

  • M.Sc. in Applied Math. and Physics, 2017

    MIPT, Moscow, Russia

  • PG Dip. in Computer Science, 2016

    Yandex School of Data Analysis, Moscow, Russia

  • B.Sc. in Applied Math. and Physics, 2015

    MIPT, Moscow, Russia

Featured Publications

(2025). Efficient and accurate search in petabase-scale sequence repositories. In Nature.

(2025). Training state-of-the-art pathology foundation models with orders of magnitude less data. In MICCAI 2025.

(2023). Scalable Annotated Genome Graphs for Representing Sequence Data. In ETH Zurich Research Collection.

(2021). Lossless Indexing with Counting de Bruijn Graphs. In RECOMB 2022.

(2018). Sparse Binary Relation Representations for Genome Graph Annotation. In RECOMB 2019.

Teaching

Courses TAed at ETH Zürich, Institute for Machine Learning:

Contact