Mikhail Karasikov

PhD Student

ETH Zurich


I am a PhD student at ETH Zurich supervised by André Kahles and Gunnar Rätsch.

My work focuses on designing algorithms and compressed data structures for indexing very large collections of sequences, and on developing methods that scale to the entire Sequence Read Archive. These methods build graph representations that enable analyses and queries that would be practically impossible on the raw data alone.

Prior to ETH Zurich, I studied Math, Physics, and Optimal Control at the Moscow Institute of Physics and Technology (MIPT). Then, I did a double Master’s program studying Mathematics and Machine Learning at MIPT and Skoltech. At the same time, I completed a two-year CS program at the Yandex School of Data Analysis and then interned at Inria Grenoble-Rhône-Alpes working on various problems of computational structural biology.

  • Machine Learning
  • Bioinformatics
  • Computational Biology
  • Compressed Data Structures
Free time
  • Ph.D. in Computer Science, present

    ETH Zurich, Zurich, Switzerland

  • M.Sc. in Math. and Computer Science, 2017

    Skolkovo Institute of Science and Technology (Skoltech), Moscow, Russia

  • M.Sc. in Applied Math. and Physics, 2017

    Moscow Institute of Physics and Technology (MIPT), Moscow, Russia

  • B.Sc. in Applied Math. and Physics, 2015

    Moscow Institute of Physics and Technology (MIPT), Moscow, Russia

Recent & Upcoming Talks

Topology-based Sparsification of Graph Annotations
Proceedings Presentation at ISMB/ECCB 2021 (HiTSeq COSI track).


A C++ framework library for indexing very large collections of DNA/protein sequences and a tool for sequence search, alignment, and assembly. Although the target use cases of MetaGraph overlap with those of BLAST, MetaGraph mainly focuses on the scalable indexing of raw sequencing data in annotated de Bruijn graphs with up to $\sim 10^{12}$ nodes and $\sim 10^{7}$ annotation labels. It is also available as an online platform, MetaGraph Online.
Compressed Hybrid Bit Vector Representations
Hybrid schemes for representing bit vectors in compressed space.
A portal for DNA sequence search and geographical positioning based on the metagenomic MetaSUB data.
The initial prototype was set up over a weekend, but it served well and became the basis for the MetaGraph Search platform. Other contributors: Marc Zimmermann and Jiayu Chen.
De Bruijn Graph Visualizer
A small web app visualizing de Bruijn graphs and the BOSS table. It was written for educational purposes to interactively illustrate the core data structure used for graph representation in MetaGraph.
Protein Scoring
A method for single-model coarse-grained protein quality assessment developed during my internship at Inria Grenoble-Rhône-Alpes.
Sentiment Analysis
A small demo for sentiment analysis of reviews of Russian banks (one of those quick hands-on projects I did while studying at YSDA).
After a night of “bombarding” banki.ru with random web-scraping requests, our dorm network was banned by IP. Fortunately, the 5k reviews we had already scraped were enough to complete the project. I was reminded of it every time my roommates or I later had to use a VPN to check bank reviews on the website.
Activity Prediction
Classification of time-series data from a smartphone accelerometer sensor. Implemented as a practical demonstration of the methods developed in my Bachelor’s thesis.

Featured Publications

(2021). Lossless Indexing with Counting de Bruijn Graphs. In RECOMB 2022.

(2021). Topology-based Sparsification of Graph Annotations. In ISMB/ECCB 2021.

(2020). MetaGraph: Indexing and Analysing Nucleotide Archives at Petabase-scale. In bioRxiv.

(2018). Sparse Binary Relation Representations for Genome Graph Annotation. In RECOMB 2019.



Courses TAed at ETH Zürich, Institute for Machine Learning: