Projects

Ocean Microbiomics Database
Collaboration with the Sunagawa Lab. Provided k-mer-based sequence search and counting de Bruijn graph indexes for the Genome Collection of the Ocean Microbiomics Database. Other contributors: Lucas Paoli, Harun Mustafa, André Kahles. (Published in Nature).
MetaGraph
A C++ framework library for indexing very large collections of DNA/protein sequences, and a tool for sequence search, alignment, and assembly. Although its target use cases overlap with those of BLAST, MetaGraph focuses mainly on scalable indexing of raw sequencing data in annotated de Bruijn graphs with up to $\sim 10^{12}$ nodes and $\sim 10^{7}$ annotation labels. It also provides an online platform, MetaGraph Online. Other contributors: Marc Zimmermann, Thomas Zhou, the MetaGraph team.
GeoDNA
A portal for sequence search and geographical positioning based on the metagenomic MetaSUB data. The initial prototype was set up over a weekend, but it served well and also became the base for the MetaGraph Search platform. Other contributors: Marc Zimmermann, Jiayu Chen, André Kahles, Thomas Zhou. (Published in Cell).
De Bruijn Graph Visualizer
A small web app visualizing de Bruijn graphs and the BOSS table (Bowe et al.). Written for educational purposes to interactively illustrate the core data structure used for graph representation in MetaGraph.
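For readers unfamiliar with the structure, here is a minimal Python sketch of how a de Bruijn graph can be built from a set of sequences. It assumes the edge-centric convention in which nodes are (k-1)-mers and every k-mer contributes one edge. The function name build_de_bruijn and the toy input are illustrative only. This is not the BOSS table and not MetaGraph's or the visualizer's actual code, just the plain graph that such succinct representations encode.

```python
from collections import defaultdict


def build_de_bruijn(sequences, k):
    """Return a dict mapping each (k-1)-mer node to its set of successor nodes.

    Every k-mer found in the input adds one edge: prefix (k-1)-mer -> suffix (k-1)-mer.
    Hypothetical helper for illustration; real tools use succinct representations.
    """
    graph = defaultdict(set)
    for seq in sequences:
        # Slide a window of length k over the sequence.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Toy example: two short reads sharing the k-mer "CGT".
graph = build_de_bruijn(["ACGTAC", "CGTT"], k=3)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

In this toy graph the node GT has two outgoing edges (to TA and TT), which is the kind of branching the visualizer is meant to make visible.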
Sentiment Analysis
A small demo for sentiment analysis of reviews on Russian banks (one of those quick hands-on projects I did while studying at YSDA).
After a night of “bombarding” banki.ru with random web scraping requests, our dorm network was banned by IP. Fortunately, the 5k reviews we had managed to scrape were enough to complete the project. However, my roommates and I were reminded of this every time we had to use a VPN to check bank reviews afterwards.